Qwen3 235B A22B FP8 Throughput

Hybrid instruct + reasoning model (mixture-of-experts, 235B total parameters with 22B active) optimized for high-throughput, cost-efficient inference and distillation.

About model

Qwen3-235B-A22B-FP8 Throughput is an FP8-quantized mixture-of-experts model built for reasoning, instruction following, and multilingual use. It can switch between a thinking mode (the model reasons step by step before answering) and a non-thinking mode (the model answers directly). It is well suited to creative writing, role-playing, and complex agent-based tasks, and it supports 100+ languages and dialects.
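As a rough illustration of the two modes, the sketch below appends Qwen3's `/think` or `/no_think` soft switch to a user prompt and separates the reasoning from the final answer. It rests on two assumptions that are not stated on this page: that the hosted endpoint honors the soft switch, and that reasoning arrives inline in the message content wrapped in `<think>...</think>` tags. The helper names (`with_mode`, `split_reasoning`, `ask`) are hypothetical.

```python
import re

MODEL = "Qwen/Qwen3-235B-A22B-fp8-tput"

# Assumption: reasoning is returned inline, wrapped in <think>...</think>.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def with_mode(prompt: str, thinking: bool) -> str:
    """Append the Qwen3 soft switch (/think or /no_think) to a user prompt."""
    return f"{prompt} {'/think' if thinking else '/no_think'}"


def split_reasoning(content: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw message content."""
    match = THINK_RE.search(content)
    if match is None:
        # No reasoning block found: treat the whole content as the answer.
        return "", content.strip()
    reasoning = match.group(1).strip()
    answer = (content[:match.start()] + content[match.end():]).strip()
    return reasoning, answer


def ask(prompt: str, thinking: bool = True) -> tuple[str, str]:
    """Send one prompt and return (reasoning, answer). Makes a network call."""
    from together import Together  # third-party SDK: pip install together

    client = Together()  # reads TOGETHER_API_KEY from the environment
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": with_mode(prompt, thinking)}],
    )
    return split_reasoning(response.choices[0].message.content)
```

Calling `ask("Is 97 prime?", thinking=False)` would then return an empty reasoning string if the endpoint respects the switch. If it does not, `split_reasoning` still strips any reasoning block that comes back.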

Quickstart guides

RAG: Building a RAG Workflow
Agents: Agent Workflows
Apps: Next.js Chat Quickstart

Performance benchmarks

Model                            GPQA Diamond   HLE     LiveCodeBench   MATH500   SWE-bench Verified
Qwen3 235B A22B FP8 Throughput   70.7%          -       65.9%           -         -

Competitor closed-source models
Claude Opus 4.6                  90.5%          34.2%   -               -         78.7%
OpenAI o3                        83.3%          24.9%   -               99.2%     62.3%
OpenAI o1                        76.8%          -       -               96.4%     48.9%
GPT-4o                           49.2%          2.7%    32.3%           89.3%     31.0%

A dash marks a score that is not listed.

API usage

Endpoint:

Qwen/Qwen3-235B-A22B-fp8-tput

cURL:

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-235B-A22B-fp8-tput",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
}'

Python:

from together import Together

# The client reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-fp8-tput",
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?",
        }
    ],
)
print(response.choices[0].message.content)

TypeScript:

import Together from 'together-ai';

const together = new Together();

const completion = await together.chat.completions.create({
  model: 'Qwen/Qwen3-235B-A22B-fp8-tput',
  messages: [
    {
      role: 'user',
      content: 'What are some fun things to do in New York?',
    },
  ],
});

console.log(completion.choices[0].message.content);
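Because the model data lists Function Calling as a feature, a tool-calling request may be a useful next step after the basic chat call. The following is a minimal sketch, not a definitive implementation: `get_weather`, `TOOLS`, `REGISTRY`, `run_tool_call`, and `chat_with_tools` are hypothetical names introduced for illustration, and the sketch assumes the endpoint accepts an OpenAI-style `tools` list and returns tool calls whose arguments are a JSON string.

```python
import json

MODEL = "Qwen/Qwen3-235B-A22B-fp8-tput"


def get_weather(city: str) -> str:
    """Hypothetical tool: a real one would query a weather service."""
    return f"Sunny in {city}"


# OpenAI-style JSON schema describing the hypothetical tool to the model.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables.
REGISTRY = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Decode the JSON arguments and invoke the matching local function."""
    args = json.loads(arguments_json)
    return REGISTRY[name](**args)


def chat_with_tools(prompt: str) -> list[str]:
    """Send a prompt with tools attached and run any requested tool calls."""
    from together import Together  # third-party SDK: pip install together

    client = Together()  # reads TOGETHER_API_KEY from the environment
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        tools=TOOLS,
    )
    calls = response.choices[0].message.tool_calls or []
    return [run_tool_call(c.function.name, c.function.arguments) for c in calls]
```

Keeping the dispatch step (`run_tool_call`) separate from the network call makes it possible to check the tool wiring locally before spending tokens. A full agent loop would also send each tool result back to the model in a follow-up message, which this sketch leaves out.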

Model specifications

Model data

Model provider: Qwen
Type: Chat, Reasoning
Main use cases: Chat, Reasoning, Medium General Purpose, Function Calling
Features: Function Calling
Deployment: On-Demand Dedicated, Monthly Reserved, Serverless
Endpoint: Qwen/Qwen3-235B-A22B-fp8-tput
Parameters: 235.1B
Context length: 40k
Input price: $0.20 / 1M tokens
Output price: $0.60 / 1M tokens
Input modalities: Text
Output modalities: Text

Released: April 28, 2025
Last updated: February 5, 2026
Quantization level: FP8
External link: Provider docs
Category: Chat
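The listed prices can be turned into a per-request cost estimate with simple arithmetic: tokens divided by one million, multiplied by the per-million rate. The sketch below uses the input and output prices from the model data above. The function name `estimate_cost` is hypothetical, and the result is only an estimate, since actual billing may differ.

```python
from decimal import Decimal

# Prices taken from the model data above, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.20")
OUTPUT_PRICE_PER_M = Decimal("0.60")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a request from its token counts."""
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M
```

For example, a request with 30,000 input tokens and 2,000 output tokens comes to (30,000 × 0.20 + 2,000 × 0.60) / 1,000,000 = $0.0072. `Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.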
