
Qwen3 235B A22B Instruct 2507 FP8

A 235B-parameter mixture-of-experts (MoE) model that activates 22B parameters per token, featuring enhanced instruction following and reasoning, a 262K-token context window, and FP8 quantization for cost-efficient, high-throughput inference.

Performance benchmarks

Benchmark comparison chart: scores on AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, and SWE-bench Verified for this model versus related open-source models and competitor closed-source models (Claude Opus 4.6, OpenAI o3, OpenAI o1, GPT-4o).

This model is not available on Together’s Serverless API.

Deploy this model on an on-demand Dedicated Endpoint, or pick a supported alternative from the Model Library.
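Once a Dedicated Endpoint is running, it can be queried through Together's OpenAI-compatible chat completions API. A minimal sketch in Python, assuming the endpoint serves the model under the ID `Qwen/Qwen3-235B-A22B-Instruct-2507-FP8` (copy the exact ID from your endpoint's page) and that a `TOGETHER_API_KEY` environment variable is set:

```python
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"
# Assumed model ID -- confirm the real one on your Dedicated Endpoint page.
MODEL_ID = "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8"

def build_request(prompt: str, max_tokens: int = 512) -> dict:
    """Build an OpenAI-compatible chat completions payload."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """Send one prompt to the endpoint and return the assistant's reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat("Summarize the benefits of FP8 inference in two sentences."))
```

The same payload works with the official `together` Python SDK or any OpenAI-compatible client pointed at the endpoint's base URL.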

Model details
  • Model provider
    Qwen
  • Type
    Chat
    Reasoning
  • Main use cases
    Chat
    Small & Fast
    Medium General Purpose
    Function Calling
  • Features
    Function Calling
    JSON Mode
  • Parameters
    235B
  • Context length
    262K
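The Function Calling and JSON Mode features listed above use the OpenAI-compatible request schema. A sketch of a function-calling request body, assuming the standard `tools` format; the `get_weather` function and its parameters are illustrative, not part of the model's specification:

```python
import json

# Illustrative tool definition -- the name and parameters are hypothetical.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    # Assumed model ID -- confirm against your endpoint configuration.
    "model": "Qwen/Qwen3-235B-A22B-Instruct-2507-FP8",
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
    # "auto" lets the model decide whether to call the tool or answer directly.
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

When the model decides to call the tool, the response's `message` carries a `tool_calls` array with the function name and JSON-encoded arguments instead of plain text content.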