DeepSeek V4 Pro

Million-token context intelligence with hybrid attention and three reasoning modes

About model

DeepSeek V4 Pro is DeepSeek's 1.6T-parameter (49B activated) MoE model with a 1M-token context window. It introduces a hybrid attention architecture combining Compressed Sparse Attention and Heavily Compressed Attention, which requires only 27% of the inference FLOPs and 10% of the KV cache of V3.2 at million-token context. Pre-trained on 32T+ tokens with the Muon optimizer and refined through a two-stage post-training pipeline, V4 Pro offers three configurable reasoning modes and strong performance across coding (93.5% LiveCodeBench), reasoning (90.1% GPQA Diamond), and agentic tasks (80.6% SWE-Bench Verified). It is MIT licensed.

1M Token Context

27% FLOPs and 10% KV cache vs V3.2 at 1M context

LiveCodeBench

93.5%

Codeforces rating 3206 for competitive coding

SWE-Bench Verified

80.6%

Agentic coding across repositories

Model key capabilities

Million-Token Efficiency: Hybrid CSA+HCA attention requiring only 27% of inference FLOPs and 10% of KV cache vs V3.2 at 1M context, with 83.5% MRCR 1M comprehension
Three Reasoning Modes: Non-think for fast responses, Think High for logical analysis, Think Max for maximum effort with 90.1% GPQA Diamond and 95.2% HMMT 2026
Coding & Agentic Leadership: 93.5% LiveCodeBench, Codeforces 3206, 80.6% SWE-Bench Verified, and 73.6% MCPAtlas for tool orchestration
Open-Source at Scale: 1.6T parameters (49B activated) pre-trained on 32T+ tokens, MIT licensed, with domain-expert post-training via SFT, RL, and on-policy distillation
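The efficiency figures above can be restated as reduction factors with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted in the text; the constant and function names are illustrative, not part of any DeepSeek or Together AI API.

```python
# Figures quoted in the text above.
TOTAL_PARAMS = 1.6e12        # 1.6T total parameters
ACTIVE_PARAMS = 49e9         # 49B parameters activated per token
FLOPS_RATIO_VS_V32 = 0.27    # inference FLOPs relative to V3.2 at 1M context
KV_RATIO_VS_V32 = 0.10       # KV cache relative to V3.2 at 1M context


def reduction_factor(ratio: float) -> float:
    """Turn 'needs `ratio` of the baseline' into 'baseline is N times larger'."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1.0 / ratio


# Share of the total parameters that is active for any single token.
active_share = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Activated share of parameters: {active_share:.1%}")
print(f"FLOPs reduction vs V3.2: {reduction_factor(FLOPS_RATIO_VS_V32):.1f}x")
print(f"KV cache reduction vs V3.2: {reduction_factor(KV_RATIO_VS_V32):.1f}x")
```

Read this way, roughly 3% of the parameters are active per token, and the quoted ratios correspond to about 3.7x fewer FLOPs and 10x less KV cache than V3.2 at 1M context.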

Performance benchmarks

Model	GPQA Diamond	HLE	LiveCodeBench	MATH500	SWE-Bench Verified
DeepSeek V4 Pro	-	-	93.5%	-	80.6%
Competitor closed-source models
Claude Opus 4.6	90.5%	34.2%	-	-	78.7%
OpenAI o3	83.3%	24.9%	-	99.2%	62.3%
OpenAI o1	76.8%	-	-	96.4%	48.9%
GPT-4o	49.2%	2.7%	32.3%	89.3%	31.0%

API usage

Endpoint: deepseek-ai/DeepSeek-V4-Pro

cURL:

curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-V4-Pro",
    "messages": [
      {
        "role": "user",
        "content": "What are some fun things to do in New York?"
      }
    ]
  }'

Python:

from together import Together

client = Together()

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Pro",
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?",
        }
    ],
)
print(response.choices[0].message.content)

TypeScript:

import Together from 'together-ai';
const together = new Together();

const completion = await together.chat.completions.create({
  model: 'deepseek-ai/DeepSeek-V4-Pro',
  messages: [
    {
      role: 'user',
      content: 'What are some fun things to do in New York?'
    }
  ],
});

console.log(completion.choices[0].message.content);
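For environments where installing an SDK is not an option, the same request can be assembled with the Python standard library. The sketch below mirrors the cURL call (same URL, headers, and JSON body); the helper names `build_chat_request` and `send` are illustrative, and the response is assumed to have the `choices[0].message.content` shape used by the SDK examples.

```python
import json
import os
import urllib.request

API_URL = "https://api.together.xyz/v1/chat/completions"
MODEL = "deepseek-ai/DeepSeek-V4-Pro"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a single-turn chat completion request."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


if __name__ == "__main__":
    key = os.environ.get("TOGETHER_API_KEY")
    if key:
        req = build_chat_request("What are some fun things to do in New York?", key)
        print(send(req))
    else:
        print("Set TOGETHER_API_KEY to send the request.")
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without network access.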

Model card
Architecture Overview:
• 1.6T total parameter MoE architecture with 49B parameters activated per token
• Hybrid attention combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for long-context efficiency
• At 1M token context: requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2
• Manifold-Constrained Hyper-Connections (mHC) strengthening residual connections for stable signal propagation
• 1M token context window
• Three reasoning modes: Non-think (fast responses), Think High (logical analysis), Think Max (maximum reasoning effort)
• FP4 + FP8 mixed precision (MoE expert parameters in FP4, other parameters in FP8)
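The mixed-precision bullet implies a range for raw weight storage. The text does not say what share of the 1.6T parameters sits in MoE experts, so the sketch below takes that share as an input (`expert_fraction`, a hypothetical parameter) and computes the two bounds. It ignores quantization scale factors and any runtime overhead, so treat the result as a rough estimate only.

```python
TOTAL_PARAMS = 1.6e12   # 1.6T total parameters, from the text
BYTES_FP4 = 0.5         # 4 bits per parameter
BYTES_FP8 = 1.0         # 8 bits per parameter


def weight_bytes(expert_fraction: float) -> float:
    """Approximate weight storage in bytes.

    `expert_fraction` is the share of parameters stored in FP4 (MoE experts);
    the remainder is stored in FP8. Scale factors and metadata are ignored.
    """
    if not 0.0 <= expert_fraction <= 1.0:
        raise ValueError("expert_fraction must be between 0 and 1")
    fp4_params = TOTAL_PARAMS * expert_fraction
    fp8_params = TOTAL_PARAMS - fp4_params
    return fp4_params * BYTES_FP4 + fp8_params * BYTES_FP8


lower = weight_bytes(1.0)   # every parameter in FP4
upper = weight_bytes(0.0)   # every parameter in FP8
print(f"Weight storage is between {lower / 1e12:.1f} TB and {upper / 1e12:.1f} TB")

# Hypothetical split, for illustration only: 95% of parameters in experts.
print(f"At a 95% expert share: {weight_bytes(0.95) / 1e12:.2f} TB")
```

Because experts usually dominate the parameter count in MoE models, the real figure is likely closer to the lower bound, but the exact split is not stated here.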

Training Methodology:
• Pre-trained on more than 32T diverse and high-quality tokens
• Muon optimizer for faster convergence and greater training stability
• Two-stage post-training: independent cultivation of domain-specific experts through SFT and RL with GRPO, followed by unified model consolidation via on-policy distillation
• Integrates distinct proficiencies across diverse domains into a single model
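The RL stage above names GRPO, whose core idea is to score each sampled response relative to the other responses drawn for the same prompt. A minimal sketch of that group-relative normalization follows. It illustrates the general technique as commonly described, not DeepSeek's training code; the use of the population standard deviation and the `eps` term are assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so responses
    are rewarded for beating their siblings rather than for absolute score.
    """
    if not rewards:
        raise ValueError("rewards must not be empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to one prompt, scored 1.0 if correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in advantages])
```

Correct answers receive positive advantages and incorrect ones negative. When every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.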

Performance Characteristics:
• 93.5% LiveCodeBench, Codeforces rating 3206 for coding
• 90.1% GPQA Diamond, 95.2% HMMT 2026, 89.8% IMOAnswerBench for reasoning
• 80.6% SWE-Bench Verified, 76.2% SWE-Bench Multilingual, 55.4% SWE-Bench Pro for agentic coding
• 83.4% BrowseComp, 73.6% MCPAtlas Public, 51.8% Toolathlon for agentic tasks
• 83.5% MRCR 1M, 62.0% CorpusQA 1M for million-token comprehension
• 57.9% SimpleQA-Verified, 84.4% Chinese-SimpleQA for factual knowledge
• MIT licensed
Prompting
Together AI API Access:
• Access DeepSeek V4 Pro via Together AI APIs using the endpoint deepseek-ai/DeepSeek-V4-Pro
• Authenticate using your Together AI API key in request headers
• Supports three reasoning modes: Non-think for fast responses, Think High for logical analysis, Think Max for maximum reasoning effort
• For Think Max mode, a context window of at least 384K tokens is recommended
• Recommended parameters: temperature=1.0, top_p=1.0
• Available on serverless and monthly reserved infrastructure
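The recommended sampling parameters can be captured in a small helper so they are applied consistently. This is a sketch: `chat_kwargs` and `meets_think_max_recommendation` are hypothetical helper names, and "384K" is read here as 384,000 tokens, which is an assumption. The text does not say which request field selects a reasoning mode, so the helper deliberately leaves that out.

```python
MODEL = "deepseek-ai/DeepSeek-V4-Pro"
RECOMMENDED_SAMPLING = {"temperature": 1.0, "top_p": 1.0}
THINK_MAX_MIN_CONTEXT = 384_000  # assumption: "384K" read as 384,000 tokens


def chat_kwargs(prompt: str, **overrides) -> dict:
    """Keyword arguments for a chat completion call with recommended sampling.

    Explicit overrides win over the recommended defaults.
    """
    kwargs = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        **RECOMMENDED_SAMPLING,
    }
    kwargs.update(overrides)
    return kwargs


def meets_think_max_recommendation(context_window_tokens: int) -> bool:
    """True if the available context window meets the Think Max recommendation."""
    return context_window_tokens >= THINK_MAX_MIN_CONTEXT


params = chat_kwargs("Summarize the trade-offs of sparse attention.")
print(params["temperature"], params["top_p"])
print(meets_think_max_recommendation(512_000))
```

With the Together Python client, the result can be passed as `client.chat.completions.create(**chat_kwargs("..."))`.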
Applications & use cases
Coding & Software Engineering:
• 93.5% LiveCodeBench and Codeforces 3206 for competitive and production code generation
• 80.6% SWE-Bench Verified for autonomous software engineering across repositories
• 76.2% SWE-Bench Multilingual for cross-language software engineering
• Evaluated on Terminal-Bench 2.0 and SWE-Bench Pro (55.4%) for complex systems engineering

Reasoning & Knowledge:
• Three configurable reasoning modes from fast intuitive responses to maximum effort reasoning
• 90.1% GPQA Diamond for scientific reasoning, 95.2% HMMT 2026 for mathematics
• 57.9% SimpleQA-Verified for factual knowledge
• 87.5% MMLU-Pro for general knowledge and understanding

Agentic Workflows & Long Context:
• 83.4% BrowseComp for agentic search, 73.6% MCPAtlas for tool orchestration
• 51.8% Toolathlon for multi-tool agentic tasks
• 1M token context with efficient hybrid attention for large codebases and documentation
• 83.5% MRCR 1M and 62.0% CorpusQA 1M for million-token comprehension
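Before sending a large codebase or document set, it helps to check whether it plausibly fits in the context window. The sketch below uses a rule-of-thumb of about 4 characters per token, which is an assumption and not a property of this model's tokenizer; for exact counts, use the real tokenizer. The window size and output reserve are parameters because the limit served by a given endpoint may differ from the model's 1M-token window.

```python
import math


def estimate_tokens(num_chars: int, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from a character count (heuristic, not exact)."""
    if num_chars < 0 or chars_per_token <= 0:
        raise ValueError("num_chars must be >= 0 and chars_per_token > 0")
    return math.ceil(num_chars / chars_per_token)


def fits_in_context(
    prompt_tokens: int,
    context_window: int = 1_000_000,   # assumption: "1M" read as 1,000,000
    reserved_for_output: int = 32_000,  # hypothetical headroom for the reply
) -> bool:
    """True if the prompt plus reserved output space fits in the window."""
    return prompt_tokens + reserved_for_output <= context_window


# A hypothetical repository with 3 million characters of source text.
tokens = estimate_tokens(3_000_000)
print(tokens)
print(fits_in_context(tokens))
print(fits_in_context(tokens, context_window=512_000))
```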

Model specifications

Model data

Model provider: DeepSeek
Type: Reasoning, Chat, Code, LLM
Main use cases: Reasoning
Features: Function Calling, JSON Mode
Intelligence: High
Deployment: Serverless, Monthly Reserved
Endpoint: deepseek-ai/DeepSeek-V4-Pro
Parameters: 1.6T
Activated parameters: 49B
Context length: 512K
Input price: $2.10 / 1M tokens ($0.20 / 1M tokens cached)
Output price: $4.40 / 1M tokens
Input modalities: Text
Output modalities: Text
Released: April 24, 2026
Quantization level: FP4
External link: Provider docs
Category: Chat
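The listed prices translate into a per-request cost with simple arithmetic. The sketch below assumes that cached input tokens are billed at the cached rate instead of the standard input rate, which the listing implies but does not spell out; `estimate_cost` is an illustrative helper, not part of any SDK.

```python
# Prices in USD per 1M tokens, taken from the specifications above.
INPUT_PER_M = 2.10
CACHED_INPUT_PER_M = 0.20
OUTPUT_PER_M = 4.40


def estimate_cost(input_tokens: int, output_tokens: int, cached_input_tokens: int = 0) -> float:
    """Estimated request cost in USD.

    `cached_input_tokens` is the part of `input_tokens` served from cache
    and assumed to be billed at the cached rate.
    """
    if min(input_tokens, output_tokens, cached_input_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_input_tokens > input_tokens:
        raise ValueError("cached_input_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * INPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


# 100K input tokens and 2K output tokens, without and with 80K cached tokens.
print(f"${estimate_cost(100_000, 2_000):.4f}")
print(f"${estimate_cost(100_000, 2_000, cached_input_tokens=80_000):.4f}")
```

In this example the cost is about $0.2188 without caching and about $0.0668 with 80K of the input cached, which shows how much prompt caching matters for long, repeated contexts.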
