DeepSeek V4 Pro
Million-token context intelligence with hybrid attention and three reasoning modes
About model
DeepSeek V4 Pro is DeepSeek's 1.6T parameter (49B activated) MoE model supporting 1M token context. It introduces a hybrid attention architecture combining Compressed Sparse Attention and Heavily Compressed Attention, requiring only 27% of inference FLOPs and 10% of KV cache compared to V3.2 at million-token context. Pre-trained on 32T+ tokens with Muon optimizer and a two-stage post-training pipeline, V4 Pro delivers three configurable reasoning modes and strong performance across coding (93.5% LiveCodeBench), reasoning (90.1% GPQA Diamond), and agentic tasks (80.6% SWE-Bench Verified). MIT licensed.
1M
27% FLOPs and 10% KV cache vs V3.2 at 1M context
93.50%
Codeforces rating 3206 for competitive coding
80.60%
Agentic coding across repositories
- Million-Token Efficiency: Hybrid CSA+HCA attention requiring only 27% of inference FLOPs and 10% of KV cache vs V3.2 at 1M context, with 83.5% MRCR 1M comprehension
- Three Reasoning Modes: Non-think for fast responses, Think High for logical analysis, Think Max for maximum effort with 90.1% GPQA Diamond and 95.2% HMMT 2026
- Coding & Agentic Leadership: 93.5% LiveCodeBench, Codeforces 3206, 80.6% SWE-Bench Verified, and 73.6% MCPAtlas for tool orchestration
- Open-Source at Scale: 1.6T parameters (49B activated) pre-trained on 32T+ tokens, MIT licensed, with domain-expert post-training via SFT, RL, and on-policy distillation
Model | AIME 2025 | GPQA Diamond | HLE | LiveCodeBench | MATH500 | SWE-bench verified |
|---|---|---|---|---|---|---|
DeepSeek V4 Pro | 93.50% | 80.60% | Related open-source models | Competitor closed-source models | ||
90.5% | 34.2% | 78.7% | ||||
83.3% | 24.9% | 99.2% | 62.3% | |||
76.8% | 96.4% | 48.9% | ||||
49.2% | 2.7% | 32.3% | 89.3% | 31.0% |
API usage
Endpoint:
Model card
Architecture Overview:
• 1.6T total parameter MoE architecture with 49B parameters activated per token
• Hybrid attention combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for long-context efficiency
• At 1M token context: requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2
• Manifold-Constrained Hyper-Connections (mHC) strengthening residual connections for stable signal propagation
• 1M token context window
• Three reasoning modes: Non-think (fast responses), Think High (logical analysis), Think Max (maximum reasoning effort)
• FP4 + FP8 mixed precision (MoE expert parameters in FP4, other parameters in FP8)
Training Methodology:
• Pre-trained on more than 32T diverse and high-quality tokens
• Muon optimizer for faster convergence and greater training stability
• Two-stage post-training: independent cultivation of domain-specific experts through SFT and RL with GRPO, followed by unified model consolidation via on-policy distillation
• Integrates distinct proficiencies across diverse domains into a single model
Performance Characteristics:
• 93.5% LiveCodeBench, Codeforces rating 3206 for coding
• 90.1% GPQA Diamond, 95.2% HMMT 2026, 89.8% IMOAnswerBench for reasoning
• 80.6% SWE-Bench Verified, 76.2% SWE-Bench Multilingual, 55.4% SWE-Bench Pro for agentic coding
• 83.4% BrowseComp, 73.6% MCPAtlas Public, 51.8% Toolathlon for agentic tasks
• 83.5% MRCR 1M, 62.0% CorpusQA 1M for million-token comprehension
• 57.9% SimpleQA-Verified, 84.4% Chinese-SimpleQA for factual knowledge
• MIT licensed
Prompting
Together AI API Access:
• Access DeepSeek V4 Pro via Together AI APIs using the endpoint deepseek-ai/DeepSeek-V4-Pro
• Authenticate using your Together AI API key in request headers
• Supports three reasoning modes: Non-think for fast responses, Think High for logical analysis, Think Max for maximum reasoning effort
• For Think Max mode, recommended context window of at least 384K tokens
• Recommended parameters: temperature=1.0, top_p=1.0
• Available on serverless and monthly reserved infrastructure
Applications & use cases
Coding & Software Engineering:
• 93.5% LiveCodeBench and Codeforces 3206 for competitive and production code generation
• 80.6% SWE-Bench Verified for autonomous software engineering across repositories
• 76.2% SWE-Bench Multilingual for cross-language software engineering
• Terminal-Bench 2.0 and SWE-Bench Pro for complex systems engineering
Reasoning & Knowledge:
• Three configurable reasoning modes from fast intuitive responses to maximum effort reasoning
• 90.1% GPQA Diamond for scientific reasoning, 95.2% HMMT 2026 for mathematics
• 57.9% SimpleQA-Verified for factual knowledge
• 87.5% MMLU-Pro for general knowledge and understanding
Agentic Workflows & Long Context:
• 83.4% BrowseComp for agentic search, 73.6% MCPAtlas for tool orchestration
• 51.8% Toolathlon for multi-tool agentic tasks
• 1M token context with efficient hybrid attention for large codebases and documentation
• 83.5% MRCR 1M and 62.0% CorpusQA 1M for million-token comprehension
- TypeReasoningChatCodeLLM
- Main use casesReasoning
- FeaturesFunction CallingJSON Mode
- IntelligenceHigh
- DeploymentServerlessMonthly Reserved
- Endpoint
- Parameters1.6T
- Activated parameters49B
- Context length512K
- Input price
$2.10 / 1M tokens
$0.20 (cached)/1M
- Output price
$4.40 / 1M tokens
- Input modalitiesText
- Output modalitiesText
- ReleasedApril 24, 2026
- Quantization levelFP4
- External link
- CategoryChat