Reasoning
Chat
Code
LLM

DeepSeek V4 Pro

Million-token context intelligence with hybrid attention and three reasoning modes

About model

DeepSeek V4 Pro is DeepSeek's 1.6T-parameter Mixture-of-Experts (MoE) model, with 49B parameters activated per token and a 1M-token context window. It introduces a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA). At million-token context it requires only 27% of the single-token inference FLOPs and 10% of the KV cache of DeepSeek-V3.2. Pre-trained on more than 32T tokens with the Muon optimizer and refined through a two-stage post-training pipeline, V4 Pro offers three configurable reasoning modes and strong results in coding (93.5% LiveCodeBench), reasoning (90.1% GPQA Diamond), and agentic tasks (80.6% SWE-Bench Verified). It is released under the MIT license.

Token Context: 1M
27% of FLOPs and 10% of KV cache vs. V3.2 at 1M context

LiveCodeBench: 93.5%
Codeforces rating of 3206 for competitive coding

SWE-Bench Verified: 80.6%
Agentic coding across repositories

Key model capabilities
  • Million-Token Efficiency: Hybrid CSA+HCA attention that requires only 27% of inference FLOPs and 10% of the KV cache of V3.2 at 1M context, with 83.5% on MRCR 1M for million-token comprehension
  • Three Reasoning Modes: Non-think for fast responses, Think High for logical analysis, and Think Max for maximum reasoning effort; reasoning results include 90.1% GPQA Diamond and 95.2% HMMT 2026
  • Coding & Agentic Leadership: 93.5% LiveCodeBench, a Codeforces rating of 3206, 80.6% SWE-Bench Verified, and 73.6% MCPAtlas for tool orchestration
  • Open-Source at Scale: 1.6T parameters (49B activated) pre-trained on 32T+ tokens, MIT licensed, with domain-expert post-training via SFT, RL, and on-policy distillation
Performance benchmarks

Benchmarks: AIME 2025, GPQA Diamond, HLE, LiveCodeBench, MATH500, SWE-bench Verified

DeepSeek V4 Pro: LiveCodeBench 93.5%, SWE-bench Verified 80.6%


Competitor closed-source models

Claude Opus 4.6

90.5%

34.2%

78.7%

OpenAI o3

83.3%

24.9%

99.2%

62.3%

OpenAI o1

76.8%

96.4%

48.9%

GPT-4o

49.2%

2.7%

32.3%

89.3%

31.0%

  • API usage

    • cURL
    • Python
    • TypeScript

    Endpoint:

    deepseek-ai/DeepSeek-V4-Pro

    cURL:

    curl -X POST "https://api.together.xyz/v1/chat/completions" \
      -H "Authorization: Bearer $TOGETHER_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "deepseek-ai/DeepSeek-V4-Pro",
        "messages": [
          {
            "role": "user",
            "content": "What are some fun things to do in New York?"
          }
        ]
      }'
    
    Python:

    from together import Together
    
    client = Together()  # reads TOGETHER_API_KEY from the environment
    
    response = client.chat.completions.create(
      model="deepseek-ai/DeepSeek-V4-Pro",
      messages=[
        {
          "role": "user",
          "content": "What are some fun things to do in New York?"
        }
      ]
    )
    print(response.choices[0].message.content)
    
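    TypeScript (run as an ES module so top-level await is allowed):

    import Together from 'together-ai';
    const together = new Together(); // reads TOGETHER_API_KEY from the environment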
    
    const completion = await together.chat.completions.create({
      model: 'deepseek-ai/DeepSeek-V4-Pro',
      messages: [
        {
          role: 'user',
          content: 'What are some fun things to do in New York?'
        }
      ],
    });
    
    console.log(completion.choices[0].message.content);
    
  • Model card

    Architecture Overview:
    • 1.6T total parameter MoE architecture with 49B parameters activated per token
    • Hybrid attention combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) for long-context efficiency
    • At 1M token context: requires only 27% of single-token inference FLOPs and 10% of KV cache compared to DeepSeek-V3.2
    • Manifold-Constrained Hyper-Connections (mHC), which strengthen residual connections for stable signal propagation
    • 1M-token context window
    • Three reasoning modes: Non-think (fast responses), Think High (logical analysis), Think Max (maximum reasoning effort)
    • FP4 + FP8 mixed precision (MoE expert parameters in FP4, other parameters in FP8)
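The parameter counts and precision split above support some quick arithmetic. The sketch below computes the share of parameters active per token and bounds the raw weight storage. The split between expert and non-expert parameters is not stated on this page, so `expert_share` is a hypothetical input; scale factors, activations, and KV cache are ignored.

```python
# Back-of-the-envelope figures using only the totals stated in the model card.
TOTAL_PARAMS = 1.6e12   # 1.6T total parameters
ACTIVE_PARAMS = 49e9    # 49B parameters activated per token

BYTES_FP4 = 0.5  # 4 bits per weight
BYTES_FP8 = 1.0  # 8 bits per weight


def active_fraction(total: float = TOTAL_PARAMS, active: float = ACTIVE_PARAMS) -> float:
    """Share of all parameters that take part in one token's forward pass."""
    return active / total


def weight_bytes(expert_share: float, total: float = TOTAL_PARAMS) -> float:
    """Approximate weight storage in bytes.

    expert_share is a HYPOTHETICAL fraction of parameters held in MoE experts
    (stored in FP4); all remaining parameters are assumed to be FP8.
    """
    if not 0.0 <= expert_share <= 1.0:
        raise ValueError("expert_share must be between 0 and 1")
    experts = total * expert_share
    others = total - experts
    return experts * BYTES_FP4 + others * BYTES_FP8


if __name__ == "__main__":
    print(f"active fraction per token: {active_fraction():.2%}")
    # Bounds: every weight in FP4 (share=1.0) vs. every weight in FP8 (share=0.0).
    print(f"lower bound: {weight_bytes(1.0) / 1e12:.2f} TB")
    print(f"upper bound: {weight_bytes(0.0) / 1e12:.2f} TB")
```

Roughly 3% of the weights are used for any single token, and the stored weights fall somewhere between 0.8 TB and 1.6 TB depending on how many parameters sit in the FP4 experts.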

    Training Methodology:
    • Pre-trained on more than 32T diverse and high-quality tokens
    • Muon optimizer for faster convergence and greater training stability
    • Two-stage post-training: independent cultivation of domain-specific experts through SFT and RL with GRPO, followed by unified model consolidation via on-policy distillation
    • Integrates distinct proficiencies across diverse domains into a single model
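GRPO (Group Relative Policy Optimization), introduced in DeepSeek's earlier DeepSeekMath work, does without a learned value model. Instead, it scores each sampled response relative to the other responses generated for the same prompt. The snippet below is a minimal sketch of only that group-relative advantage step. It assumes one scalar reward per response and uses the population standard deviation. It is not DeepSeek's V4 training code, and it omits the clipped policy-gradient objective and the KL penalty.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each response's reward against its own group's mean and std.

    eps keeps the division finite when every response in the group scored the same.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four responses sampled for one prompt, scored 1 (correct) or 0 (incorrect).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

Correct answers receive a positive advantage and incorrect ones a negative advantage of about the same size. A group where every response earns the same reward contributes no learning signal.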

    Performance Characteristics:
    • 93.5% LiveCodeBench, Codeforces rating 3206 for coding
    • 90.1% GPQA Diamond, 95.2% HMMT 2026, 89.8% IMOAnswerBench for reasoning
    • 80.6% SWE-Bench Verified, 76.2% SWE-Bench Multilingual, 55.4% SWE-Bench Pro for agentic coding
    • 83.4% BrowseComp, 73.6% MCPAtlas Public, 51.8% Toolathlon for agentic tasks
    • 83.5% MRCR 1M, 62.0% CorpusQA 1M for million-token comprehension
    • 57.9% SimpleQA-Verified, 84.4% Chinese-SimpleQA for factual knowledge
    • MIT licensed

  • Prompting

    Together AI API Access:
    • Access DeepSeek V4 Pro via Together AI APIs using the endpoint deepseek-ai/DeepSeek-V4-Pro
    • Authenticate using your Together AI API key in request headers
    • Supports three reasoning modes: Non-think for fast responses, Think High for logical analysis, Think Max for maximum reasoning effort
    • For Think Max mode, a context window of at least 384K tokens is recommended
    • Recommended parameters: temperature=1.0, top_p=1.0
    • Available on serverless and monthly reserved infrastructure
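The recommended sampling settings can be passed with each request. The sketch below builds a chat-completions body for the endpoint used in the cURL example and sends it with only the Python standard library. `build_request` and `send` are hypothetical helpers. This page does not document which request field selects Non-think, Think High, or Think Max, so none is set here.

```python
import json
import os
import urllib.request
from typing import Optional

API_URL = "https://api.together.xyz/v1/chat/completions"  # endpoint from the cURL example
MODEL = "deepseek-ai/DeepSeek-V4-Pro"


def build_request(prompt: str, max_tokens: Optional[int] = None) -> dict:
    """Build a chat-completions body with the recommended sampling settings."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,  # recommended on this page
        "top_p": 1.0,        # recommended on this page
    }
    if max_tokens is not None:
        body["max_tokens"] = max_tokens  # optional cap on generated tokens
    return body


def send(body: dict) -> str:
    """POST the body, authenticating with TOGETHER_API_KEY, and return the reply text."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]


body = build_request("Summarize the CAP theorem in two sentences.", max_tokens=512)
print(json.dumps(body, indent=2))
# print(send(body))  # uncomment to call the API (needs network access and an API key)
```

Keeping payload construction separate from the network call lets you inspect or log the exact request before spending tokens.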

  • Applications & use cases

    Coding & Software Engineering:
    • 93.5% LiveCodeBench and a Codeforces rating of 3206 for competitive and production code generation
    • 80.6% SWE-Bench Verified for autonomous software engineering across repositories
    • 76.2% SWE-Bench Multilingual for cross-language software engineering
    • Terminal-Bench 2.0 and SWE-Bench Pro for complex systems engineering

    Reasoning & Knowledge:
    • Three configurable reasoning modes from fast intuitive responses to maximum effort reasoning
    • 90.1% GPQA Diamond for scientific reasoning, 95.2% HMMT 2026 for mathematics
    • 57.9% SimpleQA-Verified for factual knowledge
    • 87.5% MMLU-Pro for general knowledge and understanding

    Agentic Workflows & Long Context:
    • 83.4% BrowseComp for agentic search, 73.6% MCPAtlas for tool orchestration
    • 51.8% Toolathlon for multi-tool agentic tasks
    • 1M token context with efficient hybrid attention for large codebases and documentation
    • 83.5% MRCR 1M and 62.0% CorpusQA 1M for million-token comprehension

  • Model provider
    DeepSeek
  • Type
    Reasoning
    Chat
    Code
    LLM
  • Main use cases
    Reasoning
  • Features
    Function Calling
    JSON Mode
  • Intelligence
    High
  • Deployment
    Serverless
    Monthly Reserved
  • Parameters
    1.6T
  • Activated parameters
    49B
  • Context length
    512K
  • Input price

    $2.10 / 1M tokens

    $0.20 / 1M tokens (cached)

  • Output price

    $4.40 / 1M tokens

  • Input modalities
    Text
  • Output modalities
    Text
  • Released
    April 24, 2026
  • Quantization level
    FP4
  • Category
    Chat
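The input and output prices listed above translate directly into a per-request estimate. This sketch assumes billing is linear in tokens at exactly these rates and that cached tokens are a subset of the input tokens. It ignores minimums, rounding, and reserved-capacity pricing, and `request_cost` is a hypothetical helper.

```python
INPUT_PER_M = 2.10         # USD per 1M uncached input tokens
CACHED_INPUT_PER_M = 0.20  # USD per 1M cached input tokens
OUTPUT_PER_M = 4.40        # USD per 1M output tokens


def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimated USD cost of one request at the listed per-token prices.

    cached_tokens is the part of input_tokens billed at the cached rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    return (uncached * INPUT_PER_M
            + cached_tokens * CACHED_INPUT_PER_M
            + output_tokens * OUTPUT_PER_M) / 1_000_000


# 200K-token prompt (150K of it cached) plus a 20K-token answer.
print(f"${request_cost(200_000, 20_000, cached_tokens=150_000):.4f}")  # prints $0.2230
```

Output tokens cost about twice as much as uncached input, so long Think Max generations can dominate the bill even when most of the prompt is cached.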