Models / MiniMax AI
Reasoning
Vision
Chat

MiniMax M3

Frontier coding and agentic capability with 1M context and native multimodality

About model

MiniMax M3 is MiniMax's frontier open-weight model combining coding and agentic capability, 1M token context, and native multimodality in a single checkpoint — the first open-weight model to bring all three together. It introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that reduces per-token compute to 1/20 of the previous generation at 1M context, delivering 9x prefilling and 15x decoding speedups. The model is natively multimodal from training step 0, supporting image and video input and computer use, with a toggleable thinking mode. It scores 59.0% on SWE-Bench Pro and 66.0% on Terminal-Bench 2.1, and is available on Together AI with a 1M token context window.

SWE-Bench Pro

59.00%

Frontier coding across complex multi-file software engineering

Context Window

1M

MSA at 1/20 compute per token vs previous generation

Prefilling Speedup

9x

Via MiniMax Sparse Attention with 15x decoding speedup

Model key capabilities
  • Frontier coding & agentic: 59.0% SWE-Bench Pro and 66.0% Terminal-Bench 2.1 with 74.2% MCP Atlas for tool orchestration — covering complex multi-file engineering, long-horizon terminal execution, and autonomous task completion
  • MSA long context: MiniMax Sparse Attention delivers 1M token context at 1/20 per-token compute vs the previous generation, with 9x prefilling and 15x decoding speedups and performance matching full attention on the vast majority of tasks
  • Native multimodality: Trained with mixed modalities from step 0 for deeper semantic alignment across image and video input, with computer use capability for desktop automation
  • Thinking mode: Toggleable at request time — enabled for complex reasoning and long-horizon agentic tasks, disabled for fast responses in latency-sensitive scenarios
  • Model card

    Architecture overview:
    • MSA (MiniMax Sparse Attention): new sparse attention architecture with block-level KV partitioning and KV-outer-gather-Q operator design
    • At 1M context: per-token compute is 1/20 of previous generation; 9x prefilling and 15x decoding speedups vs full attention
    • 4x faster than Flash-Sparse-Attention and flash-moba at M3's head configuration
    • Native multimodal from training step 0 — image and video input with interleaved data scaling to 100 trillion tokens
    • Toggleable thinking mode: on for complex reasoning, off for fast latency-sensitive responses
    • Computer use capability for desktop automation across applications and file systems
    • Open-weight; weights to be released approximately 10 days after launch

    Training methodology:
    • Native multimodal pretraining with interleaved text and image/video sequences from step 0
    • Interactive user simulator framework for coding training: simulates real developer collaboration patterns including requirement elaboration, feedback correction, and multi-turn task iteration
    • Scales coding training data to scenarios beyond single-turn task execution

    Performance characteristics:
    • Coding agents: 59.0% SWE-Bench Pro, 66.0% Terminal-Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard
    • Agentic: 74.2% MCP Atlas, top score on Claw-Eval
    • Multimodal: above Gemini 3.1 Pro on OmniDocBench
    • Autonomous execution: 9.4x FP8 GEMM kernel speedup over 24 hours, 1,959 tool calls (MiniMax-reported)
    • PostTrainBench: 0.37 for autonomous model training across 5 benchmarks

  • Prompting

    Together AI API access:
    • Access MiniMax M3 via Together AI APIs using the designated endpoint
    • Authenticate using your Together AI API key in request headers
    • Toggle thinking mode at request time — no separate model version required
    • Supports image and video input alongside text for multimodal requests
    • Available on Together AI on serverless and dedicated infrastructure

  • Applications & use cases

    Coding & software engineering:
    • Complex multi-file software engineering with 59.0% SWE-Bench Pro
    • Long-horizon terminal execution with 66.0% Terminal-Bench 2.1
    • CUDA kernel optimization, performance engineering, and low-level systems work
    • Interactive multi-turn collaboration with context retained across the full session

    Agentic & research workflows:
    • Autonomous research tasks: paper reproduction, literature synthesis, and experiment design
    • Tool orchestration with 74.2% MCP Atlas and top Claw-Eval performance
    • Autonomous model training and evaluation pipelines without human intervention
    • Long-horizon tasks sustained over multi-hour sessions using 1M context

    Multimodal & computer use:
    • Image and video input for document understanding, visual reasoning, and multimodal analysis
    • Computer use for desktop automation across applications, files, and systems
    • Visual coding: generating and editing interfaces from screenshots and visual references

    Reasoning & long context:
    • Toggleable thinking for complex reasoning tasks requiring deep analysis
    • 1M token context for processing full codebases, research papers, and extended agent sessions
    • Cross-document synthesis and long-context comprehension via MSA architecture

Related models
  • Model provider
    MiniMax AI
  • Type
    Reasoning
    Vision
    Chat
  • Main use cases
    Reasoning
  • Features
    Function Calling
    JSON Mode
  • Deployment
    Serverless
    On-Demand Dedicated
  • Context length
    1M
  • Input price

    $0.30 / 1M tokens

    $0.06 (cached)/1M

  • Output price

    $1.20 / 1M tokens

  • Input modalities
    Text
    Image
    Video
  • Output modalities
    Text
  • Released
    May 31, 2026
  • Category
    Chat