MiniMax M3
Frontier coding and agentic capability with 1M context and native multimodality
About model
MiniMax M3 is MiniMax's frontier open-weight model combining coding and agentic capability, 1M token context, and native multimodality in a single checkpoint — the first open-weight model to bring all three together. It introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that reduces per-token compute to 1/20 of the previous generation at 1M context, delivering 9x prefilling and 15x decoding speedups. The model is natively multimodal from training step 0, supporting image and video input and computer use, with a toggleable thinking mode. It scores 59.0% on SWE-Bench Pro and 66.0% on Terminal-Bench 2.1, and is available on Together AI with a 1M token context window.
59.00%
Frontier coding across complex multi-file software engineering
1M
MSA at 1/20 compute per token vs previous generation
9x
Via MiniMax Sparse Attention with 15x decoding speedup
- Frontier coding & agentic: 59.0% SWE-Bench Pro and 66.0% Terminal-Bench 2.1 with 74.2% MCP Atlas for tool orchestration — covering complex multi-file engineering, long-horizon terminal execution, and autonomous task completion
- MSA long context: MiniMax Sparse Attention delivers 1M token context at 1/20 per-token compute vs the previous generation, with 9x prefilling and 15x decoding speedups and performance matching full attention on the vast majority of tasks
- Native multimodality: Trained with mixed modalities from step 0 for deeper semantic alignment across image and video input, with computer use capability for desktop automation
- Thinking mode: Toggleable at request time — enabled for complex reasoning and long-horizon agentic tasks, disabled for fast responses in latency-sensitive scenarios
Model card
Architecture overview:
• MSA (MiniMax Sparse Attention): new sparse attention architecture with block-level KV partitioning and KV-outer-gather-Q operator design
• At 1M context: per-token compute is 1/20 of previous generation; 9x prefilling and 15x decoding speedups vs full attention
• 4x faster than Flash-Sparse-Attention and flash-moba at M3's head configuration
• Native multimodal from training step 0 — image and video input with interleaved data scaling to 100 trillion tokens
• Toggleable thinking mode: on for complex reasoning, off for fast latency-sensitive responses
• Computer use capability for desktop automation across applications and file systems
• Open-weight; weights to be released approximately 10 days after launch
Training methodology:
• Native multimodal pretraining with interleaved text and image/video sequences from step 0
• Interactive user simulator framework for coding training: simulates real developer collaboration patterns including requirement elaboration, feedback correction, and multi-turn task iteration
• Scales coding training data to scenarios beyond single-turn task execution
Performance characteristics:
• Coding agents: 59.0% SWE-Bench Pro, 66.0% Terminal-Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard
• Agentic: 74.2% MCP Atlas, top score on Claw-Eval
• Multimodal: above Gemini 3.1 Pro on OmniDocBench
• Autonomous execution: 9.4x FP8 GEMM kernel speedup over 24 hours, 1,959 tool calls (MiniMax-reported)
• PostTrainBench: 0.37 for autonomous model training across 5 benchmarks
Prompting
Together AI API access:
• Access MiniMax M3 via Together AI APIs using the designated endpoint
• Authenticate using your Together AI API key in request headers
• Toggle thinking mode at request time — no separate model version required
• Supports image and video input alongside text for multimodal requests
• Available on Together AI on serverless and dedicated infrastructure
Applications & use cases
Coding & software engineering:
• Complex multi-file software engineering with 59.0% SWE-Bench Pro
• Long-horizon terminal execution with 66.0% Terminal-Bench 2.1
• CUDA kernel optimization, performance engineering, and low-level systems work
• Interactive multi-turn collaboration with context retained across the full session
Agentic & research workflows:
• Autonomous research tasks: paper reproduction, literature synthesis, and experiment design
• Tool orchestration with 74.2% MCP Atlas and top Claw-Eval performance
• Autonomous model training and evaluation pipelines without human intervention
• Long-horizon tasks sustained over multi-hour sessions using 1M context
Multimodal & computer use:
• Image and video input for document understanding, visual reasoning, and multimodal analysis
• Computer use for desktop automation across applications, files, and systems
• Visual coding: generating and editing interfaces from screenshots and visual references
Reasoning & long context:
• Toggleable thinking for complex reasoning tasks requiring deep analysis
• 1M token context for processing full codebases, research papers, and extended agent sessions
• Cross-document synthesis and long-context comprehension via MSA architecture
- Model providerMiniMax AI
- TypeReasoningVisionChat
- Main use casesReasoning
- FeaturesFunction CallingJSON Mode
- DeploymentServerlessOn-Demand Dedicated
- Context length1M
- Input price
$0.30 / 1M tokens
$0.06 (cached)/1M
- Output price
$1.20 / 1M tokens
- Input modalitiesTextImageVideo
- Output modalitiesText
- ReleasedMay 31, 2026
- CategoryChat