MiniMax M3

Frontier coding and agentic capability with 1M context and native multimodality

Try now

read docs

About model

MiniMax M3 is MiniMax's frontier open-weight model combining coding and agentic capability, 1M token context, and native multimodality in a single checkpoint — the first open-weight model to bring all three together. It introduces MSA (MiniMax Sparse Attention), a new sparse attention architecture that reduces per-token compute to 1/20 of the previous generation at 1M context, delivering 9x prefilling and 15x decoding speedups. The model is natively multimodal from training step 0, supporting image and video input and computer use, with a toggleable thinking mode. It scores 59.0% on SWE-Bench Pro and 66.0% on Terminal-Bench 2.1, and is available on Together AI with a 1M token context window.

SWE-Bench Pro

59.00%

Frontier coding across complex multi-file software engineering

Context Window

MSA at 1/20 compute per token vs previous generation

Prefilling Speedup

Via MiniMax Sparse Attention with 15x decoding speedup

Model key capabilities

Frontier coding & agentic: 59.0% SWE-Bench Pro and 66.0% Terminal-Bench 2.1 with 74.2% MCP Atlas for tool orchestration — covering complex multi-file engineering, long-horizon terminal execution, and autonomous task completion
MSA long context: MiniMax Sparse Attention delivers 1M token context at 1/20 per-token compute vs the previous generation, with 9x prefilling and 15x decoding speedups and performance matching full attention on the vast majority of tasks
Native multimodality: Trained with mixed modalities from step 0 for deeper semantic alignment across image and video input, with computer use capability for desktop automation
Thinking mode: Toggleable at request time — enabled for complex reasoning and long-horizon agentic tasks, disabled for fast responses in latency-sensitive scenarios

Model card
Architecture overview:
• MSA (MiniMax Sparse Attention): new sparse attention architecture with block-level KV partitioning and KV-outer-gather-Q operator design
• At 1M context: per-token compute is 1/20 of previous generation; 9x prefilling and 15x decoding speedups vs full attention
• 4x faster than Flash-Sparse-Attention and flash-moba at M3's head configuration
• Native multimodal from training step 0 — image and video input with interleaved data scaling to 100 trillion tokens
• Toggleable thinking mode: on for complex reasoning, off for fast latency-sensitive responses
• Computer use capability for desktop automation across applications and file systems
• Open-weight; weights to be released approximately 10 days after launch

Training methodology:
• Native multimodal pretraining with interleaved text and image/video sequences from step 0
• Interactive user simulator framework for coding training: simulates real developer collaboration patterns including requirement elaboration, feedback correction, and multi-turn task iteration
• Scales coding training data to scenarios beyond single-turn task execution

Performance characteristics:
• Coding agents: 59.0% SWE-Bench Pro, 66.0% Terminal-Bench 2.1, 34.8% SWE-fficiency, 28.8% KernelBench Hard
• Agentic: 74.2% MCP Atlas, top score on Claw-Eval
• Multimodal: above Gemini 3.1 Pro on OmniDocBench
• Autonomous execution: 9.4x FP8 GEMM kernel speedup over 24 hours, 1,959 tool calls (MiniMax-reported)
• PostTrainBench: 0.37 for autonomous model training across 5 benchmarks
‍
Prompting
Together AI API access:
• Access MiniMax M3 via Together AI APIs using the designated endpoint
• Authenticate using your Together AI API key in request headers
• Toggle thinking mode at request time — no separate model version required
• Supports image and video input alongside text for multimodal requests
• Available on Together AI on serverless and dedicated infrastructure
‍
Applications & use cases
Coding & software engineering:
• Complex multi-file software engineering with 59.0% SWE-Bench Pro
• Long-horizon terminal execution with 66.0% Terminal-Bench 2.1
• CUDA kernel optimization, performance engineering, and low-level systems work
• Interactive multi-turn collaboration with context retained across the full session

Agentic & research workflows:
• Autonomous research tasks: paper reproduction, literature synthesis, and experiment design
• Tool orchestration with 74.2% MCP Atlas and top Claw-Eval performance
• Autonomous model training and evaluation pipelines without human intervention
• Long-horizon tasks sustained over multi-hour sessions using 1M context

Multimodal & computer use:
• Image and video input for document understanding, visual reasoning, and multimodal analysis
• Computer use for desktop automation across applications, files, and systems
• Visual coding: generating and editing interfaces from screenshots and visual references

Reasoning & long context:
• Toggleable thinking for complex reasoning tasks requiring deep analysis
• 1M token context for processing full codebases, research papers, and extended agent sessions
• Cross-document synthesis and long-context comprehension via MSA architecture
‍

Related models

Model specifications

Model data

Model provider
MiniMax AI
Type
Reasoning
Vision
Chat
Main use cases
Reasoning
Features
Function Calling
JSON Mode
Deployment
Serverless
On-Demand Dedicated
Context length
1M
Input price
$0.30 / 1M tokens
$0.06 (cached)/1M
Output price
$1.20 / 1M tokens
Input modalities
Text
Image
Video
Output modalities
Text

Released
May 31, 2026
Category
Chat

Quickstart docs

Deploy model

MiniMax M3

About model

Model card

Prompting

Applications & use cases