MiniMax M3
MiniMax's frontier model, released June 1, 2026. Its novel MSA (MiniMax Sparse Attention) architecture enables a 1M-token context window at production-viable compute. Parameter count is not disclosed. Positioned as the first open-weights model to combine frontier-level coding, 1M context, and native multimodal input (image + video) in a single system. Open weights and a technical report are promised within 10 days of API launch; training code is not included in the open-source release.
Architecture
MiniMax Sparse Attention (MSA) is a two-stage sparse attention mechanism. A lightweight index branch first scans incoming tokens and selects the relevant KV-cache blocks; full attention then runs only on the selected blocks. Compared to DSA and MoBA, MSA achieves higher effective context coverage via more precise KV partitioning. The operator-level implementation uses a "KV outer gather Q" approach, yielding a >4× speedup over the open-source Flash-Sparse-Attention and flash-moba.
At 1M-token context, MSA uses 1/20th the per-token compute of the prior generation (M2).
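The two-stage idea can be illustrated with a minimal pure-Python sketch for a single query and a single head. This is not MiniMax's implementation and not the "KV outer gather Q" kernel: summarising each block by the mean of its keys, the top-k selection rule, and the names `select_blocks`, `sparse_attention`, `block_size`, and `top_k` are all assumptions made for illustration.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(query, keys, block_size, top_k):
    """Stage 1 (index branch): score each KV block cheaply, keep the top_k.

    Assumption: a block is summarised by the mean of its key vectors.
    Returns the start offsets of the selected blocks in ascending order.
    """
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        dim = len(block[0])
        summary = [sum(k[d] for k in block) / len(block) for d in range(dim)]
        scored.append((dot(query, summary), start))
    scored.sort(reverse=True)
    return sorted(start for _, start in scored[:top_k])


def sparse_attention(query, keys, values, block_size=2, top_k=1):
    """Stage 2: ordinary softmax attention restricted to the selected blocks."""
    starts = select_blocks(query, keys, block_size, top_k)
    idx = [i for s in starts for i in range(s, min(s + block_size, len(keys)))]
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) * scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, idx


# Toy example: four cached tokens in two blocks of two.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
out, attended = sparse_attention([0.0, 1.0], keys, values, block_size=2, top_k=1)
print(attended, out)
```

In this toy setup the attention cost scales with `top_k * block_size` rather than with the full cache length, which is the general reason block selection can cut per-token compute at long context. Setting `top_k` to the number of blocks recovers dense attention.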
Benchmarks
| Benchmark | Score |
|---|---|
| SWE-Bench Pro | 59.0% |
| Terminal-Bench 2.1 | 66.0% |
| BrowseComp | 83.5 |
| MCP Atlas | 74.2% |
The 59.0% SWE-Bench Pro score beats GPT-5.5 and Gemini 3.1 Pro on that benchmark. The BrowseComp score of 83.5 beats Claude Opus 4.7 (79.3). Terminal-Bench 2.1 at 66.0% is competitive with the frontier. SWE-Bench Pro scores are self-reported; independent verification is pending.
Context Window
1M tokens.
Speed (vs. predecessor M2 at 1M context)
- Prefill: >9× faster
- Decoding: >15× faster
- Per-token compute: 1/20th
Multimodal
Native image and video input. Mixed-modality training from the start — not a post-hoc adapter. Also supports computer control/operation tasks.
Pricing (API)
| Tier | Rate |
|---|---|
| Input ≤512K tokens | Standard rate |
| Input >512K tokens | Higher rate |
| Priority tier | Available |
Subscription plans also available: Plus $20/month (~1.7B tokens), Max $50/month (~5.1B tokens), Ultra $120/month (~9.8B tokens). VentureBeat reported API cost at 5–10% of GPT-5.5 and Gemini 3.1 Pro for comparable tasks.
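Because no per-token API rate is listed, one rough way to compare the subscription plans is to divide each monthly price by its approximate token allowance. The sketch below does only that arithmetic; the plan figures come from the paragraph above, the allowances are approximate, and the helper name `usd_per_million_tokens` is hypothetical.

```python
# Monthly price (USD) and approximate token allowance per plan, as listed above.
PLANS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def usd_per_million_tokens(price_usd, tokens):
    """Effective cost per 1M tokens if the whole allowance is used."""
    return price_usd / (tokens / 1_000_000)


for name, (price, tokens) in PLANS.items():
    rate = usd_per_million_tokens(price, tokens)
    print(f"{name}: about ${rate:.4f} per 1M tokens")
```

On these approximate figures every plan works out to roughly one cent per million tokens at full utilisation, with Max the lowest. Actual cost per token rises if the allowance is not fully used, and these numbers say nothing about the pay-as-you-go API rates.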
Availability
- API live June 1, 2026 (MiniMax platform)
- Open weights: Hugging Face and GitHub, promised within 10 days of June 1 launch
- Training code: not released (partial open-source)
- Technical report: to be released alongside weights
Strengths
- Long-context efficiency: 1M tokens at 1/20th the compute of M2 — practically usable at max context, not just technically supported.
- Coding benchmark: 59.0% SWE-Bench Pro beats GPT-5.5 and Gemini 3.1 Pro at launch.
- BrowseComp: 83.5 outpaces Opus 4.7 on autonomous browsing and retrieval.
- Cost: 5–10% of frontier closed models at comparable task performance per early reporting.
- Multimodal from training: native image/video, not bolted on.
Weaknesses
- Parameter count undisclosed — harder to reason about hardware requirements for self-hosting.
- Training code not open-sourced — limits reproducibility and fine-tuning research.
- Self-reported benchmarks; independent third-party replication not yet complete as of launch.
- No confirmed per-million-token API pricing was published at launch.
Use Cases
Best for: long-context coding tasks in agentic-workflows, autonomous browser agents (strong BrowseComp), cost-sensitive pipelines where frontier coding quality is needed without frontier pricing. Good fit alongside claude-code or similar orchestrators that can route long-context tasks here.
Not ideal for: math olympiad-style reasoning (see deepseek-r1 or microsoft-mai-thinking-1), tasks requiring confirmed open training code.
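An orchestrator that routes long-context work to this model could start from a rule like the sketch below. It is illustrative only: the target labels `"minimax-m3"` and `"default"`, the `long_context_threshold` value, and the `Route` type are hypothetical, and reading "512K" and "1M" as decimal token counts is an assumption. Only the 512K pricing boundary and the 1M-token window come from this page.

```python
from dataclasses import dataclass
from typing import Optional

CONTEXT_LIMIT = 1_000_000  # 1M-token window; decimal "M" is an assumption
TIER_BOUNDARY = 512_000    # input pricing boundary; decimal "K" is an assumption


@dataclass
class Route:
    target: str                  # hypothetical model label
    pricing_tier: Optional[str]  # "standard", "higher", or None if not routed here


def route(prompt_tokens, long_context_threshold=200_000):
    """Pick a target for a request based on its estimated input size.

    long_context_threshold is a hypothetical tuning knob: requests below it
    stay on the orchestrator's default model.
    """
    if prompt_tokens > CONTEXT_LIMIT:
        raise ValueError("prompt exceeds the 1M-token context window")
    if prompt_tokens < long_context_threshold:
        return Route("default", None)
    tier = "standard" if prompt_tokens <= TIER_BOUNDARY else "higher"
    return Route("minimax-m3", tier)


for n in (50_000, 300_000, 900_000):
    print(n, route(n))
```

Returning the pricing tier with the routing decision lets a cost-sensitive pipeline decide whether to trim or chunk an input that would otherwise cross the 512K boundary.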
Related
agentic-workflows · evals · deepseek-r1 · microsoft-mai-thinking-1 · nvidia-nemotron-3-ultra · claude-opus-4-8