MiniMax M3

MiniMax's frontier model, released June 1, 2026. Its novel MSA (MiniMax Sparse Attention) architecture enables a 1M-token context window at production-viable compute. Parameter count is not disclosed. Positioned as the first open-weights model to combine frontier-level coding, 1M context, and native multimodal input (image + video) in a single system. Open weights and a technical report are promised within 10 days of API launch; training code is not included in the open-source release.

Architecture

MiniMax Sparse Attention (MSA) is a two-stage sparse attention mechanism. A lightweight index branch first scans incoming tokens and selects the relevant KV-cache blocks; full attention then runs only on the selected blocks. Compared to DSA (DeepSeek Sparse Attention) and MoBA (Mixture of Block Attention), MSA is reported to achieve higher effective context coverage via more precise KV partitioning. The operator-level implementation uses a "KV outer gather Q" approach, yielding a >4× speedup over the open-source Flash-Sparse-Attention and flash-moba implementations.
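
The two-stage selection described above can be sketched in a few lines of plain Python. This is a minimal illustration, not MiniMax's implementation: scoring each block by the mean of its keys is a MoBA-style heuristic assumed here because the source does not say how the index branch scores blocks, and the names `two_stage_sparse_attention`, `block_size`, and `top_k` are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def two_stage_sparse_attention(q, keys, values, block_size, top_k):
    """Single-query block-sparse attention (illustrative sketch only).

    Stage 1: score each KV block with a cheap summary (ASSUMPTION: mean key).
    Stage 2: run ordinary softmax attention over the top_k selected blocks.
    Returns (output_vector, selected_block_indices).
    """
    n, d = len(keys), len(q)
    if n % block_size != 0:
        raise ValueError("number of keys must be a multiple of block_size")
    n_blocks = n // block_size

    # Stage 1: one summary score per block, far cheaper than scoring every token.
    block_scores = []
    for b in range(n_blocks):
        block = keys[b * block_size:(b + 1) * block_size]
        summary = [sum(col) / block_size for col in zip(*block)]
        block_scores.append(dot(summary, q))
    ranked = sorted(range(n_blocks), key=lambda b: block_scores[b], reverse=True)
    selected = sorted(ranked[:top_k])

    # Stage 2: full attention restricted to tokens in the selected blocks.
    idx = [b * block_size + i for b in selected for i in range(block_size)]
    logits = [dot(keys[i], q) / math.sqrt(d) for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    out = [
        sum((e / total) * values[i][j] for e, i in zip(exps, idx))
        for j in range(len(values[0]))
    ]
    return out, selected
```

With `top_k` equal to the number of blocks the function reduces to dense attention, which makes a convenient sanity check. The savings come from keeping `top_k * block_size` much smaller than the context length.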

At 1M-token context, MSA uses 1/20th the per-token compute of the prior generation (M2).

Benchmarks

Benchmark            Score
SWE-Bench Pro        59.0%
Terminal-Bench 2.1   66.0%
BrowseComp           83.5
MCP Atlas            74.2%

At 59.0% on SWE-Bench Pro, M3 scores above GPT-5.5 and Gemini 3.1 Pro on that benchmark. Its BrowseComp score of 83.5 exceeds Claude Opus 4.7 (79.3). Terminal-Bench 2.1 at 66.0% is competitive with the frontier. SWE-Bench Pro scores are self-reported; independent verification is pending.

Context Window

1M tokens.

Speed (vs. predecessor M2 at 1M context)

  • Prefill: >9× faster
  • Decoding: >15× faster
  • Per-token compute: 1/20th
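
To see what the ratios above imply for a single request, the arithmetic can be sketched as below. The speedups come from the figures listed here (both stated as ">", so the results are upper bounds on M3's time); the M2 baseline timings passed in are hypothetical inputs you would measure yourself, and `m3_upper_bounds` is an illustrative name.

```python
PREFILL_SPEEDUP = 9.0   # ">9x" in the source, so a lower bound on the speedup
DECODE_SPEEDUP = 15.0   # ">15x" in the source, likewise a lower bound


def m3_upper_bounds(m2_prefill_s, m2_decode_s_per_token, output_tokens):
    """Upper bounds on M3 latency at 1M context, given measured M2 timings.

    m2_prefill_s and m2_decode_s_per_token are HYPOTHETICAL baseline
    measurements; the source publishes only the ratios, not absolute times.
    """
    prefill = m2_prefill_s / PREFILL_SPEEDUP
    decode = m2_decode_s_per_token * output_tokens / DECODE_SPEEDUP
    return {"prefill_s": prefill, "decode_s": decode, "total_s": prefill + decode}
```

For example, if M2 took 90 s to prefill a 1M-token prompt, the stated ratio puts M3's prefill under 10 s for the same prompt.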

Multimodal

Native image and video input. Mixed-modality training from the start — not a post-hoc adapter. Also supports computer control/operation tasks.

Pricing (API)

Tier                  Rate
Input ≤512K tokens    Standard rate
Input >512K tokens    Higher rate
Priority tier         Available

Subscription plans also available: Plus $20/month (~1.7B tokens), Max $50/month (~5.1B tokens), Ultra $120/month (~9.8B tokens). VentureBeat reported API cost at 5–10% of GPT-5.5 and Gemini 3.1 Pro for comparable tasks.
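
The plan figures above can be turned into an effective price per million tokens. This is a rough sketch using only the numbers quoted here; the token allowances are approximate ("~") in the source, and the calculation assumes the full monthly allowance is actually consumed.

```python
# (monthly price in USD, approximate monthly token allowance) from the text above
PLANS = {
    "Plus": (20, 1.7e9),
    "Max": (50, 5.1e9),
    "Ultra": (120, 9.8e9),
}


def usd_per_million_tokens(price_usd, tokens):
    """Effective cost per 1M tokens if the whole allowance is used."""
    return price_usd / (tokens / 1_000_000)


for name, (price, tokens) in PLANS.items():
    print(f"{name}: ${usd_per_million_tokens(price, tokens):.4f} per 1M tokens")
```

On these figures Max is the cheapest per token (about $0.0098 per million), with Plus at about $0.0118 and Ultra at about $0.0122.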

Availability

  • API live June 1, 2026 (MiniMax platform)
  • Open weights: Hugging Face and GitHub, promised within 10 days of June 1 launch
  • Training code: not released (partial open-source)
  • Technical report: to be released alongside the weights

Strengths

  • Long-context efficiency: 1M tokens at 1/20th the compute of M2 — practically usable at max context, not just technically supported.
  • Coding benchmark: 59.0% SWE-Bench Pro beats GPT-5.5 and Gemini 3.1 Pro at launch.
  • BrowseComp: 83.5 outpaces Opus 4.7 on autonomous browsing and retrieval.
  • Cost: 5–10% of frontier closed models at comparable task performance per early reporting.
  • Multimodal from training: native image/video, not bolted on.

Weaknesses

  • Parameter count undisclosed — harder to reason about hardware requirements for self-hosting.
  • Training code not open-sourced — limits reproducibility and fine-tuning research.
  • Self-reported benchmarks; independent third-party replication not yet complete as of launch.
  • No per-million-token API pricing confirmed at launch.

Use Cases

Best for: long-context coding tasks in agentic-workflows, autonomous browser agents (strong BrowseComp), cost-sensitive pipelines where frontier coding quality is needed without frontier pricing. Good fit alongside claude-code or similar orchestrators that can route long-context tasks here.

Not ideal for: math olympiad-style reasoning (see deepseek-r1 or microsoft-mai-thinking-1), tasks requiring confirmed open training code.
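
The routing idea above, where an orchestrator sends long-context and browsing work to M3 and math-heavy work elsewhere, might look like the sketch below. Everything model-specific is an assumption: the identifier strings, the `Task` shape, and the 200K-token cutoff are hypothetical placeholders, not documented API values. Only the 1M-token limit comes from this page.

```python
from dataclasses import dataclass

M3_CONTEXT_LIMIT = 1_000_000      # from this page: 1M-token context window
LONG_CONTEXT_THRESHOLD = 200_000  # HYPOTHETICAL cutoff; tune for your workload


@dataclass
class Task:
    kind: str           # e.g. "coding", "browsing", "math"
    prompt_tokens: int


def route(task, default="default-model"):
    """Pick a model identifier for a task (identifiers are placeholders)."""
    if task.prompt_tokens > M3_CONTEXT_LIMIT:
        raise ValueError("prompt exceeds the 1M-token window; split it first")
    if task.kind == "math":
        return "deepseek-r1"   # olympiad-style reasoning goes elsewhere
    if task.kind == "browsing" or task.prompt_tokens >= LONG_CONTEXT_THRESHOLD:
        return "minimax-m3"    # long-context and browser-agent work
    return default
```

Keeping the rule as a pure function makes it easy to unit-test and to adjust once independent benchmark results or per-token pricing are available.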

Related

agentic-workflows · evals · deepseek-r1 · microsoft-mai-thinking-1 · nvidia-nemotron-3-ultra · claude-opus-4-8

Sources