Llama 4
Meta's April 2025 open-weights release. First Llama generation to use a Mixture-of-Experts (MoE) architecture and native multimodality via early fusion. Two variants available for download (Scout and Maverick); a third (Behemoth) was previewed but not released. Succeeds llama-3-1.
Release
April 5, 2025.
Variants
| Model | Active Params | Total Params | Experts | Context Window |
|---|---|---|---|---|
| Llama 4 Scout | 17B | 109B | 16 | 10M tokens |
| Llama 4 Maverick | 17B | 400B | 128 | 1M tokens |
| Llama 4 Behemoth | 288B | ~2T | 16 | Preview only |
Scout fits on a single NVIDIA H100 with int4 quantization. Trained on ~40 trillion tokens. Targets ultra-long context use cases.
Maverick runs on a single H100 DGX host (8× H100). Trained on ~22 trillion tokens. Positioned as the stronger general-purpose and coding model of the two. An experimental chat version scored an Elo of 1417 on LMArena.
Behemoth is a "teacher model" used for codistillation into Scout and Maverick. Still in training as of the April release; not publicly available.
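The hardware claims above for Scout and Maverick can be sanity-checked with weight-only arithmetic. The sketch below is a rough estimate, not a deployment guide: it assumes 80 GB per H100, counts only the weights (no KV cache, activations, or runtime overhead), and the 8-bit and 16-bit rows for Maverick are illustrative assumptions rather than figures from this page. Note that memory is driven by total parameters, since every expert must be resident even though only some run per token.

```python
# Rough weight-only memory estimate; ignores KV cache, activations, and overhead.
H100_MEMORY_GB = 80          # assumed: 80 GB HBM per H100
DGX_HOST_GPUS = 8            # "single H100 DGX host (8x H100)"


def weight_memory_gb(total_params_billions: float, bits_per_param: int) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    # billions of parameters * bytes per parameter = gigabytes
    return total_params_billions * bytes_per_param


scout_int4 = weight_memory_gb(109, 4)       # 54.5 GB
maverick_8bit = weight_memory_gb(400, 8)    # 400.0 GB (assumed precision)
maverick_16bit = weight_memory_gb(400, 16)  # 800.0 GB (assumed precision)

host_memory_gb = H100_MEMORY_GB * DGX_HOST_GPUS  # 640 GB

print(f"Scout int4:      {scout_int4:.1f} GB, one H100 has {H100_MEMORY_GB} GB")
print(f"Maverick 8-bit:  {maverick_8bit:.1f} GB, one host has {host_memory_gb} GB")
print(f"Maverick 16-bit: {maverick_16bit:.1f} GB, one host has {host_memory_gb} GB")
```

Under these assumptions Scout's int4 weights leave roughly 25 GB of headroom on one GPU, and Maverick fits on one 8-GPU host at 8 bits per parameter but not at 16.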
Architecture
Both Scout and Maverick use:
- Mixture-of-Experts (MoE) — first in the Llama family. Only a subset of experts activates per token, reducing compute per forward pass relative to total parameter count.
- Early fusion for multimodality — vision and language processed jointly from the start rather than as separate towers.
- iRoPE architecture — interleaved attention layers without positional embeddings, enabling Scout's 10M-token context window.
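The "only a subset of experts activates per token" idea can be illustrated with a generic top-k router. This is a minimal sketch of the general MoE technique in plain Python, not Meta's implementation: the function names, the `top_k=1` default, the softmax gating, and the toy experts are all assumptions made for illustration, and this page does not describe Llama 4's actual router.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k=1):
    """Pick the top_k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def moe_layer(x, experts, router_logits, top_k=1):
    """Gate-weighted sum of the selected experts' outputs; other experts never run."""
    gates = route_token(router_logits, top_k)
    return sum(weight * experts[i](x) for i, weight in gates.items())


# Toy experts: each scales its input by a different factor
# (hypothetical stand-ins for per-expert feed-forward blocks).
NUM_EXPERTS = 16  # matches Scout's expert count in the table above
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]  # fake router scores
gates = route_token(logits, top_k=1)
print(f"experts run: {sorted(gates)} of {NUM_EXPERTS}")
print(f"layer output for x=2.0: {moe_layer(2.0, experts, logits):.3f}")
```

All 16 toy experts exist in memory, but only the selected one is called for this token, which is why compute per forward pass tracks active parameters while memory tracks total parameters.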
Input modalities: text and images. Output: text and code. Supported languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, Vietnamese.
Benchmarks
| Benchmark | Maverick | Scout |
|---|---|---|
| MMMU (image reasoning) | 73.4% | 69.4% |
| LiveCodeBench (coding) | 43.4% | 32.8% |
| MMLU Pro (reasoning) | 80.5% | 74.3% |
According to Meta's published results, Maverick beats GPT-4o and Gemini 2.0 Flash across a broad benchmark set while matching DeepSeek v3 on reasoning and coding at less than half the active parameters (17B vs. 37B). Scout outperforms Gemma 3 and Gemini 2.0 Flash-Lite in its class and leads on long-context tasks.
Behemoth (preview, not released) is reported by Meta to outperform GPT-4.5, Claude 3.7 Sonnet, and Gemini 2.0 Pro on MATH-500 and GPQA Diamond.
License
Llama 4 Community License Agreement (effective April 5, 2025). Custom commercial license, not an open-source license in the usual sense. Commercial use is permitted, but organizations with more than 700 million monthly active users must request a separate license from Meta. Available at github.com/meta-llama/llama-models.
Availability
- Weights: llama.com and Hugging Face
- Meta AI surfaces: WhatsApp, Messenger, Instagram Direct, meta.ai
- API via waitlist: llama.developer.meta.com
- Third-party providers: AWS, IBM watsonx.ai, and others
Use Cases
Scout: long-document analysis, retrieval over large corpora, tasks requiring extreme context depth (10M tokens).
Maverick: general-purpose reasoning, coding, multimodal tasks; an open-weights alternative for GPT-4o-class workloads.
Related
llama-3-1 · ollama · lm-studio · evals · agentic-workflows