Llama 4

Llama 4

Meta's April 2025 open-weights release. First Llama generation to use a Mixture-of-Experts (MoE) architecture and native multimodality via early fusion. Two variants available for download (Scout and Maverick); a third (Behemoth) was previewed but not released. Succeeds llama-3-1.

Release

April 5, 2025.

Variants

Model Active Params Total Params Experts Context Window
Llama 4 Scout 17B 109B 16 10M tokens
Llama 4 Maverick 17B 400B 128 Not published
Llama 4 Behemoth 288B ~2T 16 Preview only

Scout fits on a single NVIDIA H100 with int4 quantization. Trained on ~40 trillion tokens. Targets ultra-long context use cases.

Maverick runs on a single H100 DGX host (8× H100). Trained on ~22 trillion tokens. Higher total parameter count translates to stronger general-purpose and coding performance. Achieved ELO 1417 on LMArena in experimental chat mode.

Behemoth is a "teacher model" used for codistillation into Scout and Maverick. Still in training as of the April release; not publicly available.

Architecture

Both Scout and Maverick use:

  • Mixture-of-Experts (MoE) — first in the Llama family. Only a subset of experts activates per token, reducing compute per forward pass relative to total parameter count.
  • Early fusion for multimodality — vision and language processed jointly from the start rather than as separate towers.
  • iRoPE architecture — interleaved attention layers without positional embeddings, enabling Scout's 10M-token context window.

Input modalities: text and images. Output: text and code. Supported languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, Vietnamese.

Benchmarks

Benchmark Maverick Scout
MMMU (image reasoning) 73.4% 69.4%
LiveCodeBench (coding) 43.4% 32.8%
MMLU Pro (reasoning) 80.5% 74.3%

Maverick beats GPT-4o and Gemini 2.0 Flash across a broad benchmark set while matching DeepSeek v3 on reasoning and coding at less than half the active parameters. Scout outperforms Gemma 3 and Gemini 2.0 Flash-Lite in its class and leads on long-context tasks.

Behemoth (preview, not released) outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on MATH-500 and GPQA Diamond.

License

Llama 4 Community License Agreement (effective April 5, 2025). Custom commercial license — not fully open-source. Commercial use permitted; restrictions apply at scale. Available at github.com/meta-llama/llama-models.

Availability

  • Weights: llama.com and Hugging Face
  • Meta AI surfaces: WhatsApp, Messenger, Instagram Direct, meta.ai
  • API via waitlist: llama.developer.meta.com
  • Third-party providers: AWS, IBM watsonx.ai, and others

Use Cases

Scout: long-document analysis, retrieval over large corpora, tasks requiring extreme context depth (10M tokens).

Maverick: general-purpose reasoning, coding, multimodal tasks; drop-in replacement for GPT-4o class workloads at open-weights cost.

Related

llama-3-1 · ollama · lm-studio · evals · agentic-workflows

Sources