Llama 4

3 min · model, meta, open-weights

Meta's April 2025 open-weights release. First Llama generation to use a Mixture-of-Experts (MoE) architecture and native multimodality via early fusion. Two variants available for download (Scout and Maverick); a third (Behemoth) was previewed but not released. Succeeds llama-3-1.

Release

April 5, 2025.

Variants

Model	Active Params	Total Params	Experts	Context Window
Llama 4 Scout	17B	109B	16	10M tokens
Llama 4 Maverick	17B	400B	128	1M tokens
Llama 4 Behemoth	288B	~2T	16	Preview only

Scout fits on a single NVIDIA H100 with int4 quantization. Trained on ~40 trillion tokens. Targets ultra-long context use cases.

Maverick runs on a single H100 DGX host (8× H100). Trained on ~22 trillion tokens. Its higher total parameter count gives it stronger general-purpose and coding performance than Scout. An experimental chat version of Maverick scored an ELO of 1417 on LMArena.

Behemoth is a "teacher model" used for codistillation into Scout and Maverick. Still in training as of the April release; not publicly available.
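
The hardware claims for Scout and Maverick can be sanity-checked with weight-only arithmetic. The sketch below is a rough estimate, not a deployment guide: it assumes the 80 GB H100 variant, counts only the weights (no KV cache, activations, or runtime overhead), and assumes FP8 weights for the Maverick case, which the text above does not specify. The helper names are hypothetical.

```python
# Weight-only memory estimate. Ignores KV cache, activations, and framework overhead,
# so real deployments need headroom beyond these numbers.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}
H100_MEMORY_GB = 80  # assumption: 80 GB H100 variant


def weight_memory_gb(total_params_billions: float, dtype: str) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * BYTES_PER_PARAM[dtype]


def fits(total_params_billions: float, dtype: str, num_gpus: int = 1) -> bool:
    """True if the weights alone fit in the combined memory of num_gpus H100s."""
    return weight_memory_gb(total_params_billions, dtype) <= num_gpus * H100_MEMORY_GB


# Total parameter counts come from the Variants table.
print("Scout int4 on 1 GPU:", weight_memory_gb(109, "int4"), "GB, fits:", fits(109, "int4"))
print("Scout bf16 on 1 GPU:", weight_memory_gb(109, "bf16"), "GB, fits:", fits(109, "bf16"))
print("Maverick fp8 on 8 GPUs:", weight_memory_gb(400, "fp8"), "GB, fits:", fits(400, "fp8", 8))
```

All experts must be resident in memory even though only a few run per token, which is why the estimate uses total rather than active parameters.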

Architecture

Both Scout and Maverick use:

Mixture-of-Experts (MoE): first in the Llama family. Only a subset of experts activates per token, so compute per forward pass scales with active parameters rather than total parameters.
Early fusion for multimodality: text and vision tokens are processed jointly in one model backbone from the start rather than in separate towers.
iRoPE architecture: attention layers without positional embeddings are interleaved with layers that use rotary position embeddings (RoPE), which supports Scout's 10M-token context window.
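
The MoE item above can be illustrated with a toy router. This is a minimal sketch, assuming softmax gating with top-k selection; the actual Llama 4 router (its k, any shared expert, load balancing) is not described here, and `route_top_k`, `moe_layer`, and the toy experts are hypothetical names for illustration only.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k=1):
    """Return [(expert_index, weight)] for the k best experts; weights sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_logits, k=1):
    """Run only the selected experts and combine their outputs by gate weight."""
    out = [0.0] * len(x)
    for idx, weight in route_top_k(router_logits, k):
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


NUM_EXPERTS = 16  # Scout's expert count from the Variants table
# Toy experts: expert i scales its input by (i + 1). Real experts are feed-forward blocks.
experts = [lambda x, s=i + 1: [s * v for v in x] for i in range(NUM_EXPERTS)]

rng = random.Random(0)
router_logits = [rng.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in for a learned router
selected = route_top_k(router_logits, k=1)
output = moe_layer([1.0, 2.0], experts, router_logits, k=1)
print("selected expert:", selected[0][0], "output:", output)

# Ratio of active to total parameters, using the figures from the Variants table.
for name, active_b, total_b in [("Scout", 17, 109), ("Maverick", 17, 400)]:
    print(f"{name}: {active_b / total_b:.1%} of parameters active per token")
```

The active fraction is not simply k divided by the expert count, because attention layers and embeddings run for every token regardless of routing.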

Input modalities: text and images. Output: text and code. Supported languages: Arabic, English, French, German, Hindi, Indonesian, Italian, Portuguese, Spanish, Tagalog, Thai, Vietnamese.

Benchmarks

Benchmark	Maverick	Scout
MMMU (image reasoning)	73.4%	69.4%
LiveCodeBench (coding)	43.4%	32.8%
MMLU Pro (reasoning)	80.5%	74.3%

According to Meta's reported results, Maverick beats GPT-4o and Gemini 2.0 Flash across a broad benchmark set while matching DeepSeek v3 on reasoning and coding at less than half the active parameters (17B vs. 37B). Scout outperforms Gemma 3 and Gemini 2.0 Flash-Lite in its class and leads on long-context tasks.

Behemoth (preview, not released) outperforms GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro on MATH-500 and GPQA Diamond.

License

Llama 4 Community License Agreement (effective April 5, 2025). Custom commercial license, not fully open-source. Commercial use is permitted, but organizations with more than 700 million monthly active users must request a separate license from Meta. Available at github.com/meta-llama/llama-models.

Availability

Weights: llama.com and Hugging Face
Meta AI surfaces: WhatsApp, Messenger, Instagram Direct, meta.ai
API via waitlist: llama.developer.meta.com
Third-party providers: AWS, IBM watsonx.ai, and others

Use Cases

Scout: long-document analysis, retrieval over large corpora, tasks requiring extreme context depth (10M tokens).

Maverick: general-purpose reasoning, coding, multimodal tasks; an open-weights alternative for GPT-4o-class workloads.
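
For the long-document use case, a quick budget check shows whether a corpus plausibly fits in Scout's 10M-token window before any retrieval or chunking is attempted. This is a rough sketch: the 4-characters-per-token ratio is an assumed heuristic for English text, not a property of the Llama 4 tokenizer, and the function names and output reserve are hypothetical.

```python
import math

SCOUT_CONTEXT_TOKENS = 10_000_000  # from the Variants table
CHARS_PER_TOKEN = 4  # assumption: crude heuristic; use the real tokenizer for exact counts


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(documents, context_tokens=SCOUT_CONTEXT_TOKENS, reserve_for_output=4096):
    """Return (fits, estimated_tokens), leaving room for the model's reply."""
    budget = context_tokens - reserve_for_output
    used = sum(estimate_tokens(doc) for doc in documents)
    return used <= budget, used


# 50 synthetic documents of 400,000 characters each.
docs = ["a" * 400_000] * 50
ok, used = fits_in_context(docs)
print(f"estimated tokens: {used:,}, fits in Scout's window: {ok}")
```

A corpus that fails this check still needs retrieval or chunking, even with a 10M-token window.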

Related

llama-3-1 · ollama · lm-studio · evals · agentic-workflows
