Llama 3.1

2 min · model, meta, open-weights

Meta's July 2024 open-weights release. Three sizes (8B, 70B, 405B), all with a 128K context window — a major jump from Llama 3's 8K limit. The 405B was the first open-weight model to credibly close the gap with frontier closed models on coding and reasoning benchmarks. Licensed for commercial use under the Llama 3.1 Community License.

Release

July 23, 2024.

Model Sizes

Size	Notes
8B	Lightweight; suitable for local inference
70B	Strong general-purpose open model
405B	Frontier-class; trained on more than 15 trillion tokens using more than 16,000 H100 GPUs

All three sizes use the same 128K context window. The 405B supports FP8 quantization, reducing memory requirements without major quality loss.
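The FP8 note above comes down to bytes per parameter. A rough sketch of the weights-only arithmetic, assuming a 16-bit (BF16) baseline, nominal parameter counts taken from the size names, and decimal gigabytes; it ignores KV cache, activations, and runtime overhead, so real requirements are higher:

```python
# Weights-only memory estimate. Parameter counts are nominal (taken from the
# size names), and the BF16 baseline is an assumption for illustration.
PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight storage in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for size, count in PARAMS.items():
    bf16 = weight_memory_gb(count, "bf16")
    fp8 = weight_memory_gb(count, "fp8")
    print(f"{size}: ~{bf16:.0f} GB in BF16, ~{fp8:.0f} GB in FP8")
```

Under these assumptions the 405B weights drop from roughly 810 GB to roughly 405 GB, which is the difference between needing multiple nodes and fitting on a single multi-GPU server.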

Context Window

128K tokens across all sizes. Llama 3, the predecessor, had an 8K window, so this is a 16x expansion.
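A small sketch of the arithmetic, plus a budget check for whether a request fits in the window. It assumes "K" means 1,024 tokens; `fits_in_context` and its parameters are hypothetical names for illustration, not part of any Llama tooling:

```python
# Context sizes in tokens, assuming "K" = 1,024 (an assumption, not stated above).
LLAMA_3_CONTEXT = 8 * 1024
LLAMA_3_1_CONTEXT = 128 * 1024


def expansion_factor(old_window: int, new_window: int) -> float:
    """How many times larger the new window is than the old one."""
    return new_window / old_window


def fits_in_context(prompt_tokens: int, max_new_tokens: int,
                    context_window: int = LLAMA_3_1_CONTEXT) -> bool:
    """True if the prompt plus the requested generation fits in the window."""
    return prompt_tokens + max_new_tokens <= context_window


print(expansion_factor(LLAMA_3_CONTEXT, LLAMA_3_1_CONTEXT))
print(fits_in_context(prompt_tokens=120_000, max_new_tokens=4_000))
```

The prompt and the generated tokens share the same window, which is why the check adds them together.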

Languages

Officially supported: English, German, French, Italian, Portuguese, Hindi, Spanish, Thai (8 languages). Training data covers a broader set, but these eight receive optimized instruction-tuning and safety coverage.

Benchmarks

Benchmark	405B Score
MMLU (5-shot)	87.3%
MMLU (0-shot CoT)	88.6%
HumanEval (coding)	89.0%
MATH (0-shot CoT)	73.8%
GPQA (graduate reasoning)	50.7%

For context: when Llama 3.1 launched, GPT-4o scored 90.2% on HumanEval and Claude 3.5 Sonnet scored 92.0%, against 89.0% for the 405B. That puts the 405B within a few points of frontier closed models on most axes, which is notable for a publicly released, self-hostable model. On GPQA, its 50.7% was roughly level with Claude 3 Opus (50.4%) and ahead of GPT-4 Turbo (48.0%).
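The gaps quoted above can be recomputed directly. A minimal sketch using only the scores listed in this section; the `gaps` helper is a hypothetical name, and positive values mean the other model scored higher than the 405B:

```python
# Scores (percent) as quoted in this section.
HUMANEVAL = {"Llama 3.1 405B": 89.0, "GPT-4o": 90.2, "Claude 3.5 Sonnet": 92.0}
GPQA = {"Llama 3.1 405B": 50.7, "Claude 3 Opus": 50.4, "GPT-4 Turbo": 48.0}


def gaps(scores: dict, reference: str = "Llama 3.1 405B") -> dict:
    """Point difference of each model relative to the reference model."""
    base = scores[reference]
    return {
        name: round(score - base, 1)
        for name, score in scores.items()
        if name != reference
    }


print("HumanEval:", gaps(HUMANEVAL))
print("GPQA:", gaps(GPQA))
```

These are point differences on single benchmarks reported under possibly different evaluation setups, so they indicate rough position rather than a precise ranking.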

License

Llama 3.1 Community License. Permits commercial use, including using model outputs to train or improve other models, a notable expansion over earlier Llama licenses. Not open source in the OSI sense: weights can be redistributed, but the license carries an acceptable use policy and requires services with more than 700 million monthly active users to obtain a separate license from Meta.

Availability

llama.meta.com / Hugging Face (weights download)
Meta AI and WhatsApp (US; 405B served from Meta's hosted infrastructure)
Cloud providers: AWS, Azure, Google Cloud, NVIDIA, Databricks, Groq, Snowflake, and 20+ additional partners

Safety Tooling

Shipped alongside Llama Guard 3 (input/output content classification) and Prompt Guard (prompt injection detection). Together they form a safety layer intended to wrap the model in production deployments.
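One way such a layer might wrap a model call is sketched below. The classifiers are passed in as plain callables: `is_injection` stands in for the role Prompt Guard plays and `is_unsafe` for the role Llama Guard 3 plays. All names here are hypothetical, the toy functions are keyword matchers rather than real classifiers, and nothing in this block reflects the actual interfaces of either tool:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GuardedResult:
    text: str
    blocked_by: Optional[str]  # None when every check passed


def guarded_generate(
    prompt: str,
    generate: Callable[[str], str],
    is_injection: Callable[[str], bool],
    is_unsafe: Callable[[str], bool],
    refusal: str = "Request blocked by safety layer.",
) -> GuardedResult:
    # 1. Prompt-injection check before the model sees the prompt.
    if is_injection(prompt):
        return GuardedResult(refusal, "prompt_injection")
    # 2. Content classification on the input.
    if is_unsafe(prompt):
        return GuardedResult(refusal, "unsafe_input")
    reply = generate(prompt)
    # 3. Content classification on the output.
    if is_unsafe(reply):
        return GuardedResult(refusal, "unsafe_output")
    return GuardedResult(reply, None)


# Toy stand-ins so the sketch runs without any model (not real classifiers).
def toy_generate(prompt: str) -> str:
    return f"echo: {prompt}"


def toy_is_injection(text: str) -> bool:
    return "ignore previous instructions" in text.lower()


def toy_is_unsafe(text: str) -> bool:
    return "forbidden" in text.lower()


print(guarded_generate("hello", toy_generate, toy_is_injection, toy_is_unsafe))
```

Checking the input before generation avoids spending inference on requests that would be blocked anyway; the output check catches unsafe replies to prompts that looked benign.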

Successors

Llama 3.2 — September 2024; added multimodal (vision) variants at 11B and 90B, plus 1B/3B edge models; same 8-language support
Llama 3.3 — December 2024; 70B-only release with improved instruction-following
Llama 4 — April 2025; natively multimodal, MoE architecture, 12 officially supported languages, Scout and Maverick variants

Related

evals · ollama · lm-studio

