Llama 3.1
Llama 3.1
Meta's July 2024 open-weights release. Three sizes (8B, 70B, 405B), all with a 128K context window — a major jump from Llama 3's 8K limit. The 405B was the first open-weight model to credibly close the gap with frontier closed models on coding and reasoning benchmarks. Licensed for commercial use under the Llama 3.1 Community License.
Release
July 23, 2024.
Model Sizes
| Size | Notes |
|---|---|
| 8B | Lightweight; suitable for local inference |
| 70B | Strong general-purpose open model |
| 405B | Frontier-class; trained on 16,000+ H100 GPUs over 15+ trillion tokens |
All three sizes use the same 128K context window. The 405B supports FP8 quantization, reducing memory requirements without major quality loss.
Context Window
128K tokens across all sizes. Llama 3 (predecessor) had an 8K window — 16x expansion.
Languages
Officially supported: English, German, French, Italian, Portuguese, Hindi, Spanish, Thai (8 languages). Training data covers a broader set, but these eight receive optimized instruction-tuning and safety coverage.
Benchmarks
| Benchmark | 405B Score |
|---|---|
| MMLU (5-shot) | 87.3% |
| MMLU (CoT) | 88.6% |
| HumanEval (coding) | 89.0% |
| MATH (0-shot CoT) | 73.8% |
| GPQA (graduate reasoning) | 50.7% |
For context: at launch, GPT-4o scored 90.2% HumanEval and Claude 3.5 Sonnet scored 92.0%. The 405B sits within a few points of frontier closed models on most axes — notable for a publicly released, self-hostable model. On GPQA, 50.7% matched Claude 3 Opus (50.4%) and exceeded GPT-4 Turbo (48.0%).
License
Llama 3.1 Community License. Permits commercial use, including using model outputs to train or improve other models — a notable expansion over earlier Llama licenses. Not fully open-source (weights can be distributed but the license restricts certain uses at scale).
Availability
- llama.meta.com / Hugging Face (weights download)
- Meta AI and WhatsApp (US, 405B via cloud)
- Cloud providers: AWS, Azure, Google Cloud, NVIDIA, Databricks, Groq, Snowflake, and 20+ additional partners
Safety Tooling
Shipped alongside Llama Guard 3 (input/output content classification) and Prompt Guard (prompt injection detection) — integrated safety layer designed for production deployments.
Successors
- Llama 3.2 — September 2024; added multimodal (vision) variants at 11B and 90B, plus 1B/3B edge models; same 8-language support
- Llama 3.3 — December 2024; 70B-only release with improved instruction-following
- Llama 4 — April 2025; natively multimodal, MoE architecture, 12 officially supported languages, Scout and Maverick variants