Llama 3.1

Meta's July 2024 open-weights release. Three sizes (8B, 70B, 405B), all with a 128K context window — a major jump from Llama 3's 8K limit. The 405B was the first open-weight model to credibly close the gap with frontier closed models on coding and reasoning benchmarks. Licensed for commercial use under the Llama 3.1 Community License.

Release

July 23, 2024.

Model Sizes

Size    Notes
8B      Lightweight; suitable for local inference
70B     Strong general-purpose open model
405B    Frontier-class; trained on more than 15 trillion tokens using over 16,000 H100 GPUs

All three sizes use the same 128K context window. The 405B supports FP8 quantization, reducing memory requirements without major quality loss.
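As a rough illustration of why FP8 matters for the 405B, weight memory can be estimated as parameter count times bytes per parameter. The sketch below assumes 2 bytes per parameter for BF16 and 1 byte for FP8, treats the nominal sizes (8B, 70B, 405B) as exact parameter counts, and ignores KV cache, activations, and runtime overhead. The helper name `weight_memory_gb` is made up for this note.

```python
# Back-of-the-envelope weight memory, assuming nominal parameter counts.
# Ignores KV cache, activations, and framework overhead.

BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}  # assumed storage widths

# Rounded sizes from the table above, treated as exact for this estimate.
NOMINAL_PARAMS = {"8B": 8e9, "70B": 70e9, "405B": 405e9}


def weight_memory_gb(size: str, dtype: str) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return NOMINAL_PARAMS[size] * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for size in NOMINAL_PARAMS:
        bf16 = weight_memory_gb(size, "bf16")
        fp8 = weight_memory_gb(size, "fp8")
        print(f"{size:>4}: ~{bf16:.0f} GB in BF16, ~{fp8:.0f} GB in FP8")
```

Under these assumptions the 405B weights drop from roughly 810 GB to roughly 405 GB. That is about the difference between exceeding and fitting within 640 GB (eight 80 GB accelerators), before counting KV cache.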

Context Window

128K tokens across all sizes. Llama 3 (predecessor) had an 8K window — 16x expansion.

Languages

Officially supported: English, German, French, Italian, Portuguese, Hindi, Spanish, Thai (8 languages). Training data covers a broader set, but these eight receive optimized instruction-tuning and safety coverage.

Benchmarks

Benchmark                    405B Score
MMLU (5-shot)                87.3%
MMLU (CoT)                   88.6%
HumanEval (coding)           89.0%
MATH (0-shot CoT)            73.8%
GPQA (graduate reasoning)    50.7%

For context: at launch, GPT-4o scored 90.2% on HumanEval and Claude 3.5 Sonnet scored 92.0%. The 405B sits within a few points of frontier closed models on most of these benchmarks, which is notable for a publicly released, self-hostable model. On GPQA, its 50.7% was on par with Claude 3 Opus (50.4%) and ahead of GPT-4 Turbo (48.0%).
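The comparison above reduces to simple point differences. The following sketch recomputes them from the scores quoted in this section; the data layout and the `gaps` helper are illustrative choices, not part of any evaluation toolkit.

```python
# Scores copied from this section; a positive gap means the 405B is ahead.
LLAMA_405B = {"HumanEval": 89.0, "GPQA": 50.7}

COMPARISONS = [
    ("HumanEval", "GPT-4o", 90.2),
    ("HumanEval", "Claude 3.5 Sonnet", 92.0),
    ("GPQA", "Claude 3 Opus", 50.4),
    ("GPQA", "GPT-4 Turbo", 48.0),
]


def gaps(llama_scores, comparisons):
    """Return (benchmark, model, gap) where gap = 405B score minus the other score."""
    return [
        (bench, model, round(llama_scores[bench] - score, 1))
        for bench, model, score in comparisons
    ]


if __name__ == "__main__":
    for bench, model, gap in gaps(LLAMA_405B, COMPARISONS):
        print(f"{bench:<10} vs {model:<18} {gap:+.1f} points")
```

The HumanEval gaps come out at 1.2 and 3.0 points behind, while the GPQA gaps are 0.3 and 2.7 points ahead.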

License

Llama 3.1 Community License. Permits commercial use, including using model outputs to train or improve other models, a notable expansion over earlier Llama licenses. Not fully open-source: weights can be redistributed, but use is subject to an acceptable use policy, and organizations with more than 700 million monthly active users must request a separate license from Meta.

Availability

  • llama.meta.com / Hugging Face (weights download)
  • Meta AI and WhatsApp (hosted access to the 405B; US only at launch)
  • Cloud providers: AWS, Azure, Google Cloud, NVIDIA, Databricks, Groq, Snowflake, and 20+ additional partners

Safety Tooling

Shipped alongside Llama Guard 3 (input/output content classification) and Prompt Guard (prompt injection detection), which together form a safety layer intended for production deployments.
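One way such a layer could be wired around a model is sketched below. This is a simplified control-flow illustration, not Meta's reference implementation: the three callables are hypothetical stand-ins for Prompt Guard, Llama Guard 3, and the model, the check order is an assumption, and a real output check would usually see the prompt together with the response rather than the response alone.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical stand-ins: in a real deployment these callables would wrap
# Prompt Guard, Llama Guard 3, and a Llama 3.1 model. Their names and
# signatures are assumptions made for this sketch.
Flagger = Callable[[str], bool]   # True means "flagged"
Generator = Callable[[str], str]

REFUSAL = "Sorry, I can't help with that."


@dataclass
class GuardedResult:
    text: str
    blocked_at: Optional[str]  # "injection", "input", "output", or None


def guarded_generate(
    prompt: str,
    detect_injection: Flagger,
    classify_unsafe: Flagger,
    generate: Generator,
) -> GuardedResult:
    """Run injection and input checks, generate, then check the output."""
    if detect_injection(prompt):
        return GuardedResult(REFUSAL, "injection")
    if classify_unsafe(prompt):
        return GuardedResult(REFUSAL, "input")
    reply = generate(prompt)
    if classify_unsafe(reply):
        return GuardedResult(REFUSAL, "output")
    return GuardedResult(reply, None)


if __name__ == "__main__":
    # Toy keyword-based stubs, only to exercise the control flow.
    result = guarded_generate(
        "Summarize the Llama 3.1 release.",
        detect_injection=lambda t: "ignore previous instructions" in t.lower(),
        classify_unsafe=lambda t: "forbidden" in t.lower(),
        generate=lambda p: f"(model reply to: {p})",
    )
    print(result)
```

Keeping the classifiers as injected callables means the same wrapper works whether the guards run locally or behind a hosted endpoint, and makes the flow easy to unit-test with stubs.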

Successors

  • Llama 3.2 — September 2024; added multimodal (vision) variants at 11B and 90B, plus 1B/3B edge models; same 8-language support
  • Llama 3.3 — December 2024; 70B-only release with improved instruction-following
  • Llama 4 — April 2025; natively multimodal, MoE architecture, 12 officially supported languages, Scout and Maverick variants

Related

evals · ollama · lm-studio

Sources