DeepSeek R1

Open-weight reasoning model from DeepSeek, a Chinese AI lab, released in January 2025. 671B total parameters (37B activated per token via Mixture-of-Experts), MIT license. Its release caused a significant market reaction because it matched or exceeded closed frontier models such as OpenAI's o1 on math and reasoning benchmarks at a reported fraction of the training cost.

Key Specs

  • Parameters: 671B total (MoE architecture — 37B activated per token)
  • License: MIT — fully permissive, commercial use allowed
  • AIME 2024 Pass@1: 79.8%
  • MATH-500 Pass@1: 97.3%
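The two parameter counts above support a quick back-of-envelope check. This is a rough sketch under stated assumptions: it counts weights only (no KV cache, activations, or runtime overhead), uses decimal gigabytes, and applies the common rule of thumb of about 2 FLOPs per active parameter per token. The precisions listed are illustrative, not a statement about how any particular checkpoint is stored.

```python
# Back-of-envelope arithmetic from the headline specs (671B total, 37B active).
# Assumptions (not from the source): weights-only memory, ignoring KV cache,
# activations, and runtime overhead; ~2 FLOPs per active parameter per token.

TOTAL_PARAMS = 671e9   # every expert must be resident in memory
ACTIVE_PARAMS = 37e9   # parameters that participate in a single token


def active_fraction(total: float, active: float) -> float:
    """Share of parameters used in one token's forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1e9 bytes)."""
    return params * bytes_per_param / 1e9


def forward_flops_per_token(active: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per active parameter."""
    return 2 * active


if __name__ == "__main__":
    print(f"active fraction: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    # Illustrative storage precisions: bytes per parameter.
    for label, nbytes in [("BF16", 2.0), ("FP8", 1.0), ("4-bit", 0.5)]:
        print(f"{label:>5} weights: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):,.0f} GB")
    print(f"forward FLOPs/token: ~{forward_flops_per_token(ACTIVE_PARAMS):.1e}")
```

Only about 5.5% of the parameters do work on any given token, yet the weights alone occupy roughly 671 GB at one byte per parameter. That gap is why MoE cuts per-token compute without shrinking the memory footprint.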

Architecture Note

The 671B figure counts every expert in this Mixture-of-Experts (MoE) model, but only 37B parameters activate for each token, so compute per forward pass is much lower than for a dense 671B model. This is how it delivers frontier-level performance at lower inference cost. Memory is a different matter: all 671B parameters must stay loaded, so running the full model still requires significant infrastructure (multiple high-end GPUs or a dedicated inference cluster).
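To make the routing idea concrete, here is a toy MoE layer in plain Python: a router scores every expert for a token, only the top-k experts run, and their outputs are mixed by softmax weights computed over the selected scores. This is a generic top-k gating sketch, one common design among several; it does not reproduce DeepSeek's actual router, expert counts, shared experts, or load-balancing scheme, and the names (`moe_forward`, `top_k_route`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(scores: Sequence[float], k: int) -> Tuple[List[int], List[float]]:
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in chosen])
    return chosen, gates


def moe_forward(token: Vector, router: Sequence[Vector],
                experts: Sequence[Expert], k: int) -> Tuple[Vector, List[int]]:
    """Toy MoE layer (generic sketch): route one token to k experts, mix outputs."""
    # Router: one dot product per expert gives that expert's affinity score.
    scores = [sum(w * x for w, x in zip(row, token)) for row in router]
    chosen, gates = top_k_route(scores, k)
    out = [0.0] * len(token)
    for idx, gate in zip(chosen, gates):
        y = experts[idx](token)  # only the selected experts are evaluated
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


if __name__ == "__main__":
    # 8 toy experts (each just scales its input); only k=2 run per token.
    experts = [lambda v, s=s: [s * x for x in v] for s in range(8)]
    router = [[float(j), 0.0, 0.0, 0.0] for j in range(8)]
    out, chosen = moe_forward([1.0, 0.5, 0.0, 0.0], router, experts, k=2)
    print("experts used:", chosen, "of", len(experts))
```

The compute saving comes entirely from the loop touching only `k` experts; the full `experts` list still has to exist in memory, which mirrors the infrastructure point above.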

Benchmarks

MATH-500 at 97.3% is near-ceiling performance: almost every problem is solved correctly. AIME 2024 at 79.8% (Pass@1) is competitive with closed frontier models. These scores established that a non-US lab could reach comparable reasoning capability at lower cost.
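Pass@1 here is not a single greedy answer. The R1 report estimates it by sampling several responses per problem at non-zero temperature and averaging their correctness, which gives a more stable figure than one sample. A minimal sketch of that averaging, assuming each response has already been judged right or wrong (grading itself is out of scope); the function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def pass_at_1(results: Sequence[Sequence[bool]]) -> float:
    """Estimate Pass@1 from k sampled responses per problem.

    results[p][i] is True if the i-th sampled response to problem p was judged
    correct. Per problem, the score is the fraction of correct samples; the
    benchmark score is the mean of those fractions over all problems.
    """
    if not results:
        raise ValueError("need at least one problem")
    per_problem = []
    for samples in results:
        if not samples:
            raise ValueError("every problem needs at least one sample")
        per_problem.append(sum(samples) / len(samples))
    return mean(per_problem)


def expected_solved(score: float, n_problems: int) -> float:
    """Translate a Pass@1 rate into an average number of problems solved."""
    return score * n_problems


if __name__ == "__main__":
    toy = [[True, True, False, False], [True, True, True, True], [False] * 4]
    print(f"toy Pass@1: {pass_at_1(toy):.3f}")
    # AIME 2024 has 30 problems (AIME I + II), so 79.8% is about 23.9 per run.
    print(f"AIME 2024 at 79.8%: ~{expected_solved(0.798, 30):.1f} of 30")
```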

Distilled Variants

DeepSeek also released distilled variants: smaller Qwen and Llama base models fine-tuned on reasoning samples generated by R1.

  • DeepSeek-R1-Distill-Qwen-1.5B, 7B, 14B, 32B
  • DeepSeek-R1-Distill-Llama-8B, 70B

The 32B Qwen distill is particularly popular: quantized to 4 bits it fits on a single 24 GB consumer GPU and retains much of R1's math reasoning capability. Good option via ollama or lm-studio.
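For local use, Ollama serves models over a REST endpoint on localhost. A minimal standard-library sketch, assuming Ollama is running on its default port (11434) and that the `deepseek-r1:32b` tag has already been pulled. R1-style models emit their chain of thought before the answer, wrapped in `<think>` tags; depending on the Ollama version and settings, that trace may arrive inline in the `response` text, so `split_reasoning` handles the inline case (including output where only the closing tag appears). The temperature of 0.6 follows DeepSeek's model-card guidance (0.5 to 0.7); the helper names are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "deepseek-r1:32b"  # assumption: pulled beforehand with `ollama pull deepseek-r1:32b`


def build_payload(prompt: str, temperature: float = 0.6) -> dict:
    """Non-streaming /api/generate request body."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }


def split_reasoning(text: str) -> tuple[str, str]:
    """Separate an inline <think>...</think> trace from the final answer."""
    head, sep, tail = text.partition("</think>")
    if not sep:
        return "", text.strip()  # no trace found: everything is the answer
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


def ask(prompt: str, timeout: float = 600.0) -> tuple[str, str]:
    """Send one prompt to the local server; return (reasoning, answer)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return split_reasoning(body["response"])
```

With the server up, `reasoning, answer = ask("What is the sum of the first 100 positive integers?")` returns the trace and the final reply separately, which is handy when the long reasoning trace should not be shown to users or fed back into the next turn.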

Strengths

  • Strong math reasoning per dollar; among the best in the open-weight category at release
  • MIT license — minimal restrictions
  • Distilled variants make the capability accessible on modest hardware
  • Competitive with GPT-4-class models on coding benchmarks

Weaknesses

  • Full 671B requires serious infrastructure
  • Safety fine-tuning reflects different standards than US frontier labs — relevant for some enterprise contexts
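  • Less focus on agentic / tool use than Claude or GPT; DeepSeek's own report notes R1 trails DeepSeek-V3 on function calling, multi-turn dialogue, and JSON output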
  • Chinese lab origin creates regulatory considerations for some use cases

Use Cases

Best for: math-heavy pipelines, theorem proving, scientific reasoning, financial modeling, any task where MATH-500-class reasoning is the bottleneck. Good distill options for local math assistance via ollama.
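For math pipelines, DeepSeek's model card suggests prompting R1 with "Please reason step by step, and put your final answer within \boxed{}.", which makes answers machine-checkable. A minimal sketch of extracting and grading the last boxed answer: it handles nested braces such as `\frac{1}{2}`, but grading is plain string comparison after stripping whitespace, an assumption that real evaluators usually replace with symbolic or numeric equivalence checks. The function names are hypothetical.

```python
from typing import Optional

BOX = "\\boxed{"


def last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, honoring nested braces."""
    start = text.rfind(BOX)
    if start == -1:
        return None
    i = start + len(BOX)
    depth = 1  # we are already inside the opening brace of \boxed{
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j]
    return None  # unbalanced braces: treat as no extractable answer


def is_correct(model_output: str, reference: str) -> bool:
    """Naive grader (assumption): exact match after removing all whitespace."""
    answer = last_boxed(model_output)
    if answer is None:
        return False
    return "".join(answer.split()) == "".join(reference.split())
```

Taking the *last* box matters because reasoning traces often box intermediate results before settling on a final answer.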

Related

gemma-4 · ollama · evals · extended-thinking

Sources