DeepSeek R1
DeepSeek R1
Open-weight reasoning model from DeepSeek, a Chinese AI lab. 671B total parameters (37B activated per token via MoE), MIT license. Caused significant market reaction at release because it matched or exceeded closed frontier models on math and reasoning benchmarks at a fraction of the training cost.
Key Specs
- Parameters: 671B total (MoE architecture — 37B activated per token)
- License: MIT — fully permissive, commercial use allowed
- AIME 2024 Pass@1: 79.8%
- MATH-500 Pass@1: 97.3%
Architecture Note
671B is a Mixture-of-Experts (MoE) model. Only 37B parameters activate for each token — effective compute per forward pass is much lower than a dense 671B model. This is how it achieves frontier-level performance at lower inference cost. Still requires significant infrastructure to run (multiple high-end GPUs or a dedicated inference cluster).
Benchmarks
MATH-500 at 97.3% is near-ceiling performance — most problems solved correctly. AIME 2024 at 79.8% (Pass@1) is competitive with closed frontier models. These scores established that comparable reasoning capability could come from a non-US lab at lower cost.
Distilled Variants
DeepSeek released distilled versions fine-tuned from R1 onto smaller base models:
- DeepSeek-R1-Distill-Qwen-1.5B, 7B, 14B, 32B
- DeepSeek-R1-Distill-Llama-8B, 70B
The 32B Qwen distill is particularly popular — runs on consumer hardware and retains much of the math reasoning capability. Good option via ollama or lm-studio.
Strengths
- Best math reasoning per dollar in open-weight category
- MIT license — minimal restrictions
- Distilled variants make the capability accessible on modest hardware
- Competitive with GPT-4-class on coding benchmarks
Weaknesses
- Full 671B requires serious infrastructure
- Safety fine-tuning reflects different standards than US frontier labs — relevant for some enterprise contexts
- Less focus on agentic / tool-use than Claude or GPT
- Chinese lab origin creates regulatory considerations for some use cases
Use Cases
Best for: math-heavy pipelines, theorem proving, scientific reasoning, financial modeling, any task where MATH-500-class reasoning is the bottleneck. Good distill options for local math assistance via ollama.
Related
gemma-4 · ollama · evals · extended-thinking