DeepSeek R1

2 min · model, open-source

Open-weight reasoning model from DeepSeek, a Chinese AI lab, released in January 2025. 671B total parameters (37B activated per token via MoE), MIT license. Triggered a significant market reaction at release because it matched or exceeded closed frontier models on math and reasoning benchmarks at a reported fraction of their training cost.

Key Specs

Parameters: 671B total (MoE architecture — 37B activated per token)
License: MIT — fully permissive, commercial use allowed
AIME 2024 Pass@1: 79.8%
MATH-500 Pass@1: 97.3%

Architecture Note

R1 is a 671B-parameter Mixture-of-Experts (MoE) model. Only 37B parameters activate for each token, so compute per forward pass is much lower than for a dense 671B model. This is how it reaches frontier-level performance at lower inference cost. All 671B weights must still be held in memory, so it requires significant infrastructure to run (multiple high-end GPUs or a dedicated inference cluster).
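
The arithmetic behind this note can be sketched in a few lines. The parameter counts come from the spec list above; the bytes-per-parameter values are illustrative assumptions about quantization, and the estimate covers weights only (no KV cache, activations, or runtime overhead), so treat it as a rough lower bound rather than a sizing guide.

```python
# Back-of-the-envelope MoE arithmetic for the figures quoted in this note.
TOTAL_PARAMS = 671e9   # total parameters (from the spec list)
ACTIVE_PARAMS = 37e9   # parameters activated per token (from the spec list)


def active_fraction(total: float = TOTAL_PARAMS, active: float = ACTIVE_PARAMS) -> float:
    """Share of the weights that participate in one token's forward pass."""
    return active / total


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and runtime overhead, so real
    requirements are higher.
    """
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"active share per token: {active_fraction():.1%}")
    # Assumed precisions, for illustration only.
    for label, nbytes in [("8-bit", 1.0), ("4-bit", 0.5)]:
        print(f"{label} weights: ~{weight_memory_gb(TOTAL_PARAMS, nbytes):.0f} GB")
```

Roughly one parameter in eighteen is active per token, which is where the compute saving comes from. The memory figure scales with the full 671B, which is why the hardware requirement stays high.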

Benchmarks

MATH-500 at 97.3% is near-ceiling performance: nearly all problems solved correctly. AIME 2024 at 79.8% (Pass@1) is competitive with closed frontier models. These scores established that comparable reasoning capability could come from a non-US lab at lower cost.
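
For readers unfamiliar with the metric, Pass@1 is the chance that a single sampled answer is correct. One common way to estimate it is to sample several answers per problem, take the fraction correct for each problem, and average across problems. The sketch below assumes that approach and uses made-up grading data; it is not the evaluation harness behind the scores quoted here.

```python
from statistics import mean


def pass_at_1(samples_correct: list[list[bool]]) -> float:
    """Estimate Pass@1 from sampled generations.

    samples_correct[i][j] is True if the j-th sampled answer to
    problem i was judged correct. Per-problem accuracy is computed
    first, then averaged across problems.
    """
    if not samples_correct:
        raise ValueError("need at least one problem")
    per_problem = []
    for results in samples_correct:
        if not results:
            raise ValueError("every problem needs at least one sample")
        per_problem.append(sum(results) / len(results))
    return mean(per_problem)


if __name__ == "__main__":
    # Hypothetical grading results: 3 problems, 4 samples each.
    toy = [
        [True, True, False, True],
        [False, False, False, False],
        [True, True, True, True],
    ]
    print(f"Pass@1 estimate: {pass_at_1(toy):.3f}")
```

Averaging over several samples reduces the variance that a single greedy or sampled run would show on small benchmarks such as AIME.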

Distilled Variants

DeepSeek released distilled versions: smaller Qwen and Llama base models fine-tuned on reasoning data generated by R1:

DeepSeek-R1-Distill-Qwen-1.5B, 7B, 14B, 32B
DeepSeek-R1-Distill-Llama-8B, 70B

The 32B Qwen distill is particularly popular: it runs on high-end consumer hardware when quantized and retains much of the math reasoning capability. Good option via Ollama or LM Studio.
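
A minimal sketch of querying a distill through a local Ollama server follows. It assumes Ollama is running on its default port, that the model has been pulled under the tag `deepseek-r1:32b` (check `ollama list` for the tag on your machine), and that the reply carries its reasoning inside `<think>...</think>` tags within the response text. Depending on the Ollama version and settings, the reasoning may be returned in a separate field instead, in which case `split_reasoning` simply returns an empty reasoning string. The helper names are hypothetical.

```python
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "deepseek-r1:32b"  # assumed tag; verify with `ollama list`

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Separate a <think>...</think> block from the final answer.

    Returns (reasoning, answer). If no think block is present, the
    reasoning is an empty string and the answer is the stripped text.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


def ask(prompt: str, model: str = MODEL_TAG, url: str = OLLAMA_URL) -> tuple[str, str]:
    """Send one non-streaming prompt to a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return split_reasoning(body["response"])


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    reasoning, answer = ask("What is the sum of the first 50 odd numbers?")
    print(answer)
```

Keeping the reasoning separate from the answer makes it easier to log or discard the chain of thought before passing the result downstream.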

Strengths

Best math reasoning per dollar in the open-weight category
MIT license — minimal restrictions
Distilled variants make the capability accessible on modest hardware
Competitive with GPT-4-class models on coding benchmarks

Weaknesses

Full 671B requires serious infrastructure
Safety fine-tuning reflects different standards than US frontier labs — relevant for some enterprise contexts
Less focus on agentic / tool-use than Claude or GPT
Chinese lab origin creates regulatory considerations for some use cases

Use Cases

Best for: math-heavy pipelines, theorem proving, scientific reasoning, financial modeling, any task where MATH-500-class reasoning is the bottleneck. The distills are a good option for local math assistance via Ollama.

Related

gemma-4 · ollama · evals · extended-thinking
