Claude 3.7 Sonnet
Claude 3.7 Sonnet
Anthropic model released February 24, 2025. Historically significant as the first model to ship extended-thinking as a native capability — Anthropic's term for a "hybrid reasoning" architecture where a single model can produce standard responses or engage in an explicit, token-consuming chain-of-thought before answering.
Now superseded by the Claude Sonnet 4.x series.
What "Hybrid" Means
Prior reasoning models (o1, o3) were separate products with reasoning baked in as a fixed mode. Claude 3.7 Sonnet uses a single model that operates in two modes: standard (fast, no visible CoT) and extended thinking (explicit reasoning trace, user-configurable token budget). Developers set a budget_tokens parameter via the API to control how much reasoning to allocate. The model selects which mode to lean on based on task complexity, but the budget cap is always honored.
This lets the same model serve latency-sensitive requests (no thinking) and hard reasoning tasks (extended thinking) without switching to a separate endpoint or model version.
Benchmarks
| Benchmark | Standard | Extended Thinking |
|---|---|---|
| SWE-bench Verified | 62.3% | 70.3% |
| GPQA | 78.2% | 84.8% |
| MMLU | 86.1% | — |
SWE-bench 70.3% significantly exceeded OpenAI o1 (48.9%) and DeepSeek-R1 (49.2%) at release. This was the clearest benchmark signal that extended thinking on a general-purpose model could outperform dedicated reasoning models on software engineering tasks.
Specifications
- Context window: 200K tokens
- Pricing at release: $3 / $15 per million input/output tokens (thinking tokens included in output cost)
- Knowledge cutoff: October 2024
Comparison at Release (February 2025)
- vs. o1: o1 had stronger math competition performance; Claude 3.7 led on SWE-bench and real-world agentic tasks by a wide margin
- vs. Gemini 2.0 Pro: Claude 3.7 led on coding and instruction-following; Gemini 2.0 Pro led on long-context retrieval
- vs. DeepSeek-R1: DeepSeek-R1 had stronger math benchmarks (MATH-500); Claude 3.7 led on SWE-bench and general coding
The model positioned itself not as a math/competition reasoning model but as a practical coding and agentic reasoning model — a different optimization target than o1 or R1.
Extended Thinking
Extended thinking made visible reasoning accessible without a separate model. Key properties:
- Token budget is configurable: from a few hundred tokens (fast, focused) to tens of thousands (deep, complex)
- Thinking tokens appear in the API response as a distinct block; the final answer is separate
- Improves performance on math, science, and hard coding tasks; minimal gain on simple tasks
- Not available on the free API tier at launch
First introduced in Claude 3.7; iterated in claude-sonnet-4-6 (adaptive thinking, automatic activation on hard problems).
Related
extended-thinking · claude-sonnet-4-6 · evals · agentic-workflows