Claude 3.7 Sonnet

2 min · model, anthropic, reasoning

Claude 3.7 Sonnet

Anthropic model released February 24, 2025. Historically significant as the first model to ship extended-thinking as a native capability — Anthropic's term for a "hybrid reasoning" architecture where a single model can produce standard responses or engage in an explicit, token-consuming chain-of-thought before answering.

Now superseded by the Claude Sonnet 4.x series.

What "Hybrid" Means

Prior reasoning models (o1, o3) were separate products with reasoning baked in as a fixed mode. Claude 3.7 Sonnet uses a single model that operates in two modes: standard (fast, no visible CoT) and extended thinking (explicit reasoning trace, user-configurable token budget). Developers set a budget_tokens parameter via the API to control how much reasoning to allocate. The model selects which mode to lean on based on task complexity, but the budget cap is always honored.

This lets the same model serve latency-sensitive requests (no thinking) and hard reasoning tasks (extended thinking) without switching to a separate endpoint or model version.

Benchmarks

Benchmark	Standard	Extended Thinking
SWE-bench Verified	62.3%	70.3%
GPQA	78.2%	84.8%
MMLU	86.1%	—

SWE-bench 70.3% significantly exceeded OpenAI o1 (48.9%) and DeepSeek-R1 (49.2%) at release. This was the clearest benchmark signal that extended thinking on a general-purpose model could outperform dedicated reasoning models on software engineering tasks.

Specifications

Context window: 200K tokens
Pricing at release: $3 / $15 per million input/output tokens (thinking tokens included in output cost)
Knowledge cutoff: October 2024

Comparison at Release (February 2025)

vs. o1: o1 had stronger math competition performance; Claude 3.7 led on SWE-bench and real-world agentic tasks by a wide margin
vs. Gemini 2.0 Pro: Claude 3.7 led on coding and instruction-following; Gemini 2.0 Pro led on long-context retrieval
vs. DeepSeek-R1: DeepSeek-R1 had stronger math benchmarks (MATH-500); Claude 3.7 led on SWE-bench and general coding

The model positioned itself not as a math/competition reasoning model but as a practical coding and agentic reasoning model — a different optimization target than o1 or R1.

Extended Thinking

Extended thinking made visible reasoning accessible without a separate model. Key properties:

Token budget is configurable: from a few hundred tokens (fast, focused) to tens of thousands (deep, complex)
Thinking tokens appear in the API response as a distinct block; the final answer is separate
Improves performance on math, science, and hard coding tasks; minimal gain on simple tasks
Not available on the free API tier at launch

First introduced in Claude 3.7; iterated in claude-sonnet-4-6 (adaptive thinking, automatic activation on hard problems).

extended-thinking · claude-sonnet-4-6 · evals · agentic-workflows

Sources

linked from

Claude 3.5 Sonnet

Claude 3.7 Sonnet

Claude 3.7 Sonnet

What "Hybrid" Means

Benchmarks

Specifications

Comparison at Release (February 2025)

Extended Thinking

Related

Sources