Extended Thinking

Chain-of-thought reasoning interleaved with tool use. First shipped in Claude 3.7 Sonnet, now a standard feature of the Claude 4 family. The model thinks through a problem step-by-step in a scratchpad that isn't part of the final output, then uses that reasoning to inform tool calls and responses.

What It Actually Is

Extended thinking is not just "the model thinks longer." The key mechanism is interleaving:

  1. Model reasons about what to do
  2. Model calls a tool (search, code execution, file read, etc.)
  3. Tool returns results
  4. Model reasons about the results
  5. Repeat until task is done
  6. Model produces final output

Without extended thinking, tool calls happen more reactively. With it, the model has a dedicated reasoning space to plan multi-step tool use sequences before committing to them. This reduces the rate of "wrong tool call → cascading error" failures in complex agentic tasks.
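The loop above can be sketched in plain Python. This is only an illustration of the control flow, with no API calls: `scripted_model`, `TOOLS`, and the step dictionaries are hypothetical stand-ins, not part of any SDK.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> callable taking one string argument.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_file": lambda path: f"<contents of {path}>",
}

def scripted_model(history: list[dict]) -> dict:
    """Stand-in for one model turn: reason, then either call a tool or answer."""
    tool_results = [h for h in history if h["kind"] == "tool_result"]
    if not tool_results:
        return {"thinking": "I need to see the file before answering.",
                "tool": "read_file", "arg": "app.py"}
    return {"thinking": "I have the file contents; I can answer now.",
            "answer": f"Summary based on {tool_results[-1]['content']}"}

def run_agent(model: Callable[[list[dict]], dict],
              max_steps: int = 5) -> tuple[str, list[dict]]:
    """Alternate reasoning and tool calls until the model produces an answer."""
    history: list[dict] = []
    for _ in range(max_steps):
        step = model(history)
        # Reasoning is recorded in the history but is never the returned answer.
        history.append({"kind": "thinking", "content": step["thinking"]})
        if "answer" in step:
            return step["answer"], history
        result = TOOLS[step["tool"]](step["arg"])
        history.append({"kind": "tool_result", "content": result})
    raise RuntimeError("step limit reached without a final answer")

answer, history = run_agent(scripted_model)
```

The step limit is a design choice of this sketch: it keeps a model that never settles on an answer from looping forever.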

When It Activates

In Claude Code, extended thinking activates automatically based on task-complexity heuristics. You can also trigger it explicitly via the API with the thinking parameter.

Model-specific behavior differs:

  • claude-sonnet-4-6: Supports both manual budget_tokens and adaptive thinking.
  • claude-opus-4-6: Supports adaptive thinking only — manual budget_tokens is not available.

Via API (Sonnet 4.6, manual budget):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    thinking={"type": "enabled", "budget_tokens": 10000},
    ...
)

Via API (adaptive, works on both models):

response = client.messages.create(
    model="claude-opus-4-6",
    thinking={"type": "adaptive"},
    ...
)
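If one codebase targets both models, the choice between the two request shapes above can be wrapped in a small helper. This is a sketch that assumes the support matrix is exactly as listed in this section; `thinking_param` and the two sets are hypothetical names, and the matrix should be checked against current documentation before relying on it.

```python
from typing import Optional

# Support matrix as described in this section (an assumption, not verified here).
MANUAL_BUDGET_MODELS = {"claude-sonnet-4-6"}
ADAPTIVE_MODELS = {"claude-sonnet-4-6", "claude-opus-4-6"}

def thinking_param(model: str, budget_tokens: Optional[int] = None) -> dict:
    """Build the `thinking` argument for a request to the given model."""
    if budget_tokens is not None:
        if model not in MANUAL_BUDGET_MODELS:
            raise ValueError(f"{model} does not accept a manual budget_tokens")
        return {"type": "enabled", "budget_tokens": budget_tokens}
    if model not in ADAPTIVE_MODELS:
        raise ValueError(f"no thinking configuration known for {model}")
    return {"type": "adaptive"}
```

The result is meant to be passed as `thinking=thinking_param(model, budget)` in `client.messages.create(...)`. Raising on an unsupported combination surfaces the mismatch locally instead of as a rejected request.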

Impact on Benchmarks

Extended thinking is part of what drives claude-sonnet-4-6's SWE-bench score. The ability to reason about a failing test, plan a fix, execute it, observe the result, and re-reason is qualitatively different from single-shot code generation.

On math benchmarks, models with extended thinking enabled consistently outperform the same models with it disabled, sometimes by 15-30 percentage points on hard problems.

Cost Implications

Thinking tokens are billed even when the scratchpad is not surfaced in the final output. A task with budget_tokens: 10000 can add significant cost if the model fills the budget. In practice, the model uses what it needs — complex tasks use more, simple tasks less. Monitor thinking token usage if cost is a concern.
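A rough way to bound the added cost is to price the full budget. The sketch below assumes thinking tokens are charged at the model's output-token rate; `thinking_cost_ceiling` is a hypothetical helper, and `EXAMPLE_RATE` is a placeholder, not a real price.

```python
def thinking_cost_ceiling(budget_tokens: int,
                          usd_per_million_output_tokens: float) -> float:
    """Upper bound on thinking cost for one request if the full budget is used."""
    return budget_tokens * usd_per_million_output_tokens / 1_000_000

# Placeholder rate in USD per million output tokens; substitute your model's rate.
EXAMPLE_RATE = 10.0

per_request = thinking_cost_ceiling(10_000, EXAMPLE_RATE)
per_thousand_requests = per_request * 1_000
```

With the placeholder rate, a fully used 10,000-token budget comes to about $0.10 per request, or roughly $100 per thousand requests. Actual spend is usually lower, since the budget is a ceiling rather than a quota.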

Thinking vs. Non-Thinking

Use extended thinking for: debugging complex issues, multi-step architecture decisions, math-heavy tasks, agentic workflows where wrong moves are costly.

Skip extended thinking for: simple completions, formatting tasks, quick lookups — the overhead isn't worth it.
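The split above can be encoded as a simple routing rule. This is a minimal sketch: the task labels and both function names are hypothetical, and it assumes that leaving out the `thinking` argument means the request runs without extended thinking.

```python
from typing import Optional

# Hypothetical labels for the task types listed above as worth the overhead.
THINKING_TASKS = {"debugging", "architecture", "math", "agentic"}

def thinking_for(task_kind: str) -> Optional[dict]:
    """Return a `thinking` argument for the task, or None to omit it."""
    if task_kind in THINKING_TASKS:
        return {"type": "adaptive"}
    return None

def request_kwargs(task_kind: str) -> dict:
    """Extra keyword arguments to merge into a messages.create call."""
    thinking = thinking_for(task_kind)
    return {"thinking": thinking} if thinking is not None else {}
```

A caller would merge the result into the request, for example `client.messages.create(..., **request_kwargs("debugging"))`.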

Related

claude-sonnet-4-6 · claude-opus-4-6 · agentic-workflows · evals · deepseek-r1
