Claude Sonnet 4.6

2 min · model, anthropic

Claude Sonnet 4.6

Superseded by claude-sonnet-5. Sonnet 4.6 is no longer the default in claude-code or the Anthropic API. Still available, but new builds should target Sonnet 5.

Anthropic's former workhorse for agentic coding tasks. Replaced Sonnet 3.7 as the default model in claude-code and the API. Sits in the middle tier of the Claude 4 family — faster and cheaper than claude-opus-4-6, meaningfully more capable than Haiku.

Benchmarks

Benchmark	Score
SWE-bench Verified	79.6%
OSWorld	72.5%

SWE-bench measures ability to resolve real GitHub issues. 79.6% is near the top of the leaderboard for any model, including those 2-3x the cost. OSWorld measures computer-use — navigating a real desktop GUI — where 72.5% is best-in-class at launch.

Pricing

Input: $3 per million tokens
Output: $15 per million tokens

Competitive for the capability tier. At typical agentic task sizes (20-100K input tokens per run), cost is manageable for professional use. Becomes meaningful at scale.

Context Window

1M tokens.

Strengths

Coding and debugging: Top-tier on SWE-bench. Handles multi-file edits, test-driven debugging, and understanding existing codebases well.
Agentic use: Designed for long autonomous runs. Works well in multi-agent-setup and overnight-runs.
Extended thinking: Supports extended-thinking, which chains CoT reasoning with tool use. Supports both manual budget_tokens configuration and adaptive thinking. Activates automatically on hard problems.
Computer use: Strong OSWorld score makes it a solid default for GUI automation tasks via computer-use.

Weaknesses

Still hallucinates on APIs it hasn't seen recently. Always verify generated imports/function signatures.
Long context degrades somewhat past 100K tokens — not a deal-breaker but performance dips.
Output tokens at $15/M add up quickly on verbose tasks (log parsing, large diffs).

Use Cases

Best for: autonomous coding sessions in claude-code, computer-use automation, multi-agent pipelines that need reliable tool-use at reasonable cost.

Not ideal for: pure math reasoning (see deepseek-r1), very long document summarization (see gemini-2-5-pro), or quick one-off completions where cost matters most.

claude-sonnet-5 · claude-opus-4-6 · extended-thinking · computer-use · claude-code · evals

Sources

linked from

Computer Use Evals (Benchmarks)Extended Thinking Vibe Coding Claude 3.5 Sonnet Claude 3.7 Sonnet Claude Haiku 4.5 Claude Opus 4.6 Claude Opus 4.8 Claude Sonnet 5 GPT-5.4 Grok 4 Microsoft MAI-Thinking-1 Codex CLI Multi-Agent Setup

Claude Sonnet 4.6

Claude Sonnet 4.6

Benchmarks

Pricing

Context Window

Strengths

Weaknesses

Use Cases

Related

Sources