Claude Sonnet 4.6

Claude Sonnet 4.6

Anthropic's current workhorse for agentic coding tasks. Replaces Sonnet 3.7 as the default model in claude-code and the API. Sits in the middle tier of the Claude 4 family — faster and cheaper than claude-opus-4-6, meaningfully more capable than Haiku.

Benchmarks

Benchmark Score
SWE-bench Verified 79.6%
OSWorld 72.5%

SWE-bench measures ability to resolve real GitHub issues. 79.6% is near the top of the leaderboard for any model, including those 2-3x the cost. OSWorld measures computer-use — navigating a real desktop GUI — where 72.5% is best-in-class at launch.

Pricing

  • Input: $3 per million tokens
  • Output: $15 per million tokens

Competitive for the capability tier. At typical agentic task sizes (20-100K input tokens per run), cost is manageable for professional use. Becomes meaningful at scale.

Context Window

1M tokens.

Strengths

  • Coding and debugging: Top-tier on SWE-bench. Handles multi-file edits, test-driven debugging, and understanding existing codebases well.
  • Agentic use: Designed for long autonomous runs. Works well in multi-agent-setup and overnight-runs.
  • Extended thinking: Supports extended-thinking, which chains CoT reasoning with tool use. Supports both manual budget_tokens configuration and adaptive thinking. Activates automatically on hard problems.
  • Computer use: Strong OSWorld score makes it a solid default for GUI automation tasks via computer-use.

Weaknesses

  • Still hallucinates on APIs it hasn't seen recently. Always verify generated imports/function signatures.
  • Long context degrades somewhat past 100K tokens — not a deal-breaker but performance dips.
  • Output tokens at $15/M add up quickly on verbose tasks (log parsing, large diffs).

Use Cases

Best for: autonomous coding sessions in claude-code, computer-use automation, multi-agent pipelines that need reliable tool-use at reasonable cost.

Not ideal for: pure math reasoning (see deepseek-r1), very long document summarization (see gemini-2-5-pro), or quick one-off completions where cost matters most.

Related

claude-opus-4-6 · extended-thinking · computer-use · claude-code · evals

Sources