Claude Sonnet 4.6
Claude Sonnet 4.6
Anthropic's current workhorse for agentic coding tasks. Replaces Sonnet 3.7 as the default model in claude-code and the API. Sits in the middle tier of the Claude 4 family — faster and cheaper than claude-opus-4-6, meaningfully more capable than Haiku.
Benchmarks
| Benchmark | Score |
|---|---|
| SWE-bench Verified | 79.6% |
| OSWorld | 72.5% |
SWE-bench measures ability to resolve real GitHub issues. 79.6% is near the top of the leaderboard for any model, including those 2-3x the cost. OSWorld measures computer-use — navigating a real desktop GUI — where 72.5% is best-in-class at launch.
Pricing
- Input: $3 per million tokens
- Output: $15 per million tokens
Competitive for the capability tier. At typical agentic task sizes (20-100K input tokens per run), cost is manageable for professional use. Becomes meaningful at scale.
Context Window
1M tokens.
Strengths
- Coding and debugging: Top-tier on SWE-bench. Handles multi-file edits, test-driven debugging, and understanding existing codebases well.
- Agentic use: Designed for long autonomous runs. Works well in multi-agent-setup and overnight-runs.
- Extended thinking: Supports extended-thinking, which chains CoT reasoning with tool use. Supports both manual
budget_tokensconfiguration and adaptive thinking. Activates automatically on hard problems. - Computer use: Strong OSWorld score makes it a solid default for GUI automation tasks via computer-use.
Weaknesses
- Still hallucinates on APIs it hasn't seen recently. Always verify generated imports/function signatures.
- Long context degrades somewhat past 100K tokens — not a deal-breaker but performance dips.
- Output tokens at $15/M add up quickly on verbose tasks (log parsing, large diffs).
Use Cases
Best for: autonomous coding sessions in claude-code, computer-use automation, multi-agent pipelines that need reliable tool-use at reasonable cost.
Not ideal for: pure math reasoning (see deepseek-r1), very long document summarization (see gemini-2-5-pro), or quick one-off completions where cost matters most.
Related
claude-opus-4-6 · extended-thinking · computer-use · claude-code · evals