Claude Opus 4.8
Anthropic's current flagship. Released May 28, 2026. Replaces claude-opus-4-6 as the top-tier model in the Claude 4 family. Built for long-horizon agentic work, complex reasoning, and high-autonomy coding tasks. claude-opus-4-6 is now legacy.
Benchmarks
| Benchmark | Result |
|---|---|
| OSWorld-Verified | 82.3% |
| Online-Mind2Web | 84% |
| Terminal-Bench 2.1 | outperforms GPT-5.5 (83.4%) |
| Finance Agent v2 | 57.9% |
| Legal Agent Benchmark | first model to break 10% on all-pass standard |
Compared with Opus 4.7, Opus 4.8 is 4× less likely to leave code flaws unmentioned. It is also the first model to complete every case end-to-end on the Super-Agent benchmark.
Pricing
- Input: $5 per million tokens
- Output: $25 per million tokens
Fast mode offers higher throughput at $10/MTok input and $50/MTok output, which is 3× cheaper than the previous generation's fast tier.
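The per-token rates above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, not an official calculator: the `Rate` class, the `RATES` table, and `estimate_cost` are hypothetical names, the fast-mode entry assumes the $10/$50 figures apply to Opus 4.8 itself, and any discounts not listed in this note are ignored.

```python
from dataclasses import dataclass

TOKENS_PER_MTOK = 1_000_000


@dataclass(frozen=True)
class Rate:
    input_per_mtok: float   # USD per million input tokens
    output_per_mtok: float  # USD per million output tokens


# Rates copied from the pricing list above. The "fast" entry assumes the
# $10/$50 quote describes Opus 4.8 fast mode (an interpretation, not confirmed).
RATES = {
    "standard": Rate(input_per_mtok=5.0, output_per_mtok=25.0),
    "fast": Rate(input_per_mtok=10.0, output_per_mtok=50.0),
}


def estimate_cost(input_tokens: int, output_tokens: int, mode: str = "standard") -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    rate = RATES[mode]
    total = input_tokens * rate.input_per_mtok + output_tokens * rate.output_per_mtok
    return total / TOKENS_PER_MTOK


if __name__ == "__main__":
    # 200K input tokens and 10K output tokens in each mode.
    print(f"standard: ${estimate_cost(200_000, 10_000):.2f}")
    print(f"fast:     ${estimate_cost(200_000, 10_000, mode='fast'):.2f}")
```

At these rates, output tokens cost five times as much as input tokens, so long generations dominate the bill even when prompts are large.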
Context Window
1M tokens. Max output: 128K tokens.
Thinking
Supports extended-thinking via adaptive thinking only — the model decides when and how deeply to think. Does not accept manual budget_tokens configuration (that parameter is supported on claude-sonnet-4-6 and legacy Opus models, not on 4.8). Defaults to effort: high on all surfaces including the API and claude-code; effort must be set explicitly to use a lower level.
Effort levels available on claude.ai and the Cowork surface: default (high), extra, and max.
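The rule described above (adaptive thinking only, no `budget_tokens`, effort set explicitly when a non-default level is wanted) can be expressed as a small validation helper. This is a sketch under stated assumptions: the model ID `claude-opus-4-8`, the dictionary field names, and the `"medium"` effort value are illustrative guesses, not confirmed API parameters, and the helper builds a plain dictionary rather than calling any SDK.

```python
# Assumed model ID; the note does not state the exact API identifier.
ADAPTIVE_ONLY_MODELS = {"claude-opus-4-8"}


def build_thinking_config(model, effort=None, budget_tokens=None):
    """Build a hypothetical thinking-related request fragment for `model`.

    Field names here are illustrative, not taken from official API docs.
    """
    if model in ADAPTIVE_ONLY_MODELS:
        if budget_tokens is not None:
            # Per the note, Opus 4.8 rejects manual token budgets.
            raise ValueError(f"{model} does not accept budget_tokens; set effort instead")
        config = {"thinking": {"type": "adaptive"}}
        if effort is not None:
            # Omitting effort leaves the default (high, per the note) in place.
            config["effort"] = effort
        return config

    # Models that still accept a manual budget (e.g. claude-sonnet-4-6).
    if budget_tokens is None:
        return {}
    return {"thinking": {"type": "enabled", "budget_tokens": budget_tokens}}


if __name__ == "__main__":
    print(build_thinking_config("claude-opus-4-8"))
    print(build_thinking_config("claude-opus-4-8", effort="medium"))  # hypothetical level name
    print(build_thinking_config("claude-sonnet-4-6", budget_tokens=8000))
```

Failing fast on `budget_tokens` in client code keeps a pipeline that switches between Sonnet and Opus from sending a parameter the newer model would reject.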
Dynamic Workflows
In claude-code, Opus 4.8 can orchestrate hundreds of parallel subagents in a single session — planning the work, spawning agents, and running codebase-scale migrations across hundreds of thousands of lines from kickoff to merge. Currently in research preview; available on Enterprise, Team, and Max plans.
Strengths
- Long-horizon agentic coding: Orchestrator role in agentic-workflows, capable of directing large parallel workloads without hand-holding.
- Computer use: 82.3% OSWorld-Verified, 84% Online-Mind2Web — best-in-class for GUI automation tasks via computer-use.
- Code review reliability: Surfaces issues rather than silently skipping them; significant improvement over Opus 4.7 on unmentioned code flaws.
- Adaptive reasoning: Effort controls let you trade latency for quality at the task level.
Weaknesses
- No manual extended thinking (budget_tokens); if you need precise token-budget control over CoT, use claude-sonnet-4-6.
- Higher cost than Sonnet tier; overkill for straightforward completions or high-volume pipelines.
- On Microsoft Foundry, context window is capped at 200K tokens (not the full 1M).
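The limits stated in this note (1M-token context, 128K max output, 200K context on Microsoft Foundry) can be checked before a request is sent. The sketch below is a rough pre-flight check, assuming that prompt and output tokens share the same context window; the platform keys and the `fits_context` function are hypothetical names chosen for illustration.

```python
# Limits taken from this note; platform keys are illustrative labels.
CONTEXT_WINDOW = {
    "default": 1_000_000,
    "microsoft-foundry": 200_000,
}
MAX_OUTPUT_TOKENS = 128_000


def fits_context(prompt_tokens: int, max_output_tokens: int, platform: str = "default") -> bool:
    """Return True if the request stays within the stated limits.

    Assumes prompt and output tokens count against one shared window.
    """
    if max_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    window = CONTEXT_WINDOW.get(platform, CONTEXT_WINDOW["default"])
    return prompt_tokens + max_output_tokens <= window


if __name__ == "__main__":
    # The same 900K-token prompt fits on the default platforms but not on Foundry.
    print(fits_context(900_000, 64_000))
    print(fits_context(900_000, 64_000, platform="microsoft-foundry"))
```

A check like this is most useful when one codebase targets several cloud platforms, since the Foundry cap is five times smaller than the full window.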
Use Cases
Best for: complex multi-step coding tasks in claude-code, orchestrating agentic-workflows, computer-use automation requiring sustained reasoning, legal/finance agent work.
Not ideal for: high-volume lightweight completions (use claude-sonnet-4-6), pure math reasoning (see deepseek-r1).
Availability
Claude API, AWS Bedrock, Google Vertex AI, Microsoft Foundry (Azure), claude.ai (all paid plans), claude-code, GitHub Copilot.
Related
claude-sonnet-4-6 · claude-opus-4-6 · extended-thinking · agentic-workflows · claude-code · computer-use · evals