Claude Opus 4.8

Claude Opus 4.8

Anthropic's current flagship. Released May 28, 2026. Replaces claude-opus-4-6 as the top-tier model in the Claude 4 family. Built for long-horizon agentic work, complex reasoning, and high-autonomy coding tasks. claude-opus-4-6 is now legacy.

Benchmarks

Benchmark Score
OSWorld-Verified 82.3%
Online-Mind2Web 84%
Terminal-Bench 2.1 outperforms GPT-5.5 (83.4%)
Finance Agent v2 57.9%
Legal Agent Benchmark first model to break 10% on all-pass standard

4× less likely to leave code flaws unmentioned compared to Opus 4.7. First model to complete every case end-to-end on the Super-Agent benchmark.

Pricing

  • Input: $5 per million tokens
  • Output: $25 per million tokens

Fast mode is available at 3× lower cost than the previous generation's fast tier ($10/MTok input, $50/MTok output at higher throughput).

Context Window

1M tokens. Max output: 128K tokens.

Thinking

Supports extended-thinking via adaptive thinking only — the model decides when and how deeply to think. Does not accept manual budget_tokens configuration (that parameter is supported on claude-sonnet-4-6 and legacy Opus models, not on 4.8). Defaults to effort: high on all surfaces including the API and claude-code; effort must be set explicitly to use a lower level.

Effort levels available on claude.ai and the Cowork surface: default (high), extra, and max.

Dynamic Workflows

In claude-code, Opus 4.8 can orchestrate hundreds of parallel subagents in a single session — planning the work, spawning agents, and running codebase-scale migrations across hundreds of thousands of lines from kickoff to merge. Currently in research preview; available on Enterprise, Team, and Max plans.

Strengths

  • Long-horizon agentic coding: Orchestrator role in agentic-workflows, capable of directing large parallel workloads without hand-holding.
  • Computer use: 82.3% OSWorld-Verified, 84% Online-Mind2Web — best-in-class for GUI automation tasks via computer-use.
  • Code review reliability: Surfaces issues rather than silently skipping them; significant improvement over Opus 4.7 on unmentioned code flaws.
  • Adaptive reasoning: Effort controls let you trade latency for quality at the task level.

Weaknesses

  • No manual extended thinking (budget_tokens) — if you need precise token-budget control over CoT, use claude-sonnet-4-6.
  • Higher cost than Sonnet tier; overkill for straightforward completions or high-volume pipelines.
  • On Microsoft Foundry, context window is capped at 200K tokens (not the full 1M).

Use Cases

Best for: complex multi-step coding tasks in claude-code, orchestrating agentic-workflows, computer-use automation requiring sustained reasoning, legal/finance agent work.

Not ideal for: high-volume lightweight completions (use claude-sonnet-4-6), pure math reasoning (see deepseek-r1).

Availability

Claude API, AWS Bedrock, Google Vertex AI, Microsoft Foundry (Azure), claude.ai (all paid plans), claude-code, GitHub Copilot.

Related

claude-sonnet-4-6 · claude-opus-4-6 · extended-thinking · agentic-workflows · claude-code · computer-use · evals

Sources