
Anthropic shipped Claude Opus 4.8 on May 28, 2026, six weeks after Claude Opus 4.7. Same $5 input, $25 output pricing as 4.7. A 1M token context window is now on by default, mid-conversation system messages went GA without a beta header, and Claude Code picked up a real Workflows primitive that lets one agent plan, fan out into hundreds of parallel subagents, and merge the result inside a single session.
This is the first Opus release that feels written for the people running agent fleets in production, not for chat. Below is what changed in plain language, what it means if you ship apps on top of Claude, and how I am thinking about the migration at Totalum (where every customer-facing build runs against Anthropic models).
Quick Answer
- What it is: Claude Opus 4.8 is Anthropic's newest flagship model, generally available May 28, 2026. Model ID on the API:
claude-opus-4-8. - Headline upgrade: 1M token context window by default, 128k max output tokens, adaptive thinking, mid-conversation system messages, and Workflows (planning plus parallel subagents) in Claude Code as a research preview.
- Pricing: $5 per million input tokens, $25 per million output tokens. Fast mode is $10 input, $50 output and runs about 2.5x faster. Same as Opus 4.7.
- Quality jump: Anthropic reports Opus 4.8 is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked," 84% on Online-Mind2Web, and the first model to break 10% on the Legal Agent Benchmark all-pass standard.
- Migration cost: Drop-in for most teams already on
claude-opus-4-7. Same tool surface, same parameter constraints. Test mid-conversation system messages and adaptive thinking before you change anything else. - Who should care most: Teams building long-running agents, code migration tools, document workflows over very large corpora, and any product that pays for thinking tokens. Effort defaults to high on Opus 4.8, so audit your effort settings before scaling up.
What is new in Claude Opus 4.8
Eleven things shipped on May 28, 2026 alongside the new model. Here are the ones that actually change how you build.
1. 1M token context window, default
On Opus 4.7 and 4.6, the 1M context window was a beta header you had to opt into, and it cost more above 200k tokens. On Opus 4.8 the 1M context window is on by default and ships with a 128k max output token cap. You stop paying the cost of writing context-handoff logic for long documents, large codebases, and dense legal or financial corpora. You still pay long-context input pricing on requests over 200k tokens, so do the math before you start dumping your entire monorepo into a single call.
2. Mid-conversation system messages, no beta header
The Messages API now accepts role: "system" entries at non-first positions in messages. You can inject new instructions partway through a long-running session and keep your prompt cache hits intact. This is one of the cleanest wins of the release for anyone running a chat-style agent with policy or persona that changes turn by turn. Previously you had to re-inject the entire system prompt or rebuild the conversation, and prompt cache would miss.
3. Adaptive thinking, by default
Opus 4.8 uses adaptive thinking to trigger reasoning only when a turn needs it. At the same effort level, Anthropic says Opus 4.8 wastes fewer thinking tokens than Opus 4.7. Combined with a high default effort, this is meant to net out as "smarter when it matters, cheaper when it doesn't." Validate this in your own logs before you assume the bill stays flat.
4. Refusal categories in stop_details
When Opus 4.8 declines a request, the Messages API now returns a refusal category in stop_details. You can route different classes of refusal to the right next step in your own product. For agent operators, this is the difference between a soft fallback (try a different tool) and a hard block (escalate to a human reviewer).
5. Lower prompt-cache threshold
The minimum cacheable prompt length on Opus 4.8 is 1,024 tokens, down from a higher floor on 4.7. Short repeating prompts now hit the cache, so RAG patterns with short instruction prefixes get the cache discount that used to apply only to long system prompts. If you're chaining tool calls with short prompts, this compounds.
6. Claude Code Workflows, research preview
This is the big one for builders. Inside Claude Code you can now define multi-step agentic plans and run them as Workflows. The same release notes describe "dynamic workflows" as letting "Claude can plan the work and then run hundreds of parallel subagents in a single session," which Anthropic is positioning at enterprise code migrations that touch hundreds of thousands of lines.
If you've been hand-rolling a planner-executor split with Claude Code or stitching together MCP servers, this is the primitive you were probably about to build.
7. Auto mode expanded, fast-mode default on Max
In Claude Code, Auto mode is rolling out to more users for long-running tasks, and Max plan users now default to fast mode on Opus 4.8. If you run an agency desk on Max seats, this changes the default cost profile of every interactive session that goes through Opus 4.8. Worth flagging in your team handbook.
8. High-resolution images, advisor, computer use, task budgets
Opus 4.8 supports the same high-resolution image input as 4.7 (up to 2576 pixels on the long edge), the advisor tool (pair a faster executor with a stronger advisor model), computer use, and task budgets. Nothing to do if you were already using these on 4.7; it just keeps working.
9. Fast mode for Opus 4.6 is being deprecated
Anthropic announced that fast mode for Claude Opus 4.6 will be removed approximately 30 days after the 4.8 launch. If you run any 4.6 fast mode workload, migrate to 4.7 or 4.8 fast mode before late June 2026.
10. Sampling parameters still locked
temperature, top_p, and top_k still return a 400 error when set to non-default values on Opus 4.8, same as on Opus 4.7. If your wrapper sets these, strip them out before the upgrade or you'll start seeing errors.
11. Same MCP, tools, and platform surface as Opus 4.7
The MCP connector, MCP tunnels (private network MCP servers, in research preview), Managed Agents, Skills, code execution, web search, web fetch, and the broader tool catalog all carry over. You do not have to rewire your tool layer to upgrade.
Benchmarks: what Anthropic published
The announcement leans on a small number of benchmark scores. The ones with absolute numbers worth quoting are:
- Online-Mind2Web: 84% per a customer testimonial in the launch post.
- Legal Agent Benchmark: first model to break 10% on the all-pass standard.
- Terminal-Bench 2.1: referenced with a footnote that GPT-5.5 Codex CLI scored 83.4%.
- OSWorld-Verified: cited with an updated Opus 4.7 baseline of 82.3% for comparison.
- Code review quality: "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked."
The pattern across these scores is consistent: agentic tasks (browser, code, legal workflows) improve materially. Pure-knowledge benchmarks barely move because the headroom is gone. If your product reads as agentic, you should see Opus 4.8 land better than 4.7. If your product is mostly retrieval and chat, the upgrade may be invisible to users.
For a wider view of where Opus 4.8 sits versus other agents, the running best AI coding agents in 2026 roundup tracks the live leaderboard.
Pricing: unchanged at the base tier
| Mode | Input | Output |
|---|---|---|
| Regular | $5 per million tokens | $25 per million tokens |
| Fast mode (research preview) | $10 per million tokens | $50 per million tokens |
Anthropic kept Opus 4.8 at the same pricing as Opus 4.7. Long-context input pricing still applies to requests over 200k tokens, so the bill grows if you actually feed it 1M context per call. This is the same dynamic that hit teams that opted into the 1M context beta on Opus 4.6 and 4.7; the only thing that changed is that you no longer need a beta header to trigger it.
One nuance worth flagging: because the effort parameter defaults to high on Opus 4.8 across Claude Code and the Messages API, and because adaptive thinking will spend thinking tokens when a turn needs it, the actual per-call cost in your dashboard can move even if the headline pricing did not. Run a representative sample before the next finance review.
Claude Opus 4.8 vs Claude Opus 4.7
Side by side, here is what actually changed for a developer.
| Capability | Opus 4.7 | Opus 4.8 |
|---|---|---|
| Model ID | claude-opus-4-7 |
claude-opus-4-8 |
| Default context | 200k | 1M |
| Max output tokens | 64k | 128k |
| Mid-conversation system messages | No | Yes, GA, no beta header |
| Effort parameter default | Configurable | Defaults to high |
| Adaptive thinking | Yes | Yes, with less wasted reasoning |
Refusal categories in stop_details |
No | Yes |
| Min cacheable prompt | Higher floor | 1,024 tokens |
| Claude Code Workflows | No | Yes, research preview |
| Pricing (regular) | $5 / $25 per million tokens | $5 / $25 per million tokens |
| Fast mode pricing | $10 / $50 per million tokens | $10 / $50 per million tokens |
| MCP, tools, Managed Agents | Full support | Full support |
| Sampling locked (temp, top_p, top_k) | Yes | Yes |
| High-resolution images (2576px) | Yes | Yes |
If you are running Cursor against Claude or routing through Claude Code, the IDE and CLI layer will keep working untouched. The model ID is the only required change.
What Opus 4.8 means if you ship on Claude
I run Totalum, an AI app builder that produces real Next.js applications with auth, payments, database, file storage, and deployment built in. Under the hood, every Totalum build pipeline calls Anthropic models. Here's how I'm reading Opus 4.8 from that seat.
Long-context is now table stakes
When 1M context becomes the default, "we can fit your codebase in the prompt" stops being a feature and starts being an assumption. Tooling that competes on context-management ceremony loses its moat. Tooling that competes on what you do with that context wins. Practically, this means agency operators can stop architecting around a 200k token ceiling and start asking the model to reason over the whole monorepo in a single call where the latency budget allows.
Workflows changes the shape of agency work
Claude Code Workflows lets you define a multi-step plan and execute it as parallel subagents inside one session. For an agency, this is the closest you have come to a real "ship feature X across all twelve repos" primitive that doesn't require you to build the planner yourself. If your team builds custom internal tools or client portals, expect the cost-per-feature curve to bend in the next quarter.
If you want to embed that capability into your own SaaS or client portal product, Totalum's MCP is one way to give Claude Code a production stack to ship into.
Mid-conversation system messages unlock policy-heavy agents
If you run customer support agents, compliance reviewers, or any product that adjusts persona by case, you previously paid prompt-cache misses every time you switched policy. Opus 4.8 removes that tax. Long-running agents that change instructions partway through a session now stay warm on cache.
Refusal categories tighten the safety net
Refusal categories in stop_details is the kind of detail that looks small until you ship a product to thousands of users. Routing soft refusals to a "try a different framing" branch and hard refusals to human review is now mechanical, not heuristic.
Skills, marketplaces, and ecosystem
Opus 4.8 keeps the full Claude Skills surface from prior releases. If your team has been packaging domain expertise as a Skill, nothing to migrate. If you have not started, the bar to do so has dropped now that adaptive thinking is doing more of the budgeting for you.
When to upgrade, when to wait
A short, opinionated take.
Upgrade now if you run long-context workloads, you ship agents that mutate their own instructions, you operate Claude Code at agency scale, or you pay for thinking tokens and want adaptive thinking turned on by default.
Wait a week or two if you depend on fast mode for Opus 4.6 (fast mode for 4.6 is being removed in roughly 30 days, so plan the migration rather than rushing), you have not yet audited your effort settings (the new high default can surprise you on cost), or you have heavy use of sampling parameters that some wrapper might be sending even though the API rejects them.
Stay put on Opus 4.7 for now if your product is mostly retrieval and short chat, the 1M default brings no benefit, and your team has not yet validated Workflows against your existing planner-executor stack. There is no rush; 4.7 keeps shipping.
How to switch
Three steps.
- Change your model ID from
claude-opus-4-7toclaude-opus-4-8. That's it for the minimum upgrade. - Audit your effort settings. Opus 4.8 defaults to
higheffort. If your dashboard breaks at high effort, set effort explicitly tomediumorlowto match your old behavior. - Pilot mid-conversation system messages and Workflows. Both unlock real product wins that pure 4.7 cannot reach. Pilot them on one workflow before you re-architect everything.
If you build client work or internal tools and you want a production stack ready to receive what Opus 4.8 ships out, book a Totalum call and we'll walk through the integration. If you are a solo developer wanting to try this on your own product, sign up at totalum.app and the same pipeline is yours.
FAQ
What is Claude Opus 4.8?
Claude Opus 4.8 is Anthropic's newest flagship model, released on May 28, 2026. It has the model ID claude-opus-4-8, a 1M token context window by default, 128k max output tokens, mid-conversation system messages without a beta header, adaptive thinking, and supports Claude Code Workflows as a research preview.
How much does Claude Opus 4.8 cost?
$5 per million input tokens and $25 per million output tokens at regular speed, same as Opus 4.7. Fast mode is a research preview at $10 input, $50 output, running about 2.5x faster. Long-context input pricing applies to requests over 200k tokens.
Is Claude Opus 4.8 a drop-in replacement for Opus 4.7?
For most teams, yes. The tool surface, MCP, Managed Agents, Skills, and sampling-parameter constraints are the same. The main behavioral differences are the default high effort, adaptive thinking trimming more reasoning, the new 1M default context, and refusal categories in stop_details. Audit your effort and refusal-handling code before you scale up traffic.
What is the 1M token context window in Opus 4.8?
A 1 million token input context is enabled by default on Opus 4.8 with a 128k max output token cap. On prior Opus models the 1M context was a beta header. Long-context input pricing applies to requests over 200k tokens.
What are Claude Code Workflows?
A research preview inside Claude Code where you define multi-step agentic plans. Claude plans the work, then runs many parallel subagents in a single session. Anthropic positions this at enterprise code migrations that touch very large codebases.
Is Opus 4.8 available on AWS and Microsoft Foundry?
The launch enables Opus 4.8 on the Claude API. Past Opus releases (4.5, 4.6, 4.7) propagated to Amazon Bedrock and Microsoft Foundry within weeks. Check the cloud provider docs before you assume parity for a brand-new release.
What about Claude Mythos?
On May 28, 2026 Anthropic also stated it expects to "bring Mythos-class models to all our customers in the coming weeks." Mythos remains gated to Project Glasswing for defensive cybersecurity research as of this writing. Opus 4.8 is the model you can use today.
Does Totalum already run on Claude Opus 4.8?
Totalum's build pipeline targets Claude under the hood. As Opus 4.8 propagates through our infrastructure, every new Totalum build benefits from the adaptive thinking and 1M context defaults without any change to how you use Totalum.
Should I move off Cursor or Codex?
Not on the strength of Opus 4.8 alone. The deciding factors are still the workflow shape and where your team lives day to day. The Cursor vs Claude Code in 2026 and Claude Code vs Codex in 2026 breakdowns lay out the trade-offs.
Author: Francesc, Co-founder at Totalum