AI Coding Agents

Claude Opus 4.8 Pricing 2026: Is It Worth It vs 4.7 and Sonnet 5?

FrancescJuly 13, 202618 min read

Claude Opus 4.8 hero illustration

Updated July 13, 2026. Claude Opus 4.8 pricing is still $5 input and $25 output per million tokens, unchanged since launch. What moved in mid-2026 is the competition, not the rate card: Claude Sonnet 5 and GPT-5.6 Sol now deliver near-Opus quality on many tasks at a fraction of the cost, so the real question is no longer "what does Opus 4.8 cost" but "is Opus 4.8 still worth it." Short answer: yes for long-context and long-horizon agentic work, and increasingly no for everyday chat and retrieval, where a cheaper tier now does the job. The full reasoning, effective-cost math, and money-saving levers are below.

For the workload-by-workload breakdown of when DeepSeek V4 Pro overtakes Opus 4.8 on cost, see our DeepSeek V4 Pro vs Claude 2026 comparison.

Anthropic shipped Claude Opus 4.8 on May 28, 2026, six weeks after Claude Opus 4.7. Same $5 input, $25 output pricing as 4.7. A 1M token context window is now on by default, mid-conversation system messages went GA without a beta header, and Claude Code picked up a real Workflows primitive that lets one agent plan, fan out into hundreds of parallel subagents, and merge the result inside a single session.

This is the first Opus release that feels written for the people running agent fleets in production, not for chat. Below is what changed in plain language, what it means if you ship apps on top of Claude, and how I am thinking about the migration at Totalum (where every customer-facing build runs against Anthropic models). Anthropic is also formalizing how services partners build with this model: see our breakdown of the Claude Partner Network Services Track announced June 3, 2026.

Quick Answer

What it is: Claude Opus 4.8 is Anthropic's newest flagship model, generally available May 28, 2026. Model ID on the API: claude-opus-4-8.
Headline upgrade: 1M token context window by default, 128k max output tokens, adaptive thinking, mid-conversation system messages, and Workflows (planning plus parallel subagents) in Claude Code as a research preview.
Pricing: $5 per million input tokens, $25 per million output tokens. Fast mode is $10 input, $50 output and runs about 2.5x faster. Same as Opus 4.7.
Quality jump: Anthropic reports Opus 4.8 is "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked," 84% on Online-Mind2Web, and the first model to break 10% on the Legal Agent Benchmark all-pass standard.
Migration cost: Drop-in for most teams already on claude-opus-4-7. Same tool surface, same parameter constraints. Test mid-conversation system messages and adaptive thinking before you change anything else.
Who should care most: Teams building long-running agents, code migration tools, document workflows over very large corpora, and any product that pays for thinking tokens. Effort defaults to high on Opus 4.8, so audit your effort settings before scaling up.

> Note, June 13, 2026: Claude Opus 4.8 is now the closest capability-matched replacement for Claude Fable 5, after Fable 5 was globally suspended on June 12 by a US government export control directive. If you wired Fable 5 into a production agent, the fastest fallback is to swap the model identifier to claude-opus-4-8 and keep moving. The full incident-response checklist is in the suspension guide.

What is new in Claude Opus 4.8

Eleven things shipped on May 28, 2026 alongside the new model. Here are the ones that actually change how you build.

1. 1M token context window, default

On Opus 4.7 and 4.6, the 1M context window was a beta header you had to opt into, and it cost more above 200k tokens. On Opus 4.8 the 1M context window is on by default and ships with a 128k max output token cap. You stop paying the cost of writing context-handoff logic for long documents, large codebases, and dense legal or financial corpora. You still pay long-context input pricing on requests over 200k tokens, so do the math before you start dumping your entire monorepo into a single call.

2. Mid-conversation system messages, no beta header

The Messages API now accepts role: "system" entries at non-first positions in messages. You can inject new instructions partway through a long-running session and keep your prompt cache hits intact. This is one of the cleanest wins of the release for anyone running a chat-style agent with policy or persona that changes turn by turn. Previously you had to re-inject the entire system prompt or rebuild the conversation, and prompt cache would miss.

3. Adaptive thinking, by default

Opus 4.8 uses adaptive thinking to trigger reasoning only when a turn needs it. At the same effort level, Anthropic says Opus 4.8 wastes fewer thinking tokens than Opus 4.7. Combined with a high default effort, this is meant to net out as "smarter when it matters, cheaper when it doesn't." Validate this in your own logs before you assume the bill stays flat.

4. Refusal categories in `stop_details`

When Opus 4.8 declines a request, the Messages API now returns a refusal category in stop_details. You can route different classes of refusal to the right next step in your own product. For agent operators, this is the difference between a soft fallback (try a different tool) and a hard block (escalate to a human reviewer).

5. Lower prompt-cache threshold

The minimum cacheable prompt length on Opus 4.8 is 1,024 tokens, down from a higher floor on 4.7. Short repeating prompts now hit the cache, so RAG patterns with short instruction prefixes get the cache discount that used to apply only to long system prompts. If you're chaining tool calls with short prompts, this compounds.

6. Claude Code Workflows, research preview

This is the big one for builders. Inside Claude Code you can now define multi-step agentic plans and run them as Workflows. The same release notes describe "dynamic workflows" as letting "Claude can plan the work and then run hundreds of parallel subagents in a single session," which Anthropic is positioning at enterprise code migrations that touch hundreds of thousands of lines.

If you've been hand-rolling a planner-executor split with Claude Code or stitching together MCP servers, this is the primitive you were probably about to build.

7. Auto mode expanded, fast-mode default on Max

In Claude Code, Auto mode is rolling out to more users for long-running tasks, and Max plan users now default to fast mode on Opus 4.8. If you run an agency desk on Max seats, this changes the default cost profile of every interactive session that goes through Opus 4.8. Worth flagging in your team handbook.

8. High-resolution images, advisor, computer use, task budgets

Opus 4.8 supports the same high-resolution image input as 4.7 (up to 2576 pixels on the long edge), the advisor tool (pair a faster executor with a stronger advisor model), computer use, and task budgets. Nothing to do if you were already using these on 4.7; it just keeps working.

9. Fast mode for Opus 4.6 is being deprecated

Anthropic announced that fast mode for Claude Opus 4.6 will be removed approximately 30 days after the 4.8 launch. If you run any 4.6 fast mode workload, migrate to 4.7 or 4.8 fast mode before late June 2026.

10. Sampling parameters still locked

temperature, top_p, and top_k still return a 400 error when set to non-default values on Opus 4.8, same as on Opus 4.7. If your wrapper sets these, strip them out before the upgrade or you'll start seeing errors.

11. Same MCP, tools, and platform surface as Opus 4.7

The MCP connector, MCP tunnels (private network MCP servers, in research preview), Managed Agents, Skills, code execution, web search, web fetch, and the broader tool catalog all carry over. You do not have to rewire your tool layer to upgrade.

Benchmarks: what Anthropic published

The announcement leans on a small number of benchmark scores. The ones with absolute numbers worth quoting are:

Online-Mind2Web: 84% per a customer testimonial in the launch post.
Legal Agent Benchmark: first model to break 10% on the all-pass standard.
Terminal-Bench 2.1: referenced with a footnote that GPT-5.5 Codex CLI scored 83.4%.
OSWorld-Verified: cited with an updated Opus 4.7 baseline of 82.3% for comparison.
Code review quality: "around four times less likely than its predecessor to allow flaws in code it has written to pass unremarked."

The pattern across these scores is consistent: agentic tasks (browser, code, legal workflows) improve materially. Pure-knowledge benchmarks barely move because the headroom is gone. If your product reads as agentic, you should see Opus 4.8 land better than 4.7. If your product is mostly retrieval and chat, the upgrade may be invisible to users.

For a wider view of where Opus 4.8 sits versus other agents, the running best AI coding agents in 2026 roundup tracks the live leaderboard.

Pricing: unchanged at the base tier

Mode	Input	Output
Regular	$5 per million tokens	$25 per million tokens
Fast mode (research preview)	$10 per million tokens	$50 per million tokens
Batch API (async, about 50% off)	$2.50 per million tokens	$12.50 per million tokens
Prompt caching, 5-minute (write / read)	$6.25 write, $0.50 read per million tokens	$25 per million tokens

Anthropic kept Opus 4.8 at the same pricing as Opus 4.7. Long-context input pricing still applies to requests over 200k tokens, so the bill grows if you actually feed it 1M context per call. This is the same dynamic that hit teams that opted into the 1M context beta on Opus 4.6 and 4.7; the only thing that changed is that you no longer need a beta header to trigger it. Two line items the headline rate card hides can move your real bill more than the $5 / $25 figure: the Batch API runs async jobs at roughly half price ($2.50 input, $12.50 output per million tokens), and prompt caching drops repeated input to $0.50 per million tokens on cache reads (5-minute cache writes are $6.25). If your workload is bursty or replays a large system prompt, those two levers matter more than the headline rate.

One nuance worth flagging: because the effort parameter defaults to high on Opus 4.8 across Claude Code and the Messages API, and because adaptive thinking will spend thinking tokens when a turn needs it, the actual per-call cost in your dashboard can move even if the headline pricing did not. Run a representative sample before the next finance review.

Claude Opus 4.8 vs Claude Opus 4.7

Side by side, here is what actually changed for a developer.

Capability	Opus 4.7	Opus 4.8
Model ID	`claude-opus-4-7`	`claude-opus-4-8`
Default context	200k	1M
Max output tokens	64k	128k
Mid-conversation system messages	No	Yes, GA, no beta header
Effort parameter default	Configurable	Defaults to `high`
Adaptive thinking	Yes	Yes, with less wasted reasoning
Refusal categories in `stop_details`	No	Yes
Min cacheable prompt	Higher floor	1,024 tokens
Claude Code Workflows	No	Yes, research preview
Pricing (regular)	$5 / $25 per million tokens	$5 / $25 per million tokens
Fast mode pricing	$10 / $50 per million tokens	$10 / $50 per million tokens
MCP, tools, Managed Agents	Full support	Full support
Sampling locked (temp, top_p, top_k)	Yes	Yes
High-resolution images (2576px)	Yes	Yes

If you are running Cursor against Claude or routing through Claude Code, the IDE and CLI layer will keep working untouched. The model ID is the only required change.

What Opus 4.8 means if you ship on Claude

I run Totalum, an AI app builder that produces real Next.js applications with auth, payments, database, file storage, and deployment built in. Under the hood, every Totalum build pipeline calls Anthropic models. Here's how I'm reading Opus 4.8 from that seat.

Long-context is now table stakes

When 1M context becomes the default, "we can fit your codebase in the prompt" stops being a feature and starts being an assumption. Tooling that competes on context-management ceremony loses its moat. Tooling that competes on what you do with that context wins. Practically, this means agency operators can stop architecting around a 200k token ceiling and start asking the model to reason over the whole monorepo in a single call where the latency budget allows.

Workflows changes the shape of agency work

Claude Code Workflows lets you define a multi-step plan and execute it as parallel subagents inside one session. For an agency, this is the closest you have come to a real "ship feature X across all twelve repos" primitive that doesn't require you to build the planner yourself. If your team builds custom internal tools or client portals, expect the cost-per-feature curve to bend in the next quarter.

If you want to embed that capability into your own SaaS or client portal product, Totalum's MCP is one way to give Claude Code a production stack to ship into.

Mid-conversation system messages unlock policy-heavy agents

If you run customer support agents, compliance reviewers, or any product that adjusts persona by case, you previously paid prompt-cache misses every time you switched policy. Opus 4.8 removes that tax. Long-running agents that change instructions partway through a session now stay warm on cache.

Refusal categories tighten the safety net

Refusal categories in stop_details is the kind of detail that looks small until you ship a product to thousands of users. Routing soft refusals to a "try a different framing" branch and hard refusals to human review is now mechanical, not heuristic.

Skills, marketplaces, and ecosystem

Opus 4.8 keeps the full Claude Skills surface from prior releases. If your team has been packaging domain expertise as a Skill, nothing to migrate. If you have not started, the bar to do so has dropped now that adaptive thinking is doing more of the budgeting for you.

When to upgrade, when to wait

A short, opinionated take.

Upgrade now if you run long-context workloads, you ship agents that mutate their own instructions, you operate Claude Code at agency scale, or you pay for thinking tokens and want adaptive thinking turned on by default.

Wait a week or two if you depend on fast mode for Opus 4.6 (fast mode for 4.6 is being removed in roughly 30 days, so plan the migration rather than rushing), you have not yet audited your effort settings (the new high default can surprise you on cost), or you have heavy use of sampling parameters that some wrapper might be sending even though the API rejects them.

Stay put on Opus 4.7 for now if your product is mostly retrieval and short chat, the 1M default brings no benefit, and your team has not yet validated Workflows against your existing planner-executor stack. There is no rush; 4.7 keeps shipping.

How to switch

Three steps.

Change your model ID from claude-opus-4-7 to claude-opus-4-8. That's it for the minimum upgrade.
Audit your effort settings. Opus 4.8 defaults to high effort. If your dashboard breaks at high effort, set effort explicitly to medium or low to match your old behavior.
Pilot mid-conversation system messages and Workflows. Both unlock real product wins that pure 4.7 cannot reach. Pilot them on one workflow before you re-architect everything.

If you build client work or internal tools and you want a production stack ready to receive what Opus 4.8 ships out, book a Totalum call and we'll walk through the integration. If you are a solo developer wanting to try this on your own product, sign up at totalum.app and the same pipeline is yours.

Cursor 3.6 context (May 30, 2026): One day after Opus 4.8 shipped, Cursor 3.6 added Cursor Auto-review run mode, the editor-side parallel to Claude Code's pre-declared allowlist model. If you are picking between Claude Code with Opus 4.8 and Cursor with Opus 4.8 in the API, the safety story is now closer to even: both agents can run long autonomous sessions with structured permission gates.

Project Polaris context (June 2, 2026): At Microsoft Build 2026, Microsoft announced Project Polaris, its first in-house coding model. Polaris will replace GPT-4 Turbo as the default model inside GitHub Copilot in August 2026 and is framed explicitly as a competitor to Claude Opus 4.8 on agentic coding. Independent benchmarks are not out yet, so Opus 4.8 remains the default recommendation for long agentic runs through summer.

After the June 12, 2026 Claude Fable 5 suspension, Claude Opus 4.8 is now the practical Claude side of every open-source-versus-Anthropic comparison. Our Kimi K2.7-Code vs Claude in 2026 verdict walks through the cost-per-token math, MCPMark Verified scores, and the hybrid-router pattern that combines Opus 4.8 reliability with K2.7-Code economics.

Quick context update: with the Claude 4 deprecation effective June 15, 2026, this model is the recommended replacement for any production traffic still hitting claude-opus-4-20250514. See our Claude 4 deprecation guide for 2026 for the one-line migration and the audit checklist.

Deciding which model to run for your code? See our guide to the best Claude model for coding in 2026.

FAQ

What is Claude Opus 4.8?
Claude Opus 4.8 is Anthropic's newest flagship model, released on May 28, 2026. It has the model ID claude-opus-4-8, a 1M token context window by default, 128k max output tokens, mid-conversation system messages without a beta header, adaptive thinking, and supports Claude Code Workflows as a research preview.

How much does Claude Opus 4.8 cost?
$5 per million input tokens and $25 per million output tokens at regular speed, same as Opus 4.7. Fast mode is a research preview at $10 input, $50 output, running about 2.5x faster. Long-context input pricing applies to requests over 200k tokens.

Is Claude Opus 4.8 a drop-in replacement for Opus 4.7?
For most teams, yes. The tool surface, MCP, Managed Agents, Skills, and sampling-parameter constraints are the same. The main behavioral differences are the default high effort, adaptive thinking trimming more reasoning, the new 1M default context, and refusal categories in stop_details. Audit your effort and refusal-handling code before you scale up traffic.

What is the 1M token context window in Opus 4.8?
A 1 million token input context is enabled by default on Opus 4.8 with a 128k max output token cap. On prior Opus models the 1M context was a beta header. Long-context input pricing applies to requests over 200k tokens.

What are Claude Code Workflows?
A research preview inside Claude Code where you define multi-step agentic plans. Claude plans the work, then runs many parallel subagents in a single session. Anthropic positions this at enterprise code migrations that touch very large codebases.

Is Opus 4.8 available on AWS and Microsoft Foundry?
The launch enables Opus 4.8 on the Claude API. Past Opus releases (4.5, 4.6, 4.7) propagated to Amazon Bedrock and Microsoft Foundry within weeks. Check the cloud provider docs before you assume parity for a brand-new release.

What about Claude Mythos?
On May 28, 2026 Anthropic also stated it expects to "bring Mythos-class models to all our customers in the coming weeks." Mythos remains gated to Project Glasswing for defensive cybersecurity research as of this writing. Opus 4.8 is the model you can use today.

Does Totalum already run on Claude Opus 4.8?
Totalum's build pipeline targets Claude under the hood. As Opus 4.8 propagates through our infrastructure, every new Totalum build benefits from the adaptive thinking and 1M context defaults without any change to how you use Totalum.

Should I move off Cursor or Codex?
Not on the strength of Opus 4.8 alone. The deciding factors are still the workflow shape and where your team lives day to day. The Cursor vs Claude Code in 2026 and Claude Code vs Codex in 2026 breakdowns lay out the trade-offs.

Is Claude Opus 4.8 worth it in 2026?
For long-context, agentic, and Claude Code workloads, yes. The 1M default context, adaptive thinking, and Workflows are real upgrades at the same $5 / $25 price as Opus 4.7. For short chat and retrieval the upgrade is close to invisible, and cheaper tiers now cover that ground well. The honest rule: pay Opus 4.8 rates when the task is long-horizon or agentic, and route the rest to a cheaper model.

Opus 4.8 vs Sonnet 5: which should I use, and which is cheaper?
Claude Sonnet 5 is priced well below Opus 4.8, and several mid-2026 comparisons report it matching or beating Opus 4.8 on some coding and terminal-bench tasks at a fraction of the cost. Many teams now default to Sonnet 5 for everyday work and reserve Opus 4.8 for the hardest long-horizon agentic runs, where its reliability on multi-step tasks is worth the premium. If cost is the deciding factor, start on Sonnet 5 and escalate to Opus 4.8 only where it measurably wins.

What is the cheapest way to run Claude Opus 4.8?
Three levers cut the bill without leaving Opus 4.8: use the Batch API for anything that can run async (about 50% off), turn on prompt caching for repeated system prompts (cache reads drop to $0.50 per million tokens), and set the effort parameter explicitly instead of leaving it at the new high default. The single biggest saving, though, is routing routine turns to a cheaper model like Sonnet 5 and keeping Opus 4.8 for the steps that actually need it.

Author: Francesc, Co-founder at Totalum

For the Anthropic + Apple distribution angle, see Apple Intelligence Extensions in iOS 27, which covers what Claude inside Siri means for AI app builders.

On June 9, 2026 Anthropic followed Opus 4.8 with Claude Fable 5, its first generally available Mythos-class model. Fable 5 scores 80.3% on SWE-Bench Pro versus 69.2% for Opus 4.8, at $10 / $50 per million tokens. If your workload is long-horizon coding or multi-repo refactors, the migration is now worth modeling.

Francesc

Writes for the Totalum blog about AI app building, no-code development, and product engineering.

AI Coding Agents

Create your web app with AI in minutes. No code needed.

Start building free

← Back to all posts

Claude Opus 4.8 Pricing 2026: Is It Worth It vs 4.7 and Sonnet 5?

Quick Answer

What is new in Claude Opus 4.8

1. 1M token context window, default

2. Mid-conversation system messages, no beta header

3. Adaptive thinking, by default

4. Refusal categories in `stop_details`

5. Lower prompt-cache threshold

6. Claude Code Workflows, research preview

7. Auto mode expanded, fast-mode default on Max

8. High-resolution images, advisor, computer use, task budgets

9. Fast mode for Opus 4.6 is being deprecated

10. Sampling parameters still locked

11. Same MCP, tools, and platform surface as Opus 4.7

Benchmarks: what Anthropic published

Pricing: unchanged at the base tier

Claude Opus 4.8 vs Claude Opus 4.7

What Opus 4.8 means if you ship on Claude

Long-context is now table stakes

Workflows changes the shape of agency work

Mid-conversation system messages unlock policy-heavy agents

Refusal categories tighten the safety net

Skills, marketplaces, and ecosystem

When to upgrade, when to wait

How to switch

FAQ

Related posts

Claude Code vs Codex in 2026: The Honest Verdict for Developers Shipping Production Code

Cursor vs Claude Code in 2026: The Honest Verdict for Builders Who Ship

Claude Agent SDK in 2026: What It Is, When To Use It, and How To Ship It With Totalum

Solutions

Quick Answer

What is new in Claude Opus 4.8

1. 1M token context window, default

2. Mid-conversation system messages, no beta header

3. Adaptive thinking, by default

4. Refusal categories in stop_details

5. Lower prompt-cache threshold

6. Claude Code Workflows, research preview

7. Auto mode expanded, fast-mode default on Max

8. High-resolution images, advisor, computer use, task budgets

9. Fast mode for Opus 4.6 is being deprecated

10. Sampling parameters still locked

11. Same MCP, tools, and platform surface as Opus 4.7

Benchmarks: what Anthropic published

Pricing: unchanged at the base tier

Claude Opus 4.8 vs Claude Opus 4.7

What Opus 4.8 means if you ship on Claude

Long-context is now table stakes

Workflows changes the shape of agency work

Mid-conversation system messages unlock policy-heavy agents

Refusal categories tighten the safety net

Skills, marketplaces, and ecosystem

When to upgrade, when to wait

How to switch

FAQ

Related posts

Claude Code vs Codex in 2026: The Honest Verdict for Developers Shipping Production Code

Cursor vs Claude Code in 2026: The Honest Verdict for Builders Who Ship

Claude Agent SDK in 2026: What It Is, When To Use It, and How To Ship It With Totalum

Solutions

4. Refusal categories in `stop_details`