
Anthropic Managed Agents in 2026: production playbook

Francesc | June 11, 2026 | 17 min read

[Figure: Anthropic Managed Agents 2026 capability map, a 2x2 grid of scheduled vs on-demand and managed vs self-hosted, with the managed quadrant highlighted]

Anthropic Managed Agents are scheduled, sandboxed Claude agents that Anthropic runs on your behalf, with vault-stored secrets, private MCP access, a grading layer that scores each run, and a dreaming pass that curates the agent's memory between sessions. The June 2026 release (announced at Code with Claude San Francisco on May 19 and extended at the Tokyo Extended hands-on day on June 11) turns the public-beta agent harness into a stack you can put behind a real product. This is the 2026 production playbook for software agencies, SaaS founders, and AI agent builders who want to ship Anthropic Managed Agents to real customers, including the parts the keynote skipped.

Quick Answer

Anthropic Managed Agents are agents that Anthropic orchestrates, schedules, and recovers, while tool execution runs in a sandbox you control and reaches your services through outbound-only MCP tunnels.
The June 2026 Tokyo Extended release adds three capabilities to the public beta: dreaming (a scheduled memory-curation pass), performance outcomes (rubric-based grading per run), and multi-agent orchestration (one managed agent composes others).
Use Managed Agents for recurring, long-horizon work where you want Anthropic to own the schedule, recovery, context, and memory. Use self-hosted Claude Code with Skills, Hooks, and Subagents when you want the loop running on your developer's machine.
Software agencies should treat Managed Agents as a billable retainer surface: a managed agent that runs nightly per client, posts back through their MCP server, and gets graded against an SLA-style rubric.
Totalum gives you the substrate underneath: a Next.js app with auth, payments, database, and a hosted MCP server that your managed agent can reach through an MCP tunnel.

What Anthropic Managed Agents are in 2026

A practical caveat as of June 13, 2026: pick your managed agent's model carefully. Anthropic suspended Claude Fable 5 and Mythos 5 globally on June 12 following a US government export control directive. Until Fable 5 access is restored, default your Managed Agents to Claude Opus 4.8 for long-horizon paths and Sonnet 4.6 for cost-sensitive routine work. The full fallback playbook is in the suspension guide.

A managed agent in 2026 is a Claude session that Anthropic runs as a managed workload. You define the agent's instructions, attach it to a schedule (cron expression or event trigger), point it at a sandbox where its tools execute, and give it access to vault-stored environment variables and a list of MCP servers it can reach. Anthropic owns the orchestration: launching the session at the right time, handling retries, persisting memory between runs, recovering from a cut connection, and applying the model upgrade path automatically when Opus or Haiku ship a new version.

What you cannot replicate by writing your own cron job around the Claude API is the integration. Managed Agents bundle the scheduler, the memory store, the dreaming pass, the outcome grader, the sandbox provisioner, the MCP tunnel control plane, and the model-routing logic into one product. The agent shows up in the Claude console as a first-class object, with its own session history, its own memory file, its own rubric scores per run, and its own tunnel and sandbox bindings.

The keyword for the next twelve months is "managed". You stop being responsible for the loop, the scheduler, and the recovery story. You become responsible for the instructions, the sandbox boundary, the tunnel allowlist, the rubric, and the business outcomes.

What changed at Tokyo Extended, June 2026

The May 19 San Francisco announcement shipped Managed Agents in public beta with scheduled runs, vault-stored env vars, browser-capable integrations, self-hosted sandboxes (public beta), and MCP tunnels (research preview). The Tokyo event on June 10 and the Tokyo Extended hands-on day on June 11 add three capabilities that change how you design the agent.

First, dreaming. A scheduled background process now reviews each agent's session history and memory store between runs, extracts repeated patterns, and curates which memories survive into the next session. The agent gets better at its job over time without you re-writing the prompt every week.

Second, performance outcomes. You attach a rubric to a managed agent (a list of grade criteria with weights). Each run is scored in a separate Claude context against that rubric. Failed runs trigger a revision pass, where the original agent receives the grader's feedback and re-attempts the work. Internal Anthropic benchmarks reported double-digit gains on hardest-task success rates.

Third, multi-agent orchestration. A managed agent can now compose other managed agents inside its session: spawn them, wait on their outcomes, aggregate their summaries. The orchestrator owns the plan and the rollup; the worker agents own the focused execution. Tokyo Extended demonstrated this with an "agent battle" track where developers compose teams of managed agents against scored tasks.

The combined effect is that a single managed agent is no longer one Claude session per cron tick. It is a small organization with a brain (the orchestrator), specialists (the composed workers), a coach (the outcome grader), and a sleep cycle (the dreaming pass).

The Managed Agents architecture: who runs what

The split between Anthropic-owned and customer-owned components is the most useful thing to internalize before you ship. The June 2026 architecture looks like this.

Component	Owned by	Notes
Schedule and trigger	Anthropic	Cron expressions, event triggers, retries
Session orchestration	Anthropic	Model selection, retries, recovery, model upgrades
Memory store	Anthropic	Curated by dreaming, exposed to the agent at session start
Outcome grader	Anthropic	Separate context, runs your rubric per run
Sandbox where tools execute	Customer (or managed provider)	Self-hosted, or Cloudflare, Daytona, Modal, Vercel
Filesystem and network egress	Customer	Lives inside the sandbox boundary
Internal services the agent reaches	Customer	Reached via MCP tunnels (outbound only)
Secrets (API keys, tokens)	Customer	Vault-stored, injected into the sandbox at run start
Audit log of every tool call	Both	Anthropic stores the session; you can mirror to your SIEM

The reason this split matters: code never leaves your perimeter. The agent's filesystem writes happen on infrastructure you control. The internal services the agent calls (your database, your CMS, your blog API, your Stripe billing) are reached over a tunnel you opened outbound from your network. Anthropic sees the conversation and the tool-call envelope, not the contents of your filesystem or the response bodies from your private services. Enterprises that previously had to choose between full self-hosting and ceding tool execution to a vendor now have a third option.

When Managed Agents beat self-hosted Claude Code

Skills, Hooks, and Subagents are the self-hosted stack: you run Claude Code on a developer's machine or your own CI, the loop is yours, and you wire Skills to teach procedures, Hooks to enforce rules, Subagents to isolate work. Managed Agents move the loop to Anthropic and add scheduling, memory, dreaming, outcomes, and sandboxes. The decision is rarely either-or; it is which surface owns which work.

Job	Pick Managed Agents	Pick self-hosted Claude Code
Nightly customer reports	Yes (schedule + memory + outcome grade per client)	Possible but you own cron and retries
Interactive coding session	No	Yes (developer-driven loop is the whole point)
Per-PR code review	Possible (Claude SDK + webhook + managed agent)	Usually self-hosted with a Subagent + SubagentStop Hook
Recurring SEO audit on competitors	Yes (cron + sandbox + MCP tunnel to your CMS)	Possible but you re-build half the platform
Long-horizon research with memory	Yes (dreaming curates patterns across weeks)	Hard, you have to manage memory yourself
Multi-agent fan-out within one task	Tokyo Extended orchestration (managed)	Subagents (self-hosted)
Air-gapped compliance environment	Yes via self-hosted sandbox + MCP tunnel	Yes with full self-host of Claude Code
One-off ad-hoc work	No	Yes

The pattern that ships fastest in production: self-hosted Claude Code with Skills + Hooks + Subagents for everything a developer triggers (see our Claude Code subagents production playbook and Claude Code hooks reference), and a Managed Agent fleet for everything that runs on a schedule or per-customer cadence. The Managed Agent calls your services through MCP tunnels. The self-hosted agents call the same services through the same MCP servers, just without the tunnel layer.

Production setup: a managed agent that ships a recurring task

Here is a real shape we run for a recurring SEO audit on a portfolio of customer sites. The audit runs nightly, reads each site's recent posts and search-console data, scores their structural SEO, and posts a remediation list back into a project tracker. It is the kind of work no one wants to babysit and that benefits from memory of what got fixed last week.

The managed agent definition has four pieces.

Instructions: a one-screen system prompt that lays out the audit checklist, the grading rubric, the customer naming convention, and the tools the agent is allowed to call.

Schedule: a cron expression 0 3 * * * (every day at 03:00 UTC), with a max_runtime of 45 minutes and a retry policy of three retries with exponential backoff.

Sandbox binding: a self-hosted sandbox we run on Modal, with the agent's filesystem isolated per run. The sandbox has Node.js, Playwright headless Chrome, and curl. It has no outbound network access except through the MCP tunnels we opened.

MCP tunnel bindings: three tunnels, all outbound from our network. One reaches our hosted Totalum project that owns the audit-report writeup. One reaches Google Search Console through our service account. One reaches the customer's CMS through a per-customer API key pulled from the vault.
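The four pieces above can be pictured as a single definition object. The sketch below is illustrative only: this article does not show Anthropic's actual configuration schema, so every class and field name here (ManagedAgentSpec, RetryPolicy, max_runtime_minutes, the sandbox and tunnel labels) is hypothetical. It also spells out what "three retries with exponential backoff" means, assuming a 60-second base delay that doubles on each retry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int = 3
    base_delay_seconds: int = 60  # assumed base delay, not an official default

    def backoff_delays(self) -> list:
        # Exponential backoff: each retry waits twice as long as the previous one.
        return [self.base_delay_seconds * 2 ** attempt for attempt in range(self.max_retries)]


@dataclass(frozen=True)
class ManagedAgentSpec:
    # Hypothetical shape; field names are ours, not Anthropic's.
    instructions: str
    cron: str
    max_runtime_minutes: int
    retry: RetryPolicy
    sandbox: str
    mcp_tunnels: tuple = ()

    def __post_init__(self) -> None:
        # A standard cron expression has five fields:
        # minute, hour, day of month, month, day of week.
        if len(self.cron.split()) != 5:
            raise ValueError(f"expected 5 cron fields, got {self.cron!r}")


seo_audit = ManagedAgentSpec(
    instructions="Audit checklist, rubric, naming convention, allowed tools.",
    cron="0 3 * * *",  # every day at 03:00
    max_runtime_minutes=45,
    retry=RetryPolicy(),
    sandbox="modal-self-hosted",
    mcp_tunnels=("totalum-reports", "search-console", "customer-cms"),
)
print(seo_audit.retry.backoff_delays())  # [60, 120, 240]
```

Validating the cron field count at construction time is a cheap guard: a truncated expression fails when the definition is written, not at 03:00.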

The agent runs once per night per customer. At session start, Anthropic injects the curated memory from last night's dream pass (which customers had recurring issues, which fixes already shipped, what the current redirect chain looks like on each site). The orchestrator agent splits the audit across composed worker agents (one per audit dimension: rendering, structured data, internal linking, image alt text, FAQ schema), waits for their outcomes, and aggregates a single report. The outcome grader scores the report against the rubric (severity coverage, deduplication against last week, concrete remediation steps). A failing grade triggers a revision pass where the orchestrator sees the grader's feedback and re-issues the failing worker. The final report is posted back into the project tracker through the Totalum MCP tunnel.
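The split-wait-aggregate step is a plain fan-out and fan-in. As a rough mental model, and not the Managed Agents API (which this article does not show), here is the same shape using Python's asyncio. The run_worker and orchestrate functions are hypothetical stand-ins, and the worker returns a placeholder finding instead of doing any real crawling.

```python
import asyncio

AUDIT_DIMENSIONS = [
    "rendering",
    "structured data",
    "internal linking",
    "image alt text",
    "FAQ schema",
]


async def run_worker(site: str, dimension: str) -> dict:
    # Stand-in for a composed worker agent; a real worker would crawl and analyze.
    await asyncio.sleep(0)
    return {"dimension": dimension, "findings": [f"{site}: placeholder finding for {dimension}"]}


async def orchestrate(site: str) -> dict:
    # Fan out: one worker per audit dimension, all running concurrently.
    results = await asyncio.gather(*(run_worker(site, d) for d in AUDIT_DIMENSIONS))
    # Fan in: aggregate the worker outcomes into a single report.
    return {
        "site": site,
        "dimensions_covered": [r["dimension"] for r in results],
        "findings": [f for r in results for f in r["findings"]],
    }


report = asyncio.run(orchestrate("example.com"))
print(len(report["findings"]))  # 5
```

asyncio.gather returns results in the order the workers were submitted, so the rollup stays stable from night to night even when workers finish in a different order.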

The reason this shape is hard to replicate with a hand-rolled cron job is the integration. You stop maintaining the scheduler, the secret injection, the sandbox provisioner, the memory store, the grader rig, and the multi-agent harness. Anthropic owns them. You own the instructions, the rubric, and the MCP server contracts.

Performance outcomes: grade every run with a rubric

The performance outcomes feature is the single biggest behavioral shift in the June 2026 release. A managed agent without a rubric keeps running with no check on whether its output is any good. A managed agent with a rubric self-corrects within a single run.

A rubric is a YAML or JSON list of grading criteria, each with a weight, a description, and either a hard pass threshold or a soft scoring band. For our SEO audit we use criteria like "report includes at least one finding per audited dimension (weight 0.2, hard pass)", "remediation steps are concrete and reference specific URLs (weight 0.3, soft 0-10)", "report is deduplicated against last week's findings (weight 0.2, soft 0-10)", "tone matches the customer's preferred reporting voice (weight 0.1, soft 0-10)", "no out-of-scope writes outside the audit report (weight 0.2, hard pass)".

At run end, Anthropic spins up a separate Claude context, hands it the rubric and the artifact the agent produced, and asks it to score each criterion. If a hard-pass criterion fails, the agent is invoked again with the grader's feedback attached. If only soft criteria fail and the overall score is below the configured floor (typical: 7.5 of 10), the agent is invoked again. If the score is above the floor, the run is marked complete.
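The decision rule in that paragraph is small enough to write down. The sketch below mirrors our SEO rubric, under three assumptions that are ours and not documented behavior: a hard-pass criterion is scored 10 for a pass and 0 for a fail, the overall score is the weighted sum on a 0-10 scale, and a score exactly at the floor counts as complete. The Criterion class and decide function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    weight: float
    hard_pass: bool  # True: pass/fail gate; False: soft score on a 0-10 band


RUBRIC = [
    Criterion("finding per audited dimension", 0.2, hard_pass=True),
    Criterion("concrete remediation steps", 0.3, hard_pass=False),
    Criterion("deduplicated against last week", 0.2, hard_pass=False),
    Criterion("tone matches customer voice", 0.1, hard_pass=False),
    Criterion("no out-of-scope writes", 0.2, hard_pass=True),
]


def decide(scores: dict, floor: float = 7.5) -> str:
    """Return 'revise' or 'complete' for one graded run."""
    # Any failed hard-pass criterion forces a revision, whatever the total.
    for c in RUBRIC:
        if c.hard_pass and scores[c.name] < 10:
            return "revise"
    # Otherwise compare the weighted overall score against the floor.
    overall = sum(c.weight * scores[c.name] for c in RUBRIC)
    return "complete" if overall >= floor else "revise"


good_run = {
    "finding per audited dimension": 10,
    "concrete remediation steps": 8,
    "deduplicated against last week": 7,
    "tone matches customer voice": 6,
    "no out-of-scope writes": 10,
}
print(decide(good_run))  # complete
```

With these inputs the weighted score is about 8.4, which clears the 7.5 floor. Drop the three soft scores to 5 and the total falls to about 7.0, which triggers a revision even though both hard gates passed.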

The dreaming pass uses the grader scores as input. Memories from high-scored runs survive at higher weight. Memories from runs that needed a revision get tagged with the failure mode so future sessions know to avoid it. Over weeks, the agent's grade trends up without you touching the prompt.

This is also the right place to plug in your own observability. The rubric scores stream out of Anthropic in real time; route them to your dashboard, alert when the seven-day average drops below the floor, and you have a service-level guarantee for the managed agent's work.
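A minimal version of that alert fits in a few lines. How the scores reach you is not specified here, so this sketch assumes you already receive one overall score per nightly run; RollingScoreAlert is a hypothetical helper, not part of any SDK.

```python
from collections import deque


class RollingScoreAlert:
    def __init__(self, floor: float = 7.5, window: int = 7) -> None:
        self.floor = floor
        self.window = window
        # A bounded deque drops the oldest score once the window is full.
        self.scores = deque(maxlen=window)

    def record(self, score: float) -> bool:
        """Add one nightly score; return True when an alert should fire."""
        self.scores.append(score)
        # Wait for a full window so a single bad night does not page anyone.
        if len(self.scores) < self.window:
            return False
        return sum(self.scores) / len(self.scores) < self.floor


alert = RollingScoreAlert()
for nightly_score in [8.0] * 7:
    fired = alert.record(nightly_score)
print(fired)  # False
```

Waiting for a full window is a design choice: it trades a week of silence after deployment for fewer false alarms.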

Sandboxes and MCP tunnels: locking down the execution surface

Self-hosted sandboxes are the public-beta answer to "I cannot let agent code touch my production environment". You provision a sandbox runtime in infrastructure you control (the Cloudflare, Daytona, Modal, or Vercel managed providers all ship templates, or you bring your own Kubernetes namespace). Anthropic provisions a fresh filesystem per session, injects the vault-stored secrets, and gates network egress to the tunnel set you allow-listed.

MCP tunnels are the research-preview answer to "I cannot open inbound firewall ports for an agent vendor". You install a small tunnel client inside your VPC. It opens an outbound connection to Anthropic's control plane. The agent's MCP client connects to your MCP servers through that tunnel, so traffic flows out from your network only and your services never sit on the public internet. No IP allowlisting required.

The pattern that works for a software agency or a SaaS product is one MCP server per logical surface. We run a Totalum MCP server for project data, a Calendly MCP server for booked calls, and a customer-specific MCP server per agency client. Each one is reached through its own tunnel. Each one is allow-listed in the managed agent's binding list. The agent cannot reach anything else, even if a prompt-injection attempt told it to. The execution surface is the sandbox plus the tunnel set, and nothing outside that surface is reachable.
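The policy itself is deny-by-default with exact-match bindings. In the real product that enforcement sits at the sandbox and tunnel layer, not in your code, so treat the following as a model of the rule. It is not something you deploy, and the server names and the resolve_mcp_server function are hypothetical.

```python
class BindingError(PermissionError):
    """Raised when an agent asks for an MCP server outside its binding list."""


def resolve_mcp_server(requested: str, bindings: frozenset) -> str:
    # Deny by default: only exact matches against the binding list are reachable.
    if requested not in bindings:
        raise BindingError(f"{requested!r} is not in this agent's binding list")
    return requested


AGENT_BINDINGS = frozenset({"totalum-projects", "calendly-calls", "client-acme"})

print(resolve_mcp_server("client-acme", AGENT_BINDINGS))  # client-acme

try:
    # A prompt-injected request for an unbound server fails closed.
    resolve_mcp_server("internal-payroll", AGENT_BINDINGS)
except BindingError as exc:
    print("blocked:", exc)
```

The point of the exact-match set is that there is no pattern or wildcard for an injected instruction to exploit: a name is either bound or unreachable.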

For agencies, the second-order effect is compliance. You can stand up a SOC 2 or HIPAA-friendly deployment of a Managed Agent without negotiating data residency with Anthropic. The conversation goes through Anthropic. The data does not.

Cost control for Managed Agents in production

The cost model has three lines. Anthropic charges per token for the orchestration and per token for the grading context. The sandbox provider charges for compute time inside the sandbox. The MCP tunnel charges by data egress and by control-plane minutes.

For a 45-minute nightly SEO audit across 10 customer sites, the token costs dominate (roughly 75 to 85 percent of total). Two cost-control levers move the needle.

First, the model routing. Managed Agents support per-step routing between Opus, Sonnet, and Haiku. We route Opus to the orchestrator and the grader, Sonnet to the worker agents that crawl and analyze, and Haiku to the deduplication pass. The orchestrator and the grader need the strongest reasoning. The workers do not.
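Our routing table is simple enough to show as a lookup. The step names and tier labels below are our own shorthand and not API model identifiers, and the fallback to the mid tier for unknown steps is our assumption, not documented platform behavior.

```python
# Step name -> model tier, mirroring the routing described above.
ROUTES = {
    "orchestrator": "opus",
    "grader": "opus",
    "worker": "sonnet",
    "dedup": "haiku",
}


def route(step: str) -> str:
    # Unknown steps fall back to the mid tier, not the most expensive one.
    return ROUTES.get(step, "sonnet")


for step in ("orchestrator", "worker", "dedup", "summarize"):
    print(step, "->", route(step))
```

Defaulting unknown steps to the mid tier means a newly added step cannot silently run on the most expensive model.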

Second, the dreaming budget. By default, dreaming runs nightly and is cost-bounded. Cap the per-agent dreaming token budget at the level where memory quality plateaus. For our SEO audit we run dreaming once a week, not nightly, because the patterns we want curated are weekly, not daily.

A practical floor for a single nightly managed agent with a moderate-sized rubric and 10 worker subagents lands in the low tens of dollars per month, exclusive of sandbox compute. A heavy multi-agent orchestration with daily dreaming can run into the hundreds per agent per month. Bill it to the customer as part of a retainer, and the unit economics for an agency become obvious.
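To sanity-check a retainer price, it helps to put the three cost lines into one small calculator. Every number in the example call is a made-up placeholder, not an Anthropic, sandbox, or tunnel price, and the single blended token price is a simplification of per-model pricing. The MonthlyCost and estimate names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonthlyCost:
    tokens: float
    sandbox: float
    tunnel: float

    @property
    def total(self) -> float:
        return self.tokens + self.sandbox + self.tunnel

    @property
    def token_share(self) -> float:
        # Fraction of the monthly bill that comes from tokens.
        return self.tokens / self.total


def estimate(
    runs_per_month: int,
    million_tokens_per_run: float,
    blended_price_per_million: float,
    sandbox_cost_per_run: float,
    tunnel_cost_per_month: float,
) -> MonthlyCost:
    # Line 1: orchestration plus grading tokens at one blended price.
    tokens = runs_per_month * million_tokens_per_run * blended_price_per_million
    # Line 2: sandbox compute, billed per run.
    sandbox = runs_per_month * sandbox_cost_per_run
    # Line 3: tunnel egress and control-plane minutes, as a flat monthly figure.
    return MonthlyCost(tokens=tokens, sandbox=sandbox, tunnel=tunnel_cost_per_month)


# Placeholder inputs for illustration only.
cost = estimate(
    runs_per_month=30,
    million_tokens_per_run=0.2,
    blended_price_per_million=4.0,
    sandbox_cost_per_run=0.15,
    tunnel_cost_per_month=1.5,
)
print(f"total={cost.total:.2f} token_share={cost.token_share:.0%}")
```

Swap in your own measured token counts and provider rates; the useful output is the token share, which tells you whether model routing or sandbox tuning is the lever worth pulling.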

Embedding Managed Agents for your software agency or SaaS

This is where it gets interesting for the agency and SaaS-embed audiences. Managed Agents are not just an internal automation surface. They are a billable product line.

The agency pattern: stand up one Managed Agent per client per recurring job. Nightly content audit. Weekly competitor monitoring. Daily lead-list enrichment. Each one bills the client as a managed service line item. Anthropic runs the schedule. You run the rubric, the report template, and the customer relationship. The MCP tunnel reaches the client's CMS, CRM, and analytics behind a per-client API key. Each client gets their own dashboard view of the rubric scores and the agent's recent work. You sell the operational outcome, not the AI.

The SaaS-embed pattern: every customer of your SaaS product gets their own Managed Agent provisioned at signup. The agent runs the recurring AI-shaped feature your product promises (an editorial assistant that drafts weekly content, a research analyst that summarizes a competitor set, a finance agent that reconciles invoices, an HR agent that drafts reviews). Your product owns the instructions, the rubric, and the MCP surface. Anthropic owns the schedule and the recovery. Your product becomes a managed-agent host without you running the agent infrastructure.

The substrate that makes both patterns ship in days, not months, is a Next.js application with auth, payments, database, file storage, deployment, and a hosted MCP server that the managed agent can reach. That is what Totalum builds. Spin up a project, hook it to your Stripe account, expose a Totalum MCP server, and bind it to your managed agent fleet. The managed agent reads and writes through your MCP. Your customers see a clean app. You see one substrate per product instead of a custom stack per agent.

For a deeper walk through the embed motion, see how to embed an AI app builder via API. For the relationship to the broader agent platform, see our AI agent platform pillar.

How Managed Agents compare to Cursor and Cline workflows

Cursor and Cline live on a developer's machine. The loop is interactive. The agent writes code, the developer reviews, the developer ships. Managed Agents live on Anthropic's infrastructure. The loop is scheduled. The agent does the work, the grader scores it, the artifact lands in your system. The two are not in competition; they are different surfaces for different jobs.

A working portfolio uses both. Cursor or Cline for the active coding work, with the Claude Agent SDK exposed for in-editor agents (see our Claude Agent SDK reference, plus the Claude Agent SDK credits June 15 playbook for the new credit pool economics). Managed Agents for everything that should run while everyone is asleep, with the MCP servers reachable from both. The agency that ships both surfaces to clients can bill the interactive work as project hours and the managed work as a retainer.

The Tokyo Extended announcement (recapped in our Code with Claude Tokyo 2026 summary) made the convergence explicit. The same Claude that powers Cursor's editor now sits inside a managed sandbox at 03:00 UTC, reading the same MCP server your editor agent reaches, scoring its own work against a rubric you wrote.

FAQ

Are Anthropic Managed Agents generally available?

The Managed Agents platform is in public beta as of June 2026. Self-hosted sandboxes are in public beta. MCP tunnels are in research preview. The dreaming, performance outcomes, and multi-agent orchestration features were added at Code with Claude Tokyo Extended on June 11, 2026.

Where does the agent's code actually run?

The orchestration and grading contexts run on Anthropic. Tool execution runs in a sandbox you control: self-hosted on your own infrastructure or on a managed provider (Cloudflare, Daytona, Modal, Vercel). Network egress from the sandbox is gated to the MCP tunnels you opened.

Do Managed Agents replace Claude Code Skills, Hooks, and Subagents?

No. They are complementary surfaces. Use Skills, Hooks, and Subagents for the developer-driven loop on your own machine or CI. Use Managed Agents for scheduled, long-horizon work that benefits from Anthropic-owned orchestration, memory, and grading. Most production setups in 2026 run both.

How are Managed Agents priced?

Token costs for orchestration and grading, plus sandbox compute, plus MCP tunnel data and control-plane minutes. Per-step model routing (Opus orchestrator, Sonnet workers, Haiku dedup) keeps unit costs low. A modest nightly agent runs in the low tens of dollars per month exclusive of sandbox compute.

Can I bring my own MCP servers?

Yes. The MCP tunnel control plane allow-lists your private MCP endpoints by binding ID. Each managed agent declares which MCP servers it can reach. Servers outside the binding list are unreachable from inside the sandbox.

What workflows are best for a software agency to charge for?

Recurring per-client jobs: content audits, competitor monitoring, lead enrichment, weekly report drafts, billing reconciliation, support-ticket triage. Sell the operational outcome on a retainer. Anthropic runs the schedule and recovery. You run the rubric, the report template, and the client relationship.

Ready to ship Managed Agents with Totalum?

If you run a software agency or a SaaS product and want to ship Anthropic Managed Agents to real customers in weeks, not quarters, talk to us. Totalum gives you the Next.js application with auth, payments, database, file storage, deployment, and a hosted MCP server that your managed agent can reach through an MCP tunnel. Book a 30-minute discovery call at calendly.com/totalum/30min and we will show you a live managed-agent setup against a Totalum project, including the rubric, the sandbox binding, and the tunnel. If you want to try it solo first, you can start a Totalum project free at totalum.app.

Francesc

Writes for the Totalum blog about AI app building, no-code development, and product engineering.