Claude Agent Containment: Blast-Radius Controls for AI Agents

Anthropic’s latest engineering write-up on Claude containment is worth reading because it names the real shift in agent security: the problem is no longer only whether the model makes a mistake. It is how far the mistake can reach.

On May 25, 2026, Anthropic published a detailed look at how it contains Claude across claude.ai, Claude Code, and Claude Cowork. The company frames the issue as blast radius: as agents become more capable, their access grows, and teams need stronger boundaries around what an agent can actually do.

That is the right framing for production teams.

If an agent can browse, read files, edit code, call MCP tools, run commands, or touch internal systems, then prompt safety alone is not enough. The practical question becomes: what hard boundary stops a bad instruction, poisoned document, compromised tool, or model failure from becoming a production incident?

This is independent Open-TechStack analysis. It is not official Anthropic guidance, legal advice, procurement advice, or a sponsored post.

TL;DR

Anthropic’s containment article points to a mature agent security pattern:

Agent risk has two parts: how likely a failure is and how much damage it can do.
Human approval prompts help, but Anthropic says approval fatigue is real; its telemetry showed users approved roughly 93% of permission prompts.
Anthropic emphasizes containment: supervising what the agent is able to do, not only supervising what it decides to do.
The main containment tools are sandboxes, virtual machines, filesystem boundaries, egress controls, scoped tool permissions, and overlapping defenses.
Anthropic calls out three defense areas: the model, the environment where the agent runs, and the external content the agent can reach.
The practical lesson for teams is simple: do not grant agents broad access and hope monitoring catches problems later.

If your team is already adding safety tests, pair this with "Microsoft RAMPART and Clarity Show Why Agent Safety Belongs in CI." If you are still designing routing and observability, see the AI Gateway routing playbook and the AI agent observability stack.

[Figure: a Claude-style agent containment stack with a policy engine, identity boundary, tool broker, browser isolation, file access limits, code execution sandbox, network egress gate, human approval, audit logs, monitoring, and rollback controls.]

What changed

Anthropic’s post is not just a product security update. It is a useful field report from running agents in different containment environments.

The company describes three agentic products with different security needs:

Product surface	Why containment differs
claude.ai	Server-side agent features need hosted isolation around browsing, files, and execution-style capabilities.
Claude Code	Local developer workflows must balance productivity with workspace, shell, and user-approval risk.
Claude Cowork	Enterprise-style agent work needs stronger isolation, auditability, and operational controls.

The important lesson is that containment is contextual. A consumer chat feature, local coding agent, and enterprise coworker agent should not share one generic security model.

Why approval prompts are not enough

Approval prompts are useful, but they are a weak single line of defense.

Anthropic says Claude Code previously asked users for permission at each turn, but its telemetry showed users approved roughly 93% of permission prompts. That is exactly the problem many developer tools face: when users see too many approvals, they stop treating each one as a meaningful decision.

Approval fatigue creates a dangerous illusion. The product can claim a human approved the action, when the human may only have skimmed a prompt they no longer read carefully or fully understand.

That does not mean teams should remove approvals. It means approval must sit inside a stronger system:

low-risk actions can be auto-approved only inside strict boundaries
high-risk actions need clear context, in prompts rare enough that users still read them
dangerous actions should be impossible without scoped permission
write actions should be logged and reversible where possible
repeated approvals should not become a substitute for sandboxing
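To make these tiers concrete, here is a minimal Python sketch of a tiered approval policy. It is not Anthropic's implementation: `Risk`, `Decision`, `scoped_grants`, and `decide` are hypothetical names chosen for illustration, and the sketch assumes each action's risk level is assigned by a separate classifier.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    HIGH = "high"
    DANGEROUS = "dangerous"


class Decision(Enum):
    AUTO_APPROVE = "auto_approve"
    ASK_HUMAN = "ask_human"
    BLOCK = "block"


@dataclass(frozen=True)
class Action:
    tool: str
    risk: Risk          # assumed to be assigned by a separate classifier
    is_write: bool
    in_sandbox: bool    # True if the action runs inside the isolated environment


def decide(action: Action, scoped_grants: frozenset[str], audit_log: list[str]) -> Decision:
    """Hypothetical tiered policy: hard limits first, human prompts second."""
    if action.risk is Risk.DANGEROUS:
        # Impossible without an explicit scoped grant; with one, a human still confirms.
        decision = Decision.ASK_HUMAN if action.tool in scoped_grants else Decision.BLOCK
    elif action.risk is Risk.LOW and action.in_sandbox:
        # Low-risk work inside strict boundaries does not interrupt the user.
        decision = Decision.AUTO_APPROVE
    else:
        # Low-risk actions outside the sandbox and all high-risk actions get a prompt.
        decision = Decision.ASK_HUMAN
    if action.is_write:
        # Every write decision is recorded, including blocked attempts.
        audit_log.append(f"{decision.value}: {action.tool}")
    return decision
```

The order of the checks is the design choice worth copying: the hard block is evaluated before any prompt is considered, so a tired user can never approve an action the system was never willing to run.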

The goal is not more prompts. The goal is fewer, better prompts backed by hard limits.

The containment stack

A practical Claude-style containment stack has several layers:

Layer	What it limits
Identity boundary	Which user, workspace, tenant, or service identity the agent can act as.
Tool broker	Which tools the agent can call, with what arguments, and under what policy.
Filesystem boundary	Which paths can be read or written, and whether host files are exposed.
Code execution sandbox	Whether code runs in a short-lived, isolated, resource-limited environment.
Network egress gate	Which domains, IPs, protocols, or destinations the agent can reach.
Sensitive data filter	Whether secrets, keys, customer data, or unrelated context can leave the boundary.
Audit log	Who asked, what the agent did, what tool ran, and what result came back.
Kill switch	How a session, token, tool, connector, or runtime can be revoked quickly.
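The tool broker, audit log, and kill switch rows can be sketched together. The Python below is a hypothetical in-process broker, not a Claude or MCP API: it assumes the agent can reach tools only through `ToolBroker.call`, which checks an allowlist and the permitted argument names, records every attempt, and refuses everything once the session is killed. A production broker would run outside the agent's process and bind the identity to real credentials.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field
from typing import Any, Callable


class ToolDenied(Exception):
    """Raised when the broker refuses a tool call."""


@dataclass
class ToolBroker:
    tools: dict[str, Callable[..., Any]]     # allowlisted tool name -> implementation
    allowed_args: dict[str, set[str]]        # tool name -> permitted keyword arguments
    revoked: set[str] = field(default_factory=set)
    audit: list[dict[str, Any]] = field(default_factory=list)
    killed: bool = False

    def revoke(self, tool: str) -> None:
        """Disable one tool without ending the session."""
        self.revoked.add(tool)

    def kill(self) -> None:
        """Kill switch: refuse every further call in this session."""
        self.killed = True

    def call(self, identity: str, tool: str, **kwargs: Any) -> Any:
        entry: dict[str, Any] = {"ts": time.time(), "identity": identity, "tool": tool, "args": kwargs}
        try:
            if self.killed:
                raise ToolDenied("session killed")
            if tool not in self.tools or tool in self.revoked:
                raise ToolDenied(f"tool not allowed: {tool}")
            unexpected = set(kwargs) - self.allowed_args.get(tool, set())
            if unexpected:
                raise ToolDenied(f"unexpected arguments: {sorted(unexpected)}")
            result = self.tools[tool](**kwargs)
            entry["outcome"] = "ok"
            entry["result"] = repr(result)[:200]  # keep a bounded copy of what came back
            return result
        except ToolDenied as exc:
            entry["outcome"] = f"denied: {exc}"
            raise
        except Exception as exc:
            entry["outcome"] = f"error: {exc!r}"
            raise
        finally:
            # Allowed, denied, and failed calls all land in the audit log.
            self.audit.append(entry)
```

Logging inside `finally` is deliberate: denied and failed calls are often the most useful evidence after an incident, so they are recorded the same way as successful ones.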

This is blast-radius engineering. If a poisoned README reaches the model, a malicious webpage gets summarized, or a tool output contains hostile instructions, the containment layer should still limit what can happen next.
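The code execution row is often where teams start. As a rough illustration only, the Python sketch below runs untrusted code in a throwaway directory with an empty environment (so no inherited API keys or tokens) and a hard timeout. The `run_ephemeral` helper is a hypothetical name, and the sketch assumes a POSIX host. It is not real isolation: it does not block network access, cap memory, or protect the rest of the host filesystem, which is why this layer belongs in a sandbox or virtual machine in production.

```python
from __future__ import annotations

import subprocess
import sys
import tempfile


def run_ephemeral(code: str, timeout_s: float = 5.0) -> tuple[int, str, str]:
    """Run Python code in a fresh temp directory with no inherited environment.

    Sketch only (POSIX assumed): no network, memory, or filesystem isolation.
    """
    with tempfile.TemporaryDirectory(prefix="agent-run-") as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: ignore PYTHON* env vars and user site-packages
                cwd=workdir,         # scratch directory, deleted when the with-block exits
                env={},              # nothing inherited: no API keys, tokens, or cloud credentials
                capture_output=True,
                text=True,
                timeout=timeout_s,   # the child is killed if it runs too long
            )
        except subprocess.TimeoutExpired:
            return -1, "", "timeout"
    return proc.returncode, proc.stdout, proc.stderr
```

Even this weak version shows the principle: the question is not whether the code is well behaved, but what it can touch if it is not.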

Treat external content as hostile

Anthropic makes an important point about external content: an audited connector is not the same thing as audited data.

A connector may be safe as software while still loading untrusted content into the model’s context. A repo, webpage, document, ticket, Slack message, or MCP resource can contain instructions that try to influence the agent.

That means teams should classify content sources by trust level:

Source type	Safer default
Public web pages	read-only, no credential access, no automatic follow-up actions
Third-party repos	no secret access, no write access, sandboxed analysis
Internal docs	scoped by user entitlement and business need
Production systems	read-only by default, write actions approval-gated
MCP servers	allowlisted tools, logged calls, bounded outputs
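One way to apply this table is taint tracking: once lower-trust content enters the agent's context, the set of tools on offer shrinks for the rest of the session. The sketch below is a hypothetical illustration, not an Anthropic mechanism; the three trust levels, the source names, and the `TOOL_MIN_TRUST` mapping are assumptions you would replace with your own classification.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Trust(IntEnum):
    UNTRUSTED = 0   # public web, third-party repos, MCP resources
    INTERNAL = 1    # internal docs already scoped by user entitlement
    OPERATOR = 2    # the operator's own instructions and configuration


SOURCE_TRUST = {
    "public_web": Trust.UNTRUSTED,
    "third_party_repo": Trust.UNTRUSTED,
    "mcp_resource": Trust.UNTRUSTED,
    "internal_doc": Trust.INTERNAL,
}

# Lowest context trust at which each tool class may still be offered to the model.
TOOL_MIN_TRUST = {
    "read": Trust.UNTRUSTED,
    "write": Trust.INTERNAL,
    "credentialed": Trust.INTERNAL,
}


@dataclass
class SessionContext:
    trust: Trust = Trust.OPERATOR
    sources: list[str] = field(default_factory=list)

    def ingest(self, source_type: str) -> None:
        # Unknown sources count as untrusted, and trust can only go down.
        self.trust = min(self.trust, SOURCE_TRUST.get(source_type, Trust.UNTRUSTED))
        self.sources.append(source_type)

    def tool_allowed(self, tool_class: str) -> bool:
        # Unknown tool classes are never allowed.
        required = TOOL_MIN_TRUST.get(tool_class)
        return required is not None and self.trust >= required
```

The one-way ratchet is the point: reading a trusted internal document after a poisoned web page does not restore write or credential access for that session.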

The dangerous mistake is assuming the model can reliably ignore hostile instructions because the system prompt says so. Prompts help, but boundaries matter more.

Copy this containment checklist

Before giving an agent access to real tools, answer these questions:

What identity does the agent act under?
Which tools are allowed, blocked, or approval-gated?
Which files can the agent read and write?
Does code execution happen in an ephemeral sandbox?
Can the agent reach the public internet?
Are egress destinations allowlisted?
Are secrets and credentials present inside the sandbox?
Are tool outputs treated as untrusted content?
Are high-impact writes logged, reviewed, and reversible?
Can a session, connector, token, or runtime be killed quickly?
Which logs prove what the agent did?
Which tests cover known prompt-injection and tool-misuse paths?
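Teams that want this checklist enforced rather than remembered can encode part of it as a pre-deployment check. The `AgentDeployment` fields and the `review` function below are hypothetical; the idea is that each question becomes a field someone has to fill in, and an empty or risky answer fails the review.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgentDeployment:
    identity: str                      # service identity the agent acts under
    writable_paths: list[str]
    ephemeral_sandbox: bool
    internet_access: bool
    egress_allowlist: list[str]
    secrets_in_sandbox: bool
    tool_output_untrusted: bool
    writes_logged: bool
    kill_switch: bool
    injection_tests: list[str] = field(default_factory=list)


def review(d: AgentDeployment) -> list[str]:
    """Return checklist failures; an empty list means this subset of checks passed."""
    problems: list[str] = []
    if not d.identity:
        problems.append("no dedicated identity for the agent")
    if "/" in d.writable_paths:
        problems.append("agent can write anywhere on the filesystem")
    if not d.ephemeral_sandbox:
        problems.append("code execution is not ephemeral")
    if d.internet_access and not d.egress_allowlist:
        problems.append("internet access without an egress allowlist")
    if d.secrets_in_sandbox:
        problems.append("secrets are present inside the sandbox")
    if not d.tool_output_untrusted:
        problems.append("tool outputs are treated as trusted")
    if not d.writes_logged:
        problems.append("high-impact writes are not logged")
    if not d.kill_switch:
        problems.append("no fast kill switch")
    if not d.injection_tests:
        problems.append("no prompt-injection or tool-misuse tests")
    return problems
```

Running a check like this in CI turns "we trust the model" from an unspoken assumption into a failing build.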

If the answer is “we trust the model,” the design is not ready.

What not to do

Do not give an agent broad credentials and rely on the model to behave.

That is the same mistake as giving every automation script production admin access because the script usually does the right thing. Agents are more flexible than scripts, which makes the boundary problem more important, not less.

Avoid these patterns:

broad write access when read-only access would work
host filesystem access when a mounted workspace is enough
unrestricted internet access for agents that process untrusted content
MCP tools without allowlists, logging, or output limits
approval prompts for every small action and no hard block for dangerous ones
persistent workspaces that keep secrets, state, or poisoned context longer than needed
observability that records final answers but not tool calls and side effects
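Two of these patterns, host filesystem access and unrestricted internet access, reduce to checks that are simple to state. The Python sketch below shows the logic of a workspace path check and an exact-match HTTPS egress allowlist. The workspace path and allowed hosts are hypothetical, and an in-process check like this is only a backstop: real enforcement belongs in the sandbox's mount configuration and in a network proxy or firewall the agent cannot bypass.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlsplit

WORKSPACE = Path("/workspace/project")                               # hypothetical mounted workspace
EGRESS_ALLOWLIST = frozenset({"pypi.org", "files.pythonhosted.org"})  # hypothetical allowed hosts


def path_allowed(requested: str, workspace: Path = WORKSPACE) -> bool:
    """True only if the path stays inside the workspace after resolving '..' and symlinks."""
    root = workspace.resolve()
    # An absolute `requested` replaces the base entirely, so "/etc/passwd" is caught too.
    target = (root / requested).resolve()
    return target == root or root in target.parents


def egress_allowed(url: str, allowlist: frozenset[str] = EGRESS_ALLOWLIST) -> bool:
    """Exact host match over HTTPS; no wildcard subdomains, no plain HTTP."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in allowlist
```

Parsing the hostname rather than matching on the URL string matters: `https://pypi.org@attacker.example/` starts with an allowed name but actually points at `attacker.example`.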

Containment is not a sign that the model is bad. It is how serious systems handle powerful automation.

FAQ

What is AI agent containment?

AI agent containment is the practice of limiting what an agent can reach and change through identity boundaries, sandboxes, tool permissions, filesystem limits, egress controls, audit logs, and approval gates.

Why is blast radius important for AI agents?

Blast radius measures how much damage a failure can cause. As agents gain tools, file access, browser access, code execution, and internal context, a single bad instruction can affect more systems unless containment limits the impact.

Are human approval prompts still useful?

Yes, but they should not be the only defense. Approval prompts work best for high-impact actions when the system also uses scoped tools, clear context, logging, and hard boundaries.

How should teams secure MCP tools?

Treat MCP tools as part of the tool boundary. Use allowlists, scoped credentials, explicit permissions, output limits, audit logs, and separate policies for read-only, write, and high-impact actions.
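Output limits and per-class policies are the MCP controls most often skipped. The sketch below assumes a broker that sees each MCP tool result before the model does; `POLICIES`, `MAX_OUTPUT_CHARS`, and the `<untrusted_tool_output>` wrapper are hypothetical conventions, not part of the MCP specification. Labeling output as untrusted helps both the model and your logs, but it is not a boundary on its own.

```python
from __future__ import annotations

import html

MAX_OUTPUT_CHARS = 4_000  # hypothetical per-call budget

# (server, tool) -> action class; anything missing here is not allowlisted.
POLICIES: dict[tuple[str, str], str] = {
    ("docs-server", "search"): "read",
    ("tickets-server", "create_ticket"): "write",
    ("billing-server", "issue_refund"): "high_impact",
}


def needs_approval(server: str, tool: str) -> bool:
    """Read-only tools run freely; write, high-impact, and unknown tools need approval."""
    return POLICIES.get((server, tool)) != "read"


def wrap_tool_output(server: str, tool: str, output: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    if (server, tool) not in POLICIES:
        raise PermissionError(f"{server}/{tool} is not allowlisted")
    truncated = len(output) > limit
    # Escape <, >, and & so the output cannot close the wrapper early.
    body = html.escape(output[:limit], quote=False)
    flag = ' truncated="true"' if truncated else ""
    return (
        f'<untrusted_tool_output server="{server}" tool="{tool}"{flag}>\n'
        f"{body}\n"
        "</untrusted_tool_output>"
    )
```

Bounding the output also bounds how much hostile text a single tool call can push into the model's context.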

Bottom line

Anthropic’s containment write-up is a reminder that agent security has to become architecture.

The safe path is not to ask users to approve more prompts or to write a longer system instruction. The safe path is to constrain what the agent can reach, isolate where it runs, mediate every tool, filter network egress, audit every action, and keep a fast kill switch.

As agents get more capable, containment is what keeps productivity from turning into uncontrolled blast radius.

Anthropic's Claude Containment Playbook: Cap the Agent Blast Radius

Charles Jasthyn De La Cueva / Founder of Open-TechStack
