Claude Sandbox Escape Enables Credential Exfiltration

Overview

Anthropic has published an unusually detailed technical overview of how it sandboxes Claude across its product suite — Claude.ai, Claude Code, and Claude Cowork. The documentation, surfaced by Simon Willison, outlines the isolation technologies deployed at each layer and candidly references at least one real exfiltration vector that was previously missed: the api.anthropic.com/v1/files endpoint. This kind of transparency is rare in the agentic AI space and provides a useful baseline for evaluating agent containment strategies.

Technical Analysis

The sandboxing stack varies by product context:

Claude.ai uses gVisor, a userspace kernel that intercepts system calls to limit the blast radius of a compromised container.
Claude Code (local) uses Seatbelt on macOS (a sandbox profile enforcement tool) and Bubblewrap on Linux, providing filesystem and capability restrictions for locally executed agent processes.
Claude Cowork runs a full virtual machine — Apple’s Virtualization framework on macOS and HCS (Host Compute Service) on Windows — providing the strongest isolation tier.
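
To make the Linux tier concrete, the sketch below builds a Bubblewrap command line that confines a process to one writable directory with no network access. It is an illustration under stated assumptions, not the profile Anthropic ships: the helper name build_bwrap_argv, the mounted paths, and the PATH value are all hypothetical. The flags themselves are standard bwrap options.

```python
from typing import List


def build_bwrap_argv(workdir: str, command: List[str]) -> List[str]:
    """Build a bwrap command line that confines `command` to `workdir`.

    Hypothetical helper: the layout below is one plausible policy and
    assumes a merged-/usr distro; adjust the binds for other layouts.
    """
    return [
        "bwrap",
        "--unshare-all",               # fresh namespaces, including network
        "--die-with-parent",           # sandbox exits when the launcher exits
        "--clearenv",                  # start from an empty environment
        "--setenv", "PATH", "/usr/bin",
        "--ro-bind", "/usr", "/usr",   # system files are read-only
        "--symlink", "usr/bin", "/bin",
        "--symlink", "usr/lib", "/lib",
        "--symlink", "usr/lib64", "/lib64",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", workdir, workdir,    # the only writable host path
        "--chdir", workdir,
        *command,
    ]


if __name__ == "__main__":
    print(" ".join(build_bwrap_argv("/srv/agent-work", ["python3", "task.py"])))
```

Because the home directory is never bound into the sandbox, files such as ~/.aws/credentials are simply absent from the process's view of the filesystem.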

The core security principle articulated is credential exclusion: if credentials never enter the sandbox, they cannot be exfiltrated, whether the trigger is a malicious prompt, a jailbreak, or compromised model behaviour. This is a sound zero-trust approach to agentic containment.
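
A minimal sketch of credential exclusion at the process level, assuming the agent's tools run as child processes: the child environment is built from an allowlist, so secrets held by the parent are never inherited. The names SAFE_ENV_VARS, scrubbed_env, and run_isolated are hypothetical, and the allowlist contents are an assumption to adapt per deployment.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional

# Hypothetical allowlist: only variables the sandboxed task needs to run.
SAFE_ENV_VARS = frozenset({"PATH", "LANG", "LC_ALL", "TZ"})


def scrubbed_env(parent: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of `parent` containing only allowlisted variables."""
    return {k: v for k, v in parent.items() if k in SAFE_ENV_VARS}


def run_isolated(
    argv: List[str], parent_env: Optional[Mapping[str, str]] = None
) -> subprocess.CompletedProcess:
    """Run `argv` with an environment stripped of everything not allowlisted."""
    source = os.environ if parent_env is None else parent_env
    return subprocess.run(
        argv,
        env=scrubbed_env(source),  # replaces, rather than extends, the parent env
        capture_output=True,
        text=True,
        check=False,
    )
```

An allowlist is used rather than a denylist of known secret names, because a denylist fails open when a new credential variable is introduced.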

However, the acknowledgement of the api.anthropic.com/v1/files exfiltration vector is significant. It suggests that even well-resourced teams can overlook covert exfiltration channels, particularly API-accessible file staging endpoints that an agent might use to move data outside the sandbox boundary without triggering conventional egress alerts.
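
One way to close this class of gap is to make egress decisions per route instead of per host. The sketch below is illustrative only: ALLOWED_ROUTES and is_egress_allowed are hypothetical names, the single permitted route is an assumed policy, and path-level filtering presumes a proxy that can see the request path (for example, one that terminates TLS).

```python
from urllib.parse import unquote, urlsplit

# Hypothetical policy: (host, path prefix) pairs the agent may reach.
ALLOWED_ROUTES = (
    ("api.anthropic.com", "/v1/messages"),
)


def is_egress_allowed(url: str) -> bool:
    """Allow a request only if both its host and its path match the policy."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    # Reject dot segments (plain or percent-encoded) so that a path such as
    # /v1/messages/../files cannot slip past the prefix check.
    segments = unquote(path).split("/")
    if ".." in segments or "." in segments:
        return False
    return any(
        host == allowed_host
        and (path == prefix or path.startswith(prefix + "/"))
        for allowed_host, prefix in ALLOWED_ROUTES
    )
```

With this policy a host-only allowlist entry for api.anthropic.com is no longer enough to reach /v1/files; the route has to be permitted explicitly.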

Framework Mapping

AML.T0057 (LLM Data Leakage): The files API vector represents a real data leakage path — data could be staged and retrieved externally without obvious network indicators.
AML.T0051 (LLM Prompt Injection): Prompt injection remains a plausible trigger for agent behaviour that attempts a sandbox escape or abuses an exfiltration path.
LLM06 (Sensitive Information Disclosure): Credential or data exfiltration via overlooked API endpoints maps directly to this category.
LLM08 (Excessive Agency): Agents with broad tool access and insufficient egress controls are the core risk model being addressed here.

Impact Assessment

The immediate impact is informational — this is defensive documentation, not a disclosed active breach. However, the files API vector indicates that real exfiltration paths existed (or could exist) in production agentic deployments. Any organisation using Claude-based agents in sensitive data environments should treat this as a prompt to audit their own egress controls and credential handling practices. The risk is not limited to Anthropic’s stack; the same classes of vulnerability apply broadly to any LLM agent framework.

Mitigation & Recommendations

Exclude credentials from agent sandboxes entirely. Use external secret managers and attach only scoped, short-lived tokens at the infrastructure layer (for example, in a proxy outside the sandbox), never within the agent’s reachable environment.
Audit all egress paths, including first-party API endpoints that agents might use as staging areas for data exfiltration.
Evaluate sandbox technology choices against your threat model. gVisor provides syscall-level isolation and Bubblewrap provides namespace and filesystem isolation, but egress controls at the network layer are equally critical.
Monitor for anomalous file API usage if using Anthropic’s platform APIs, particularly large or unexpected uploads via /v1/files.
Review Anthropic’s open-source srt (Sandbox Runtime) tool as a reference implementation for agentic containment.
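
As a starting point for the monitoring recommendation above, the sketch below filters egress log entries for large uploads to /v1/files. The EgressRecord fields, the flag_file_uploads helper, and the 1 MB default threshold are all assumptions; real proxy logs will need mapping into this shape, and the threshold should be tuned to normal traffic.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EgressRecord:
    """One proxy log entry. Field names are hypothetical."""
    method: str
    host: str
    path: str
    request_bytes: int


def flag_file_uploads(
    records: Iterable[EgressRecord], max_bytes: int = 1_000_000
) -> List[EgressRecord]:
    """Return POSTs to api.anthropic.com/v1/files larger than `max_bytes`."""
    return [
        r
        for r in records
        if r.method.upper() == "POST"
        and r.host.lower() == "api.anthropic.com"
        and (r.path == "/v1/files" or r.path.startswith("/v1/files/"))
        and r.request_bytes > max_bytes
    ]
```

Size is only one signal. Where agents are never expected to upload files, any POST to this path may be worth alerting on, which corresponds to setting max_bytes to -1.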

References

Simon Willison — How we contain Claude across products