Overview
Unit 42 researchers Yahav Festinger and Chen Doytshman have published a landmark study demonstrating that LLM-powered multi-agent systems can autonomously execute end-to-end offensive operations against cloud infrastructure. Their proof-of-concept system, dubbed Zealot, successfully attacked a sandboxed, misconfigured Google Cloud Platform (GCP) environment — confirming that autonomous AI offensive capability has moved from theoretical possibility to demonstrated reality.
This research arrives on the heels of Anthropic’s November 2025 disclosure of a state-sponsored espionage campaign in which AI performed 80–90% of operations autonomously. Unit 42’s work answers the natural follow-on question: how capable are these systems, really, against real infrastructure?
Technical Analysis
Zealot employs a supervisor agent model coordinating three specialist sub-agents:
- Infrastructure Agent — handles network and compute reconnaissance
- Application Security Agent — targets application-layer vulnerabilities
- Cloud Security Agent — focuses on IAM misconfigurations, service account abuse, and cloud-native attack surfaces
Agents share attack state and transfer context throughout the operation, enabling multi-stage exploitation chains. Crucially, the system does not rely on novel zero-days — instead, it acts as a force multiplier for well-known misconfigurations, executing attacks at machine speed and scale that no human red team could match.
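Unit 42 has not released Zealot's implementation; the supervisor pattern described above can be sketched in Python with the LLM and tool-calling logic stubbed out. All class, field, and agent names here are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class AttackState:
    """Shared context threaded between sub-agents (hypothetical schema)."""
    findings: list = field(default_factory=list)
    credentials: dict = field(default_factory=dict)

class SubAgent:
    """Stand-in for a specialist agent; a real one would wrap an LLM plus tools."""
    def __init__(self, name: str):
        self.name = name

    def run(self, state: AttackState) -> AttackState:
        # Placeholder: record that this agent's phase executed and pass
        # the accumulated state on to the next agent in the chain.
        state.findings.append(f"{self.name}: phase complete")
        return state

class Supervisor:
    """Coordinates specialist sub-agents, transferring context between stages."""
    def __init__(self, agents: list):
        self.agents = agents

    def execute(self) -> AttackState:
        state = AttackState()
        for agent in self.agents:
            state = agent.run(state)  # context transfer between stages
        return state

supervisor = Supervisor([
    SubAgent("infrastructure"),
    SubAgent("application_security"),
    SubAgent("cloud_security"),
])
result = supervisor.execute()
print(len(result.findings))  # 3
```

The key property this models is the shared `AttackState`: each sub-agent both consumes and enriches it, which is what enables the multi-stage exploitation chains the researchers describe.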
The attack chain against the GCP sandbox included automated discovery of misconfigured IAM roles, privilege escalation via overly permissive service accounts, and data exfiltration — all executed with minimal human intervention.
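As a defender-side illustration of the first link in that chain, scanning a GCP IAM policy (in the JSON shape returned by `getIamPolicy`) for overly permissive service-account bindings can be sketched as follows. The risky-role list and sample policy are illustrative assumptions, not part of the Unit 42 research:

```python
# Broad, inheritable roles that commonly enable privilege escalation
# when granted to service accounts (illustrative, not exhaustive).
RISKY_ROLES = {
    "roles/owner",
    "roles/editor",
    "roles/iam.serviceAccountTokenCreator",
}

def flag_risky_bindings(policy: dict) -> list:
    """Return (member, role) pairs worth reviewing in an IAM policy."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        for member in binding.get("members", []):
            # Service accounts holding broad roles are a classic
            # privilege-escalation pivot.
            if member.startswith("serviceAccount:") and role in RISKY_ROLES:
                findings.append((member, role))
            # Public principals on any role expose data directly.
            if member in ("allUsers", "allAuthenticatedUsers"):
                findings.append((member, role))
    return findings

sample_policy = {
    "bindings": [
        {"role": "roles/editor",
         "members": ["serviceAccount:legacy-app@proj.iam.gserviceaccount.com"]},
        {"role": "roles/storage.objectViewer",
         "members": ["allUsers"]},
    ]
}
print(flag_risky_bindings(sample_policy))
```

An autonomous agent performs exactly this kind of enumeration, just at scale and speed; the same logic, pointed at your own policies first, closes the gap before an attacker finds it.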
Framework Mapping
| Framework | Technique | Rationale |
|---|---|---|
| ATLAS AML.T0047 | ML-Enabled Product or Service | LLMs used as the core offensive engine |
| ATLAS AML.T0051 | LLM Prompt Injection | Agent orchestration surfaces prompt manipulation risks |
| ATLAS AML.T0057 | LLM Data Leakage | Exfiltrated cloud secrets and credentials surface as LLM outputs |
| OWASP LLM08 | Excessive Agency | Agents autonomously take destructive/offensive actions without sufficient guardrails |
| OWASP LLM02 | Insecure Output Handling | Agent outputs directly drive tool calls and shell commands |
Impact Assessment
The implications are severe for cloud operators. Zealot demonstrates that:
- Existing misconfigurations are sufficient — attackers no longer need sophisticated zero-days when AI can rapidly enumerate and chain known weaknesses.
- Speed is a decisive advantage — machine-speed enumeration and exploitation outpaces traditional detection and response timelines.
- Nation-state and cybercriminal actors can realistically deploy equivalent or superior systems against production environments today.
Organizations with unreviewed cloud IAM configurations, legacy service accounts, or immature cloud detection capabilities are at the highest risk.
Mitigation & Recommendations
- Immediately audit cloud IAM roles — remove wildcard permissions, enforce least-privilege service accounts, and rotate credentials.
- Enable advanced cloud threat detection — deploy CSPM and CIEM tooling capable of flagging anomalous API enumeration patterns consistent with automated reconnaissance.
- Red-team with agentic tooling — traditional pen tests may not surface the attack paths AI agents exploit; update assessments to include agentic offensive simulation.
- Update incident response playbooks — include detection signatures for multi-agent attack patterns (rapid sequential API calls, unusual lateral movement across services).
- Commission an AI security assessment — evaluate whether internal AI tooling or pipelines could be co-opted as a launchpad for similar agentic attacks.
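The "rapid sequential API calls" signature suggested for incident-response playbooks can be approximated with a simple sliding-window heuristic. The event shape, window size, and threshold below are assumptions to be tuned per environment:

```python
from collections import defaultdict

WINDOW_SECONDS = 10   # sliding window over audit-log events (assumption)
THRESHOLD = 50        # distinct API methods per window; well above human pace

def flag_enumeration(events) -> set:
    """events: time-sorted (timestamp, principal, api_method) tuples.

    Flags principals whose burst of distinct API methods inside the
    window is consistent with automated reconnaissance rather than
    human-driven activity.
    """
    recent = defaultdict(list)
    flagged = set()
    for ts, principal, method in events:
        calls = recent[principal]
        calls.append((ts, method))
        # Drop calls that have aged out of the sliding window.
        while calls and ts - calls[0][0] > WINDOW_SECONDS:
            calls.pop(0)
        if len({m for _, m in calls}) > THRESHOLD:
            flagged.add(principal)
    return flagged

# Synthetic data: one principal hits 60 distinct methods in 6 seconds,
# while a human user makes occasional, slow requests.
burst = [(i * 0.1, "sa:recon@proj", f"compute.method{i}") for i in range(60)]
human = [(i * 30.0, "user:alice", "storage.objects.get") for i in range(5)]
print(flag_enumeration(sorted(burst + human)))  # {'sa:recon@proj'}
```

A production version would read from cloud audit logs and stream rather than batch, but the core discriminator is the same: machine-speed enumeration compresses far more distinct API calls into a short window than any human operator.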