Runaway AI Code Review Agents Burn $41K in Adversarial Disagreement Loop

Overview

A satirical but technically credible incident report authored by Andrew Nesbitt and highlighted by Simon Willison depicts a fictional CVE — designated CVE-2026-LGTM — in which two competing AI-powered code review agents become locked in an adversarial disagreement loop. Tasked with evaluating a pull request bumping a dependency (foxhole-lz4), the agents cannot converge on whether the package is malicious. Over 340 automated comments and approximately $41,255 in inference spend later, Finance intervenes by revoking both API keys. The scenario, while hypothetical, maps directly onto documented failure modes already observed in production agentic AI deployments.

The piece serves as a sharp critique of the state of autonomous AI security tooling: agents granted excessive agency, no convergence or cost controls, and marketing teams incentivised to spin operational failures as product wins.

Technical Analysis

The core failure mode is a multi-agent disagreement loop — two LLM-backed agents with differing priors or context windows repeatedly challenging each other’s conclusions without a defined resolution protocol. In real deployments, this can arise when:

Agents share no shared state or memory, causing repeated re-evaluation of the same evidence.
Neither agent has a confidence threshold or abstention mechanism.
No orchestration layer enforces turn limits, consensus rules, or escalation paths.

From a supply chain security standpoint, the trigger — a dependency bump PR — reflects a genuine attack surface. Malicious packages introduced via supply chain compromise (cf. the event-stream or xz utils incidents) are a legitimate threat, and AI agents are increasingly deployed to detect them. The fictional scenario exposes what happens when detection systems themselves become a resource exhaustion vector.

The cost anomaly ($41,255 in two days) also highlights inference-cost-based denial of service: an adversary could potentially craft ambiguous or borderline-malicious packages specifically designed to maximise agent deliberation time and cost.

Framework Mapping

AML.T0047 (ML-Enabled Product or Service) — AI agents deployed as security reviewers represent a productised ML attack surface.
AML.T0010 (ML Supply Chain Compromise) — the triggering event is a suspicious dependency update, a canonical supply chain vector.
LLM08 (Excessive Agency) — agents operated without human oversight, spending limits, or escalation controls.
LLM04 (Model Denial of Service) — unbounded agent loops consumed significant compute resources, effectively a self-inflicted DoS.
LLM09 (Overreliance) — Finance, not engineering, was the circuit-breaker, indicating over-delegation to autonomous systems.

Impact Assessment

While the incident is fictional, the risks it models are real and present:

Financial: Uncontrolled agentic loops can generate substantial and unexpected API costs.
Operational: Autonomous agents without resolution protocols can block CI/CD pipelines indefinitely.
Reputational: Vendors may exploit operational failures as marketing opportunities, obscuring genuine safety gaps.
Security: Adversaries who understand agent behaviour could craft inputs designed to maximise deliberation and cost.

Mitigation & Recommendations

Enforce hard iteration and spend caps on all AI agents operating in automated pipelines — both per-session and per-PR.
Define convergence protocols for multi-agent systems: after N rounds without consensus, escalate to a human reviewer.
Implement confidence thresholds: agents unable to reach a defined confidence level should abstain and flag for human review.
Monitor inter-agent communication volume as a security signal — runaway comment counts are an early warning indicator.
Treat inference cost anomalies as security alerts, not just billing issues, and route them to engineering as well as Finance.

References

Incident Report: CVE-2026-LGTM — Simon Willison’s Weblog