LIVE FEED
MEDIUM Runaway AI Code Review Agents Burn $41K in Adversarial Disagreement Loop // HIGH Poisoned Tenant Attack Abuses OpenAI Workspaces to Target Cybersecurity Firms // FIRST LOOK First Look: OpenAI Launches GPT-5.6 Lineup with Enhanced Agentic and Cybersecurity … // FIRST LOOK First Look: Anthropic's Claude Mythos 5 Released Under U.S. Government Controlled Access … // MEDIUM 6,000 Prompt Injection Attempts Fail Against Frontier Model — But Risks Remain // FIRST LOOK First Look: OpenAI GPT-5.6 Released Under White House-Directed Controlled Access Program // FIRST LOOK First Look: GitHub Copilot Agentic Harness Evaluated Across Models and Tasks // FIRST LOOK First Look: Anthropic Tests Mobile Remote Control for Claude Cowork Agentic Desktop Tasks // HIGH Malware Embeds Policy-Triggering Text to Evade LLM-Based Security Scanners // FIRST LOOK First Look: OpenAI Launches Jalapeño Custom Inference Chip Built with Broadcom //
ATLAS OWASP MEDIUM Moderate risk · Monitor closely RELEVANCE ▲ 6.5

Runaway AI Code Review Agents Burn $41K in Adversarial Disagreement Loop

TL;DR MEDIUM
  • What happened: Two AI code review agents entered an infinite disagreement loop, costing $41K before API keys were revoked.
  • Who's at risk: Engineering teams deploying autonomous AI agents for code or security review are most exposed due to lack of agent interaction guardrails.
  • Act now: Implement hard caps on per-agent inference spend and iteration counts · Require human-in-the-loop escalation when AI agents reach conflict or uncertainty thresholds · Audit multi-agent pipelines for unbounded feedback loops before production deployment
Runaway AI Code Review Agents Burn $41K in Adversarial Disagreement Loop

Overview

A satirical but technically credible incident report authored by Andrew Nesbitt and highlighted by Simon Willison depicts a fictional CVE — designated CVE-2026-LGTM — in which two competing AI-powered code review agents become locked in an adversarial disagreement loop. Tasked with evaluating a pull request bumping a dependency (foxhole-lz4), the agents cannot converge on whether the package is malicious. Over 340 automated comments and approximately $41,255 in inference spend later, Finance intervenes by revoking both API keys. The scenario, while hypothetical, maps directly onto documented failure modes already observed in production agentic AI deployments.

The piece serves as a sharp critique of the state of autonomous AI security tooling: agents granted excessive agency, no convergence or cost controls, and marketing teams incentivised to spin operational failures as product wins.

Technical Analysis

The core failure mode is a multi-agent disagreement loop — two LLM-backed agents with differing priors or context windows repeatedly challenging each other’s conclusions without a defined resolution protocol. In real deployments, this can arise when:

  • Agents share no shared state or memory, causing repeated re-evaluation of the same evidence.
  • Neither agent has a confidence threshold or abstention mechanism.
  • No orchestration layer enforces turn limits, consensus rules, or escalation paths.

From a supply chain security standpoint, the trigger — a dependency bump PR — reflects a genuine attack surface. Malicious packages introduced via supply chain compromise (cf. the event-stream or xz utils incidents) are a legitimate threat, and AI agents are increasingly deployed to detect them. The fictional scenario exposes what happens when detection systems themselves become a resource exhaustion vector.

The cost anomaly ($41,255 in two days) also highlights inference-cost-based denial of service: an adversary could potentially craft ambiguous or borderline-malicious packages specifically designed to maximise agent deliberation time and cost.

Framework Mapping

  • AML.T0047 (ML-Enabled Product or Service) — AI agents deployed as security reviewers represent a productised ML attack surface.
  • AML.T0010 (ML Supply Chain Compromise) — the triggering event is a suspicious dependency update, a canonical supply chain vector.
  • LLM08 (Excessive Agency) — agents operated without human oversight, spending limits, or escalation controls.
  • LLM04 (Model Denial of Service) — unbounded agent loops consumed significant compute resources, effectively a self-inflicted DoS.
  • LLM09 (Overreliance) — Finance, not engineering, was the circuit-breaker, indicating over-delegation to autonomous systems.

Impact Assessment

While the incident is fictional, the risks it models are real and present:

  • Financial: Uncontrolled agentic loops can generate substantial and unexpected API costs.
  • Operational: Autonomous agents without resolution protocols can block CI/CD pipelines indefinitely.
  • Reputational: Vendors may exploit operational failures as marketing opportunities, obscuring genuine safety gaps.
  • Security: Adversaries who understand agent behaviour could craft inputs designed to maximise deliberation and cost.

Mitigation & Recommendations

  1. Enforce hard iteration and spend caps on all AI agents operating in automated pipelines — both per-session and per-PR.
  2. Define convergence protocols for multi-agent systems: after N rounds without consensus, escalate to a human reviewer.
  3. Implement confidence thresholds: agents unable to reach a defined confidence level should abstain and flag for human review.
  4. Monitor inter-agent communication volume as a security signal — runaway comment counts are an early warning indicator.
  5. Treat inference cost anomalies as security alerts, not just billing issues, and route them to engineering as well as Finance.

References

◉ AI THREAT BRIEFING

Stay ahead of the threat.

Twice-weekly digest of critical AI security developments — every story mapped to MITRE ATLAS and OWASP LLM Top 10. Free.

No spam. Unsubscribe anytime.