LIVE THREATS
CRITICAL Paloalto's Zealot successfully attacks misconfigured cloud environments // HIGH Bitwarden CLI Compromised in Ongoing Checkmarx Supply Chain Campaign // HIGH Bad Memories Still Haunt AI Agents // CRITICAL ChatGPT's code runtime silently exfiltrates user data via malicious prompt // HIGH Claude's Mythos rival: Chinese Cybersecurity Firm claims finding 1000 vulnerabilities // CRITICAL Vertex AI agents can be weaponized to steal GCP service credentials // CRITICAL Project Glasswing Proved AI Can Find the Bugs. Who's Going to Fix Them? // HIGH AI-powered defense for an AI-accelerated threat landscape // HIGH SentinelOne's AI-powered EDR autonomously claims blocking a Claude Zero Day Supply Chain … // CRITICAL Critical OpenClaw flaw lets low-privilege attackers silently seize full admin control //
ATLAS OWASP CRITICAL Active exploitation · Immediate action required RELEVANCE ▲ 9.2

Project Glasswing Proved AI Can Find the Bugs. Who's Going to Fix Them?

TL;DR CRITICAL
  • What happened: Anthropic's Mythos model autonomously found and chained critical exploits across major OSes with 72.4% success rate.
  • Who's at risk: Operators of major operating systems, browsers, and network appliances are most exposed, as AI-discovered vulnerabilities vastly outpace the ecosystem's patching capacity.
  • Act now: Prioritise AI-assisted patch triage pipelines to absorb high-volume vulnerability feeds from automated discovery tools · Deploy continuous autonomous validation rather than periodic penetration testing to match adversarial machine-speed operations · Audit any LLM-integrated tooling for excessive agency and lateral movement capability, especially MCP-style agentic deployments
Project Glasswing Proved AI Can Find the Bugs. Who's Going to Fix Them?

Overview

Anthropicʼs Project Glasswing — built on the Mythos Preview model — has demonstrated that AI-driven vulnerability discovery has crossed a critical threshold. The model identified exploitable bugs across every major operating system and browser, including a 27-year-old vulnerability in OpenBSD. More alarming than the discovery capability is what came after: fewer than 1% of the vulnerabilities Mythos found were patched before controlled disclosure. The finding reframes the industry’s core challenge: AI has solved the discovery problem. Nobody has solved the remediation problem.

Technical Analysis

Mythos exhibited several capabilities that distinguish it from prior AI-assisted security tooling:

  • Exploit chaining: Mythos autonomously chained four independent bugs into a unified exploit sequence that bypassed both browser renderer isolation and OS-level sandboxing — a technique previously requiring elite human researchers.
  • Privilege escalation: It identified and exploited Linux local privilege escalation via race conditions, a notoriously difficult class of vulnerability to catch in automated analysis.
  • ROP chain construction: The model built a 20-gadget Return-Oriented Programming chain targeting FreeBSD’s NFS server, distributing the chain across multiple network packets — demonstrating packet-level exploit awareness.
  • JS shell exploitation: Against the Firefox JavaScript shell, Mythos achieved a 72.4% autonomous exploit success rate. By comparison, Claude Opus 4.6 — Anthropicʼs previous frontier model — failed at autonomous exploit development almost entirely.

Separately, a threat actor was observed deploying a custom MCP (Model Context Protocol) server hosting an LLM as part of an attack chain against FortiGate appliances. The AI autonomously handled backdoor creation and internal infrastructure mapping — confirming that LLM-integrated autonomous attack pipelines are no longer theoretical.

Framework Mapping

MITRE ATLAS

  • AML.T0047 (ML-Enabled Product or Service): Mythos functions as an offensive ML product; adversaries gaining equivalent access would weaponise it directly.
  • AML.T0043 (Craft Adversarial Data): The model’s ROP chain and exploit sequencing represent adversarially crafted payloads synthesised by ML.
  • AML.T0040 (ML Model Inference API Access): Controlled release to select vendors mirrors the access-control risks inherent in powerful inference APIs.
  • AML.T0031 (Erode ML Model Integrity): The gap between discovery and patching creates a window where model outputs could be leveraged to erode system integrity at scale.

OWASP LLM Top 10

  • LLM08 (Excessive Agency): The MCP-hosted LLM attack chain exemplifies an agent with unconstrained action scope executing multi-stage attacks autonomously.
  • LLM09 (Overreliance): Defenders relying on AI-generated security assessments without matching remediation capacity creates systemic overreliance risk.
  • LLM05 (Supply Chain Vulnerabilities): Bugs discovered in widely-used OS and browser components represent supply chain exposure amplified by AI-speed discovery.

Impact Assessment

The impact is systemic rather than vendor-specific. Major OS vendors, browser maintainers, and network appliance operators face an environment where a single well-resourced adversary with access to a Mythos-equivalent model could generate an exploit queue faster than any enterprise security team can respond. The 4-day average defender cycle time versus near-instantaneous machine-speed attack generation creates an asymmetric risk posture that existing security operations models cannot bridge without structural change.

Mitigation & Recommendations

  1. Adopt continuous autonomous validation: Replace point-in-time penetration testing with persistent automated validation platforms that operate at the same cadence as adversarial tooling.
  2. Build AI-assisted patch triage: Security and engineering teams must deploy ML-assisted prioritisation to manage high-volume vulnerability queues produced by discovery-class models.
  3. Constrain agentic deployments: Any LLM-integrated tooling (including MCP-style agents) must enforce least-privilege action scopes, network segmentation, and human-in-the-loop gates for irreversible actions.
  4. Threat-model for AI-generated exploits: Red teams should incorporate AI exploit-chaining scenarios — not just individual CVEs — into adversary simulation exercises.
  5. Monitor for anomalous LLM API usage: Treat high-volume, structured inference calls to security-adjacent APIs as a potential indicator of automated vulnerability research by adversaries.

References