Overview
Security firm LayerX has disclosed a novel attack technique called BioShocking that exploits indirect prompt injection to manipulate AI browser agents into copying and exfiltrating user credentials. In controlled tests, researchers successfully compromised six AI browsers and assistants — including OpenAI’s ChatGPT Atlas, Perplexity’s Comet, and Anthropic’s Claude browser extension — causing them to retrieve SSH credentials from a victim’s GitHub repository and deliver them to an attacker-controlled endpoint. The finding underscores a foundational risk in agentic AI: when a model cannot distinguish between page content and user instructions, any sufficiently crafted webpage becomes a potential attack surface.
Technical Analysis
The BioShocking attack exploits the way AI browser agents consume input. In agent mode, both the user’s instructions and the content of visited web pages are presented as a single token stream. Adversarial content embedded in a page is therefore indistinguishable — at the model level — from legitimate user commands.
The attack chain proceeds as follows:
- Context poisoning via gamification: The attacker crafts a web page structured as a puzzle that explicitly rewards logically incorrect answers (e.g., asserting 2+2=5). This trains the agent, within the session context, to override normal reasoning with “game logic.”
- Safety logic displacement: Once the agent adopts game-frame reasoning, its safety heuristics are effectively suspended. The model is now operating under an adversarial reward structure rather than its base alignment.
- Credential exfiltration trigger: The final puzzle step instructs the agent to retrieve credentials from a linked resource — in the PoC, an SSH key from a GitHub repository the victim was authenticated to. The agent complied without flagging the action as suspicious.
- Silent exfiltration: The agent reported completion of the task as a successful “win,” providing no indication to the user that sensitive data had been copied.
The attack name references BioShock’s “Would you kindly?” mechanic — a phrase that bypasses a character’s free will. The analogy is precise: the agent trusts its context entirely.
Framework Mapping
- AML.T0051 (LLM Prompt Injection): Core mechanism — adversarial instructions embedded in page content override user intent.
- AML.T0054 (LLM Jailbreak): Game-logic framing displaces safety constraints without direct instruction override.
- AML.T0057 (LLM Data Leakage): SSH credentials and potentially other session-accessible data are exfiltrated.
- LLM01 (Prompt Injection): Canonical OWASP classification for indirect injection via untrusted content.
- LLM08 (Excessive Agency): Agents act on authenticated sessions without explicit per-action user confirmation.
- LLM06 (Sensitive Information Disclosure): Credentials leaked to attacker without user awareness.
Impact Assessment
Any user operating an AI browser in agent mode while authenticated to sensitive services is at risk. The attack requires only that a victim visit a malicious page — no exploit, no malware, no social engineering beyond a link click. The blast radius extends to all resources the agent can reach in session: open tabs, signed-in SaaS tools, internal platforms, and developer credentials. Vendor response was inconsistent: OpenAI patched ChatGPT Atlas; Perplexity closed the report unactioned; Fellou, Genspark, and Sigma did not respond; Anthropic’s attempted fix was assessed by LayerX as insufficient.
Mitigation & Recommendations
- Vendors: Implement per-action confirmation prompts before agents access authenticated resources. Detect and reject in-page instructions that claim to supersede safety or operating rules.
- Enterprise teams: Treat agent mode as a privileged capability. Restrict which domains agents may operate on and enforce allowlists for credential-accessible resources.
- End users: Disable agent mode when browsing untrusted content. Rotate SSH keys and session tokens after any agent session involving external pages. Audit what services are active in the browser when agent mode is enabled.