ChatGPT Prompt Injection Enables Data Exfiltration

Overview

OpenAI has officially launched Lockdown Mode for ChatGPT, rolling it out to Free, Go, Plus, Pro, and self-serve Business account holders. The feature was first previewed in February 2026 and targets a specific, well-understood attack chain: the data exfiltration stage of a prompt injection attack. By restricting outbound network requests at the infrastructure level, Lockdown Mode eliminates the channel an attacker would use to receive stolen data — without relying on the AI model itself to detect or block the threat.

Technical Analysis

The underlying threat model Lockdown Mode addresses is what security researcher Simon Willison calls the Lethal Trifecta: the simultaneous presence of (1) LLM access to private user data, (2) LLM exposure to untrusted content, and (3) an outbound channel to exfiltrate data to an attacker. When all three conditions are met, a malicious prompt embedded in an uploaded file or cached web page can instruct the LLM to silently transmit sensitive information to an attacker-controlled endpoint.

Lockdown Mode severs the third leg — the exfiltration vector — using deterministic, non-AI-evaluated controls. This is significant: purely AI-based mitigations can themselves be subverted by sufficiently crafted adversarial prompts. Network-layer restrictions cannot be bypassed by manipulating model behaviour.

However, OpenAI explicitly warns that Lockdown Mode does not prevent prompt injections from influencing model behaviour or response accuracy. A malicious instruction in an uploaded PDF or cached page can still manipulate what the model says — it simply cannot use the model as a conduit to phone home with stolen data.

The implicit admission is notable: OpenAI’s own documentation confirms that default ChatGPT configurations do not robustly prevent determined data exfiltration via prompt injection.

Framework Mapping

AML.T0051 (LLM Prompt Injection): The attack vector Lockdown Mode is designed to mitigate — malicious instructions injected via untrusted content sources.
AML.T0057 (LLM Data Leakage): The exfiltration outcome being blocked — sensitive user data transmitted to attacker infrastructure.
LLM01 (Prompt Injection): Core OWASP category; injected instructions in files or web content drive the attack chain.
LLM06 (Sensitive Information Disclosure): The data exfiltration goal of the attack.

Impact Assessment

The feature is targeted at users with an elevated risk profile: journalists, executives, security researchers, legal professionals, and anyone routinely processing confidential documents within ChatGPT. For general consumer use, OpenAI CISO Dane Stuckey notes the tradeoffs in functionality may not be worthwhile. For high-value targets, the tradeoff is clearly justified.

The broader implication for enterprise and security teams is that default LLM deployments should be assumed to carry residual exfiltration risk unless explicit network-layer controls are in place.

Mitigation & Recommendations

Enable Lockdown Mode if you or your users process sensitive, confidential, or regulated data within ChatGPT.
Treat all uploaded files and web-fetched content as untrusted — prompt injection surfaces persist even with Lockdown Mode active.
Architect LLM pipelines with the Lethal Trifecta in mind: where possible, avoid combining private data access with untrusted content ingestion in a single agent context.
Prefer deterministic controls (network egress restrictions, sandboxing) over AI-evaluated guardrails for security-critical mitigations.
Review agentic and plugin-enabled ChatGPT use cases for residual exfiltration risk under default settings.