LLMs Demonstrate Strong Capability for Covert Text Steganography

TL;DR MEDIUM
  • What happened: LLMs can reliably encode and decode hidden messages inside normal-looking text.
  • Who's at risk: Organisations relying on LLM-based content moderation or DLP tools are most exposed, as steganographic output evades text-level inspection.
  • Act now: Audit LLM output pipelines for unexpected or anomalous linguistic patterns that may indicate steganographic encoding · Incorporate semantic and statistical analysis into content moderation — not just surface-level text inspection · Restrict LLM access in high-sensitivity environments where covert data exfiltration via generated text is a concern

Overview

A research paper flagged by Bruce Schneier finds that large language models are surprisingly effective at text-in-text steganography — the practice of hiding secret messages within ordinary-looking prose. Unlike traditional steganographic methods that manipulate image pixels or whitespace, LLM-based steganography operates at the linguistic layer, selecting synonyms, sentence structures, or phonological variants to encode binary payloads in ways imperceptible to human readers.

This capability has meaningful implications for AI security: it creates a mechanism for covert communication that can bypass conventional data loss prevention (DLP) tools, content moderation systems, and human reviewers alike.

Technical Analysis

The core technique exploits the probabilistic nature of LLM token generation. By manipulating sampling parameters (temperature, top-k, nucleus sampling), a sender can bias word choices to encode a bitstream. The recipient, with knowledge of the encoding scheme and model, can decode the hidden message by observing which token choices were made at each decision point.
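To make the mechanism concrete, here is a minimal toy sketch in Python. It illustrates the general rank-based approach rather than the paper's actual scheme, and it assumes sender and receiver share the same deterministic candidate ranking; a small lookup table stands in for a real LLM.

```python
# Minimal sketch of rank-based text steganography. A toy bigram table
# stands in for a real LLM; with a real model, candidates would be
# ranked by token probability at each generation step.

# Toy "model": for each context word, a ranked list of plausible next words.
CANDIDATES = {
    "the": ["report", "study"],
    "report": ["shows", "indicates"],
    "study": ["shows", "indicates"],
    "shows": ["that", "how"],
    "indicates": ["that", "how"],
    "that": ["results", "findings"],
    "how": ["results", "findings"],
}

def encode(bits: str, start: str = "the") -> list[str]:
    """Hide one bit per word by choosing the rank-0 or rank-1 candidate."""
    words = [start]
    for bit in bits:
        options = CANDIDATES[words[-1]]
        words.append(options[int(bit)])  # bit 0 -> top choice, bit 1 -> runner-up
    return words

def decode(words: list[str]) -> str:
    """Recover the bitstream by replaying the ranking at each step."""
    return "".join(
        str(CANDIDATES[prev].index(chosen))
        for prev, chosen in zip(words, words[1:])
    )

cover = encode("0110")
print(" ".join(cover))  # "the report indicates how results"
print(decode(cover))    # "0110"
```

With a real model, the same idea biases live token sampling rather than a fixed table, and the cover text stays fluent because every choice is drawn from the model's own high-probability candidates.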

Commenters on the Schneier post noted that even phonologically distorted text — e.g., “phashyon es cycklyq” — is decoded with ease by models as small as 4 billion parameters. This suggests the attack surface extends beyond frontier models to widely accessible open-source deployments.

The technique operates at what Schneier commenter Clive Robinson describes as a “layer of language” trade-off: higher-level encoding (longer token spans) produces more coherent cover text but may introduce contextual inconsistencies; lower-level encoding is more subtle but may degrade readability.

Framework Mapping

  • AML.T0015 – Evade ML Model: Steganographic outputs are crafted to evade detection by content classifiers and moderation pipelines.
  • AML.T0043 – Craft Adversarial Data: The encoded text constitutes adversarially constructed data designed to carry a covert payload.
  • AML.T0057 – LLM Data Leakage: In insider or supply chain threat scenarios, LLMs could be used to exfiltrate sensitive data by encoding it into benign-appearing generated content.
  • LLM02 – Insecure Output Handling: Downstream systems that consume LLM output without semantic scrutiny may inadvertently relay hidden messages.
  • LLM06 – Sensitive Information Disclosure: An LLM could be prompted to embed confidential information into public-facing outputs via steganographic encoding.

Impact Assessment

The primary risk is for organisations that deploy LLMs in content creation, summarisation, or customer-facing roles, where generated text may exit secure environments. A malicious insider or compromised model could encode sensitive data — credentials, PII, proprietary information — into output that passes standard inspection.

Secondarily, threat actors could use LLM steganography for command-and-control communications that evade network-level content inspection, embedding instructions in publicly posted text.

Mitigation & Recommendations

  1. Apply statistical analysis to LLM outputs: Entropy and stylometric analysis can flag text with abnormal token distributions that may indicate steganographic encoding (see the sketch after this list).
  2. Restrict model sampling parameters in production: Locking temperature and sampling settings reduces the degrees of freedom available for encoding.
  3. Implement output watermarking: Cryptographic watermarking of LLM outputs (e.g., via tools such as snowdrop, mentioned in the post's comments) can help attribute and audit generated text.
  4. Red-team LLM deployments for covert channel abuse: Include steganography scenarios in adversarial testing of AI pipelines.
  5. Monitor for unusual linguistic patterns: Deploy secondary NLP classifiers trained to detect statistically improbable word choice sequences.
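
As a rough illustration of recommendation 1, the sketch below flags text whose word-level Shannon entropy falls outside an expected baseline band. The baseline values here are placeholders, not calibrated figures; a production detector would combine stylometric features and model-aware statistics rather than rely on this coarse measure.

```python
# Illustrative entropy screen for LLM output. The baseline band is a
# hypothetical placeholder; in practice it would be estimated from a
# corpus of known-clean text for the same model and domain.
import math
from collections import Counter

def word_entropy(text: str) -> float:
    """Shannon entropy (bits) of the text's word-frequency distribution."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_anomalous(text: str, baseline: tuple[float, float]) -> bool:
    """Flag text whose entropy falls outside the expected baseline band."""
    low, high = baseline
    return not (low <= word_entropy(text) <= high)

BASELINE = (3.0, 7.5)  # placeholder band, not a calibrated threshold
sample = "The report indicates how results from the study were validated."
print(round(word_entropy(sample), 2), looks_anomalous(sample, BASELINE))
```

Token-level entropy computed from the generating model's own logits would be a stronger signal than word frequencies, at the cost of requiring model access.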
