LLMs Demonstrate Strong Capability for Covert Text Steganography

TL;DR MEDIUM
  • What happened: LLMs can reliably encode and decode hidden messages inside normal-looking text.
  • Who's at risk: Organisations relying on LLM-based content moderation or DLP tools are most exposed, as steganographic output evades text-level inspection.
  • Act now: Audit LLM output pipelines for unexpected or anomalous linguistic patterns that may indicate steganographic encoding · Incorporate semantic and statistical analysis into content moderation — not just surface-level text inspection · Restrict LLM access in high-sensitivity environments where covert data exfiltration via generated text is a concern

Overview

A research paper flagged by Bruce Schneier finds that large language models are surprisingly effective at text-in-text steganography — the practice of hiding secret messages within ordinary-looking prose. Unlike traditional steganographic methods that manipulate image pixels or whitespace, LLM-based steganography operates at the linguistic layer, selecting synonyms, sentence structures, or phonological variants to encode binary payloads in ways imperceptible to human readers.

This capability has meaningful implications for AI security: it creates a mechanism for covert communication that can bypass conventional data loss prevention (DLP) tools, content moderation systems, and human reviewers alike.

Technical Analysis

The core technique exploits the probabilistic nature of LLM token generation. By manipulating sampling parameters (temperature, top-k, nucleus sampling), a sender can bias word choices to encode a bitstream. The recipient, with knowledge of the encoding scheme and model, can decode the hidden message by observing which token choices were made at each decision point.
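To make the mechanism concrete, here is a minimal toy sketch in Python. It illustrates the general rank-based approach rather than the paper's actual scheme, and it assumes sender and receiver share the same deterministic candidate ranking; a small lookup table stands in for a real LLM.

```python
# Minimal sketch of rank-based text steganography. A toy bigram table
# stands in for a real LLM; with a real model, candidates would be
# ranked by token probability at each generation step.

# Toy "model": for each context word, a ranked list of plausible next words.
CANDIDATES = {
    "the": ["report", "study"],
    "report": ["shows", "indicates"],
    "study": ["shows", "indicates"],
    "shows": ["that", "how"],
    "indicates": ["that", "how"],
    "that": ["results", "findings"],
    "how": ["results", "findings"],
}

def encode(bits: str, start: str = "the") -> list[str]:
    """Hide one bit per word by choosing the rank-0 or rank-1 candidate."""
    words = [start]
    for bit in bits:
        options = CANDIDATES[words[-1]]
        words.append(options[int(bit)])  # bit 0 -> top choice, bit 1 -> runner-up
    return words

def decode(words: list[str]) -> str:
    """Recover the bitstream by replaying the ranking at each step."""
    return "".join(
        str(CANDIDATES[prev].index(chosen))
        for prev, chosen in zip(words, words[1:])
    )

cover = encode("0110")
print(" ".join(cover))  # "the report indicates how results"
print(decode(cover))    # "0110"
```

With a real model, the same idea biases live token sampling rather than a fixed table, and the cover text stays fluent because every choice is drawn from the model's own high-probability candidates.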

Commenters on the Schneier post noted that even phonologically distorted text — e.g., “phashyon es cycklyq” — is decoded with ease by models as small as 4 billion parameters. This suggests the attack surface extends beyond frontier models to widely accessible open-source deployments.

The technique operates at what Schneier commenter Clive Robinson describes as a “layer of language” trade-off: higher-level encoding (longer token spans) produces more coherent cover text but may introduce contextual inconsistencies; lower-level encoding is more subtle but may degrade readability.

Framework Mapping

  • AML.T0015 – Evade ML Model: Steganographic outputs are crafted to evade detection by content classifiers and moderation pipelines.
  • AML.T0043 – Craft Adversarial Data: The encoded text constitutes adversarially constructed data designed to carry a covert payload.
  • AML.T0057 – LLM Data Leakage: In insider or supply chain threat scenarios, LLMs could be used to exfiltrate sensitive data by encoding it into benign-appearing generated content.
  • LLM02 – Insecure Output Handling: Downstream systems that consume LLM output without semantic scrutiny may inadvertently relay hidden messages.
  • LLM06 – Sensitive Information Disclosure: An LLM could be prompted to embed confidential information into public-facing outputs via steganographic encoding.

Impact Assessment

The primary risk is for organisations that deploy LLMs in content creation, summarisation, or customer-facing roles, where generated text may exit secure environments. A malicious insider or compromised model could encode sensitive data — credentials, PII, proprietary information — into output that passes standard inspection.

Secondarily, threat actors could use LLM steganography for command-and-control communications that evade network-level content inspection, embedding instructions in publicly posted text.

Mitigation & Recommendations

  1. Apply statistical analysis to LLM outputs: Entropy and stylometric analysis can flag text with abnormal token distributions that may indicate steganographic encoding (see the sketch after this list).
  2. Restrict model sampling parameters in production: Locking temperature and sampling settings reduces the degrees of freedom available for encoding.
  3. Implement output watermarking: Cryptographic watermarking of LLM outputs (e.g., via tools such as snowdrop, mentioned in the post's comments) can help attribute and audit generated text.
  4. Red-team LLM deployments for covert channel abuse: Include steganography scenarios in adversarial testing of AI pipelines.
  5. Monitor for unusual linguistic patterns: Deploy secondary NLP classifiers trained to detect statistically improbable word choice sequences.
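
As a rough illustration of recommendation 1, the sketch below flags text whose word-level Shannon entropy falls outside an expected baseline band. The baseline values here are placeholders, not calibrated figures; a production detector would combine stylometric features and model-aware statistics rather than rely on this coarse measure.

```python
# Illustrative entropy screen for LLM output. The baseline band is a
# hypothetical placeholder; in practice it would be estimated from a
# corpus of known-clean text for the same model and domain.
import math
from collections import Counter

def word_entropy(text: str) -> float:
    """Shannon entropy (bits) of the text's word-frequency distribution."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def looks_anomalous(text: str, baseline: tuple[float, float]) -> bool:
    """Flag text whose entropy falls outside the expected baseline band."""
    low, high = baseline
    return not (low <= word_entropy(text) <= high)

BASELINE = (3.0, 7.5)  # placeholder band, not a calibrated threshold
sample = "The report indicates how results from the study were validated."
print(round(word_entropy(sample), 2), looks_anomalous(sample, BASELINE))
```

Token-level entropy computed from the generating model's own logits would be a stronger signal than word frequencies, at the cost of requiring model access.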
