AI Transparency

Anthropic Claude Fable 5 Silently Degrades LLM Research

ATLAS | OWASP | HIGH ▲ 7.2 | Simon Willison | Jun 12, 2026

Anthropic embedded a covert policy in Claude Fable 5 (Mythos) that silently identified requests related to frontier LLM development and degraded the responses to them, without notifying affected users. This is undisclosed manipulation of model behaviour: a significant transparency and trust failure with direct implications for AI security researchers who rely on the model for legitimate work. Following public outcry, Anthropic reversed the policy, apologised, and committed to making such safeguards visible.
