ATLAS OWASP MEDIUM Moderate risk · Monitor closely RELEVANCE ▲ 6.2

Claude system prompts as a git timeline

TL;DR MEDIUM
  • What happened: Git-based tooling now enables structured diff tracking of Claude's publicly released system prompt history across model versions.
  • Who's at risk: Anthropic and enterprise Claude deployers are most exposed, as systematic prompt evolution tracking can reveal weakening safety constraints or exploitable behavioral shifts.
  • Act now: Monitor published system prompt diffs for unintentional disclosure of safety constraint relaxations · Treat system prompt versioning as security-relevant data — review changes before public release · Use prompt evolution analysis defensively to identify regression in safety guardrails across model updates
Overview

Security researcher and developer Simon Willison has published a git-based tool for systematically tracking the evolution of Anthropic’s publicly released Claude system prompts. It parses Anthropic’s published Markdown prompt history into per-model and per-family files, each committed with a timestamp matching its release date, so that git log, diff, and blame can trace how Claude’s core behavioral instructions have changed across model versions, including the recently documented delta between Claude Opus 4.6 and 4.7.

While Anthropic intentionally publishes these prompts for transparency, the creation of structured, queryable tooling around them meaningfully lowers the barrier for adversarial prompt archaeology.

Technical Analysis

The approach converts a monolithic Markdown source (Anthropic’s system prompt changelog page) into granular files scoped by model and model family, then injects synthetic git commit timestamps aligned to release dates. This creates a browsable GitHub commit history that allows researchers — and adversaries — to:

  • Diff specific model transitions (e.g., Opus 4.6 → 4.7) to identify changed instructions
  • Attribute constraint modifications to specific release dates using git blame
  • Identify removed or softened safety language that may signal exploitable behavioral regressions
  • Correlate prompt changes with jailbreak effectiveness across model generations
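The conversion step described above can be sketched roughly as follows. This is a minimal illustration, not the published tool: the heading layout (one `## <model name>` heading per model), the function names `split_by_model`, `slugify`, and `commit_env`, and any sample text fed to them are assumptions made for the example. The only git behaviour relied on is that `git commit` reads the `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables, here filled in git's internal `<unix timestamp> <offset>` format.

```python
import re
from datetime import datetime, timezone


def split_by_model(markdown: str) -> dict[str, str]:
    """Split one Markdown document into a text blob per model.

    Assumption: each model's prompt begins at a level-2 heading such as
    "## Claude Opus 4.7". Text before the first such heading is ignored.
    """
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^## +(.+?)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    # Normalise trailing whitespace so each file ends with one newline.
    return {name: body.strip() + "\n" for name, body in sections.items()}


def slugify(model_name: str) -> str:
    """Turn "Claude Opus 4.7" into the file-friendly "claude-opus-4-7"."""
    return re.sub(r"[^a-z0-9]+", "-", model_name.lower()).strip("-")


def commit_env(release_date: datetime) -> dict[str, str]:
    """Environment variables that make `git commit` record a chosen date.

    Pass a timezone-aware datetime; the value is emitted in UTC.
    """
    stamp = f"{int(release_date.astimezone(timezone.utc).timestamp())} +0000"
    return {"GIT_AUTHOR_DATE": stamp, "GIT_COMMITTER_DATE": stamp}
```

A caller would write each section to a file named after `slugify(name)`, then run `git add` and `git commit` with `commit_env(...)` merged into the process environment, one commit per release date.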

This is a low-cost, high-value reconnaissance technique. Adversaries crafting jailbreaks or trying to bypass refusal heuristics benefit from knowing precisely which instructions were added, removed, or reworded in each release.
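To make "added, removed, or reworded" concrete, here is a small sketch that compares two prompt versions line by line using Python's standard `difflib` module. The function name `instruction_delta` is hypothetical, and any inputs are placeholder text rather than real Claude prompts. A reworded instruction appears as one removed line plus one added line.

```python
import difflib


def instruction_delta(old: str, new: str) -> dict[str, list[str]]:
    """Return the lines removed from `old` and the lines added in `new`."""
    added: list[str] = []
    removed: list[str] = []
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
        # Lines starting with "  " are unchanged; "? " lines are ndiff hints.
    return {"added": added, "removed": removed}
```

The same information is available from `git diff` once the prompts are stored as files in a repository; the Python version is only convenient when the comparison feeds further automated analysis.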

Framework Mapping

AML.T0056 – LLM Meta Prompt Extraction: The tooling operationalises systematic extraction and analysis of meta-prompt content, even when that content is technically public. Structured diff analysis goes beyond passive reading to active behavioral inference.

AML.T0054 – LLM Jailbreak: Prompt evolution data directly informs jailbreak development — identifying weakened constraints or newly introduced loopholes across versions.

LLM06 – Sensitive Information Disclosure (OWASP LLM Top 10): Even intentionally public prompts can expose unintended information about model guardrails, internal Anthropic priorities, or safety philosophy shifts when analysed in aggregate over time.

Impact Assessment

The immediate impact is low given that these prompts are deliberately public. However, the secondary impact is moderate: tooling that structures and automates prompt analysis accelerates adversarial workflows. Security teams at Anthropic and organisations deploying Claude via API should treat prompt changelog analysis as an ongoing threat intelligence feed — one that adversaries are now better equipped to exploit systematically.

Enterprise deployers using Claude with custom system prompts are less directly affected, since Anthropic states that its published system prompts apply to its own Claude apps rather than to the API. They may still be exposed if the behavioral changes visible in public prompt diffs interact unexpectedly with their own prompt layers.

Mitigation & Recommendations

  • Review prompt diffs before publication: Treat each system prompt update as a security artifact. Assess whether changes inadvertently signal exploitable constraint relaxations.
  • Monitor adversarial use of public prompt data: Track community repositories and tools built around Claude prompt history for signs of jailbreak correlation research.
  • Implement version-aware red teaming: When a new model version ships, use prompt diffs to prioritise which behavioral areas to stress-test first.
  • Establish prompt change governance: Internally, require security review sign-off on system prompt modifications that touch refusal logic, safety boundaries, or persona constraints.
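The diff-review and red-team prioritisation steps above could be partly automated along these lines. This is a sketch under stated assumptions: the `SENSITIVE_TERMS` watchlist is a hypothetical starting point, not a vetted taxonomy, and the function names are invented for illustration. It flags changed lines that mention a watched term so a reviewer sees them first; it does not judge whether a change actually weakens a safeguard.

```python
import difflib

# Hypothetical watchlist; a real one would be tuned by the reviewing team.
SENSITIVE_TERMS = ("refuse", "decline", "never", "must not", "safety", "harm", "persona")


def flag_sensitive_changes(old: str, new: str, terms=SENSITIVE_TERMS) -> list[tuple[str, str]]:
    """List ("removed" | "added", line) pairs that mention a watched term."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    flagged: list[tuple[str, str]] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        changed = [("removed", text) for text in old_lines[i1:i2]]
        changed += [("added", text) for text in new_lines[j1:j2]]
        for kind, text in changed:
            if any(term in text.lower() for term in terms):
                flagged.append((kind, text))
    return flagged


def needs_security_review(old: str, new: str) -> bool:
    """True when any changed line touches a watched term."""
    return bool(flag_sensitive_changes(old, new))
```

Keyword matching is deliberately crude: it will miss rewordings that avoid the watched terms, so it is best treated as a way to order the review queue rather than as a gate that replaces human sign-off.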
