Prompt Injection

87 reports


Meta Releases AgentKits with 60 Production-Ready Agent Blueprints

FIRST LOOK ATLAS OWASP HIGH ▲ 7.2 Meta AI (via HN) Jun 29, 2026

AgentKits ships 60 open, free AI agent blueprints covering 30 operational categories — from incident response and access provisioning to HR screening and fraud detection — complete with copyable system prompts, tool definitions, and workflow architectures targeting Claude, OpenAI, LangGraph, and n8n. The free, no-login distribution model dramatically lowers the barrier for adversaries to study, clone, or weaponise production-grade agent architectures, including sensitive categories like SecOps triage, access provisioning, and compliance monitoring. Defenders must treat these blueprints as publicly documented attack playbooks and audit any internally deployed instances against their documented worst-case actions and trust levels.

Claude Opus 4.6 Resists 6,000 Prompt Injection Attempts

ATLAS OWASP MEDIUM ▲ 6.5 Simon Willison Jun 27, 2026

A public challenge exposing an AI email assistant to over 6,000 prompt injection attempts found that Claude Opus 4.6 successfully resisted all efforts to leak secrets or execute malicious instructions embedded in emails. While the result suggests frontier model training against injection attacks is meaningfully improving, security researchers caution that the absence of a successful attack under constrained conditions does not constitute a security guarantee. The author and Hacker News community both note that sophisticated or novel attack vectors could still break through, and irreversible-damage scenarios should not rely solely on model-level defences.

GitHub Releases Copilot Agentic Harness Evaluation

FIRST LOOK ATLAS OWASP MEDIUM ▲ 6.2 GitHub Blog Jun 26, 2026

GitHub has published an evaluation of its Copilot agentic harness, detailing how the orchestration layer performs across multiple underlying models and coding tasks — effectively documenting the architecture of an autonomous, multi-step code generation and execution system. For defenders, this transparency reveals an orchestration surface where prompt injection, supply chain manipulation, and model-switching logic can be targeted across a broader set of model backends than previously understood. Security teams should treat the harness itself as a critical trust boundary, since compromising task routing or model selection logic could silently redirect agentic workflows to less-safe or adversary-controlled model endpoints.

Anthropic Launches Claude Cowork Mobile with Remote Control

FIRST LOOK ATLAS OWASP HIGH ▲ 7.2 BleepingComputer Jun 26, 2026

Anthropic is expanding its Claude Cowork agentic desktop feature to mobile, enabling users to remotely initiate, monitor, and steer long-running AI tasks on their PC from a smartphone — with background task execution persisting even after the mobile app is closed. This cross-device architecture introduces a new attack surface: a mobile application acting as a command-and-control interface for an agent with local filesystem access, expanding the blast radius of device compromise, session hijacking, and prompt injection attacks. Defenders must now account for a persistent, background-running agentic process on employee endpoints that can be triggered or manipulated via a separate, potentially less-secured mobile channel.

Prompt Injection Malware Evades LLM Security Scanners

ATLAS OWASP HIGH ▲ 8.2 Schneier on Security Jun 25, 2026

A malware developer has embedded nuclear and biological weapons-related text inside JavaScript comment blocks within spyware payloads, specifically to trigger refusal behaviour or context confusion in LLM-powered security analysis pipelines. The technique exploits the architectural gap between how interpreters (which skip comments) and language models (which ingest the full file as input) process the same file. While ineffective against traditional static analysis tooling, the tactic represents a practical adversarial countermeasure targeting AI-first triage workflows and analyst copilots.
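
One possible mitigation for the interpreter/model gap described above is to remove comments before a sample reaches the LLM, so the model sees roughly what the interpreter would execute. The sketch below is illustrative only: `strip_js_comments` is a hypothetical helper, not part of any scanner named in the report, and it assumes plain JavaScript without regex literals or template-literal interpolation, which a real pipeline would need a proper parser to handle.

```python
def strip_js_comments(source: str) -> str:
    """Remove // and /* */ comments while leaving string literals intact.

    Simplified sketch: regex literals and ${...} inside template
    literals are NOT handled (assumption stated in the lead-in).
    """
    out = []
    i, n = 0, len(source)
    quote = None  # active string delimiter, or None when outside a string
    while i < n:
        ch = source[i]
        nxt = source[i + 1] if i + 1 < n else ""
        if quote:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(nxt)  # keep the escaped character verbatim
                i += 2
                continue
            if ch == quote:
                quote = None  # string literal closed
            i += 1
        elif ch in "\"'`":
            quote = ch
            out.append(ch)
            i += 1
        elif ch == "/" and nxt == "/":
            # Line comment: skip to end of line, the newline itself is kept.
            while i < n and source[i] != "\n":
                i += 1
        elif ch == "/" and nxt == "*":
            # Block comment: skip to the closing marker (or end of file).
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")  # keep neighbouring tokens separated
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Stripping comments narrows this particular channel but does not address injected text in string literals or identifiers, so it would complement rather than replace traditional static analysis.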

Google DeepMind Releases AI Agent Attack Taxonomy

FIRST LOOK ATLAS OWASP HIGH ▲ 8.7 SecurityWeek Jun 25, 2026

Google DeepMind researchers have released a structured taxonomy categorising adversarial attacks against autonomous AI agents into six classes — content injection, semantic manipulation, cognitive state poisoning, behavioural control, systemic, and human-in-the-loop traps — formalising an emerging threat model for agentic AI systems. For defenders, this framework codifies attack paths that exploit the agent's inability to distinguish trusted instructions from attacker-controlled data ingested from web pages, emails, documents, and tool outputs. NIST evaluation data cited in the research shows malicious instruction injection succeeded in 57% of tested agent hijacking scenarios on average, underscoring that these are active, high-yield attack vectors rather than theoretical concerns.

First Look: Agentic AI SOC Systems Ship Autonomous Decision-Making at Machine Speed

FIRST LOOK ATLAS OWASP HIGH ▲ 7.8 SecurityWeek Jun 25, 2026

Agentic AI systems deployed in security operations and enterprise workflows are increasingly executing autonomous decisions at machine speed, acting on LLM-derived confidence whether or not the underlying context is accurate. The core security risk is that incomplete, poisoned, or manipulated context fed to these agents produces confidently wrong actions executed without human review. Defenders face a compounded threat: adversaries can now target the context layer (asset inventories, threat feeds, exposure data) to induce systematic misconfiguration or inaction at scale.

Dragos Launches EmberAI, an OT-Specific AI Platform

FIRST LOOK ATLAS OWASP HIGH ▲ 7.2 SecurityWeek Jun 24, 2026

Dragos has launched EmberAI, an AI module embedded within its OT security platform that allows analysts to query threat intelligence, asset data, and network activity in plain language, grounded in a decade of proprietary OT-specific data. The system introduces new attack surface considerations because it aggregates highly sensitive OT network telemetry, vulnerability data, and adversary intelligence into a single AI-queryable layer — making the platform itself a high-value target. Defenders must weigh the risks of prompt injection, over-reliance on AI-generated recommendations in safety-critical environments, and the intelligence value this consolidated dataset represents to nation-state adversaries.

Mistral AI Ships OCR 4 with Document Extraction

FIRST LOOK ATLAS OWASP MEDIUM ▲ 6.8 Mistral AI (via HN) Jun 24, 2026

Mistral OCR 4 is a production-grade document intelligence model delivering bounding boxes, block classification, inline confidence scores, and 170-language OCR optimised for enterprise RAG and search ingestion pipelines. For defenders, the model's role as a trusted ingestion component in downstream retrieval pipelines creates a high-value attack surface: adversarially crafted documents can now influence RAG context, citations, and automated redaction decisions at scale. The self-hosted single-container deployment option further expands the supply chain and misconfiguration risk surface for organisations running document intelligence internally.

Anthropic Enhances AI Agent Skill Scanner Security

FIRST LOOK ATLAS OWASP CRITICAL ▲ 9.2 The Hacker News Jun 24, 2026

Security firm AIR demonstrated that a malicious AI agent skill, disguised as a Google Stitch landing-page builder, passed every major skill scanner including Cisco's, NVIDIA's, and skills.sh integrations, reaching approximately 26,000 agents before its payload was activated. The attack exploits a structural gap: scanners evaluate a static package at submission time, while the external URL the skill instructs the agent to fetch can be silently swapped post-install to deliver arbitrary instructions. Defenders relying on marketplace reputation signals, GitHub star counts, or one-time scanner verdicts to gatekeep agent skills have no meaningful protection against this class of supply-chain attack.
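
The gap between a scan-time verdict and fetch-time content can be narrowed by pinning a digest of the external resource when it is reviewed and rejecting any later fetch that differs. The following is a minimal sketch under that assumption; `PinnedResource` and the example URL are hypothetical and do not describe how any of the scanners named above work.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 hex digest of the reviewed content."""
    return hashlib.sha256(content).hexdigest()


class PinnedResource:
    """Records the digest of an external resource at review time."""

    def __init__(self, url: str, approved_content: bytes):
        self.url = url
        self.pinned_digest = fingerprint(approved_content)

    def verify(self, fetched_content: bytes) -> bytes:
        """Return the content only if it matches what was reviewed."""
        if fingerprint(fetched_content) != self.pinned_digest:
            raise ValueError(f"content at {self.url} changed since review")
        return fetched_content


# Hypothetical usage: the URL and payloads are placeholders.
pin = PinnedResource("https://example.invalid/skill.md", b"build a landing page")
safe = pin.verify(b"build a landing page")
```

A pin of this kind forces any post-install swap to fail closed, at the cost of requiring a fresh review whenever the upstream content legitimately changes.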

LLM Role Confusion Attack Bypasses Safety at 61%

ATLAS OWASP HIGH ▲ 8.2 Simon Willison Jun 23, 2026

New research from Ye, Cui, and Hadfield-Menell demonstrates that LLMs prioritise the stylistic format of text over its structural role tags, enabling attackers to craft injected content that mimics internal reasoning blocks and bypasses safety guardrails. The study found attack success rates of 61% when injected text stylistically matched model-internal formats, dropping to just 10% after 'destyling'. The authors conclude that without genuine role perception in models, prompt injection defences will remain fundamentally reactive.

AWS Launches Bedrock AgentCore for Autonomous Payments

FIRST LOOK ATLAS OWASP HIGH ▲ 7.8 AWS Machine Learning Blog Jun 23, 2026

AWS has launched Amazon Bedrock AgentCore Payments, a managed infrastructure layer that enables AI agents to autonomously transact with external model providers and services using the x402 payment protocol, without human intervention. This capability introduces a new class of financial attack surface where compromised or manipulated agents can autonomously spend real funds, exfiltrate value, or be redirected to malicious service endpoints. Defenders must now treat agent payment credentials and spending budgets as first-class financial controls, on par with cloud IAM policies.
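
To illustrate what treating spending budgets as a first-class control might look like, here is a small sketch of a guard that sits in front of an agent's payment call. `SpendingGuard`, its fields, and the payee names are hypothetical; this is not the AgentCore Payments API or the x402 protocol, only a generic pattern: a payee allowlist, a per-payment cap, and a cumulative budget.

```python
from dataclasses import dataclass
from decimal import Decimal


class PaymentDenied(Exception):
    """Raised when a requested payment violates policy."""


@dataclass
class SpendingGuard:
    per_payment_limit: Decimal
    total_budget: Decimal
    allowed_payees: frozenset
    spent: Decimal = Decimal("0")

    def authorize(self, payee: str, amount: Decimal) -> None:
        """Approve and record a payment, or raise PaymentDenied."""
        if payee not in self.allowed_payees:
            raise PaymentDenied(f"payee not on allowlist: {payee}")
        if amount <= 0 or amount > self.per_payment_limit:
            raise PaymentDenied(f"amount outside per-payment limit: {amount}")
        if self.spent + amount > self.total_budget:
            raise PaymentDenied("cumulative budget exceeded")
        self.spent += amount  # record only after every check passes


# Hypothetical policy values for illustration.
guard = SpendingGuard(
    per_payment_limit=Decimal("5.00"),
    total_budget=Decimal("8.00"),
    allowed_payees=frozenset({"model-provider.example"}),
)
guard.authorize("model-provider.example", Decimal("4.00"))
```

Keeping the guard outside the agent's own reasoning loop means a manipulated prompt cannot talk its way past the limits; it can only request payments that the policy already permits.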

Bayer and Thoughtworks Ship PRINCE Agentic RAG Platform

FIRST LOOK ATLAS OWASP HIGH ▲ 7.2 HN AI Security Jun 22, 2026

Bayer AG and Thoughtworks have published a detailed case study on PRINCE, a production agentic RAG system combining multi-agent orchestration, Text-to-SQL, and human-in-the-loop workflows to answer complex pharmaceutical preclinical research questions and draft regulatory documents. The system's architecture — spanning intent clarification, planning, retrieval, reflection, and writing agents with access to decades of safety study data — introduces a broad attack surface including prompt injection across agent boundaries, SQL injection via natural language, and sensitive data exfiltration through compromised agent outputs. Defenders evaluating similar agentic platforms should treat each inter-agent handoff as a trust boundary requiring independent validation and focus on data leakage controls given the sensitivity of preclinical regulatory data.
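
As one example of validating a handoff independently, the output of a Text-to-SQL agent can be executed on a connection that refuses anything other than reads, regardless of what the generating agent was told. The sketch below uses SQLite's authorizer hook from Python's standard library purely for illustration; the PRINCE case study as summarised here does not say which database or enforcement mechanism is used, and the table and function names are hypothetical.

```python
import sqlite3

# Only plain SELECT statements and column reads are permitted.
# SQL functions such as count() are also denied in this strict sketch.
READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}


def _read_only_authorizer(action, arg1, arg2, db_name, source):
    """Authorizer callback: allow reads, deny every other action."""
    if action in READ_ONLY_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def make_read_only(conn: sqlite3.Connection) -> None:
    """Install the authorizer; later statements on conn are read-only."""
    conn.set_authorizer(_read_only_authorizer)


def run_generated_query(conn: sqlite3.Connection, sql: str):
    """Execute model-generated SQL on a connection locked to reads."""
    return conn.execute(sql).fetchall()


# Hypothetical setup: a tiny in-memory table standing in for study data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE studies (name TEXT)")
conn.execute("INSERT INTO studies VALUES ('study-1')")
conn.commit()
make_read_only(conn)
rows = run_generated_query(conn, "SELECT name FROM studies")
```

Enforcing the restriction at the database layer means the check does not depend on correctly parsing or classifying the generated SQL text upstream.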

Anthropic Launches Claude Code with Local Memory Layer

FIRST LOOK ATLAS OWASP MEDIUM ▲ 5.8 Anthropic (via HN) Jun 22, 2026

Recall is an open-source, fully local memory layer for Anthropic's Claude Code that persists and summarises project context across coding sessions without sending data to external services. For defenders, the introduction of a persistent, file-based context store creates a new attack surface: a poisoned or tampered memory file can silently inject malicious instructions into every subsequent Claude Code session. Security teams should treat the local memory store as a trust boundary and apply appropriate file-integrity and access controls to it.
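
A file-integrity control of the kind suggested above could be as simple as signing the memory content when it is written and verifying the signature before it is loaded into a session. This is a generic sketch and not a description of how Recall or Claude Code behaves; `sign_memory` and `load_verified_memory` are hypothetical helpers, and it assumes the key is stored somewhere the agent's own tools cannot write to.

```python
import hashlib
import hmac


def sign_memory(content: bytes, key: bytes) -> str:
    """HMAC-SHA256 tag over the memory file content."""
    return hmac.new(key, content, hashlib.sha256).hexdigest()


def load_verified_memory(content: bytes, signature: str, key: bytes) -> str:
    """Return the memory text only if its tag matches; otherwise refuse."""
    expected = sign_memory(content, key)
    if not hmac.compare_digest(expected, signature):
        raise ValueError("memory file failed integrity check; refusing to load")
    return content.decode("utf-8")


# Hypothetical usage with placeholder key and content.
key = b"example-key-kept-outside-the-workspace"
memory = b"Project uses Python 3.12 and pytest."
tag = sign_memory(memory, key)
text = load_verified_memory(memory, tag, key)
```

A check like this detects tampering by anything that lacks the key, but it does not help if malicious content is written through the legitimate, signed path, so access controls on who can update the store still matter.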

OpenAI Ships GPT-5.5 Instant with Health Intelligence

FIRST LOOK ATLAS OWASP MEDIUM ▲ 5.8 OpenAI Blog Jun 21, 2026

OpenAI has upgraded ChatGPT's health and wellness response capabilities via GPT-5.5 Instant, incorporating stronger reasoning, physician-informed evaluations, and improved contextual understanding for medical queries. This expansion into high-stakes health guidance raises meaningful concerns for defenders, as improved fluency and authority in medical responses increase the risk of user overreliance and lower the perceived threshold for trusting AI-generated health advice. Security and trust-and-safety teams should evaluate how this capability interacts with prompt injection, social engineering chains, and the broader risk of AI-mediated medical misinformation at scale.
