Adversarial ML

28 reports

All LLM Security Agentic AI Industry News Research Supply Chain Prompt Injection First Look: Security Regulatory Jailbreaks Adversarial ML Data Poisoning Model Theft AI Security Tools Security Operations

NVIDIA and Hugging Face Launch GR00T 1.7 Robot Model

FIRST LOOK ATLAS OWASP HIGH ▲ 7.8 NVIDIA AI Blog Jul 07, 2026

NVIDIA and Hugging Face have integrated the Isaac GR00T 1.7 vision-language-action model, Isaac Teleop framework, and a 350,000-trajectory open dataset into the LeRobot open-source robotics library, creating an end-to-end open pipeline for training and deploying physical AI systems. This dramatically lowers the barrier to fine-tuning and deploying robot foundation models, expanding the attack surface across the full ML supply chain — from poisoned community datasets to adversarially crafted demonstrations used in teleop data collection. Defenders responsible for robotics deployments must now contend with a large, loosely governed open-source ecosystem where compromised models or datasets can directly translate to unsafe physical-world behaviour.

AWS Launches Multi-Turn RL for Amazon Nova

FIRST LOOK ATLAS OWASP HIGH ▲ 7.2 AWS Machine Learning Blog Jul 07, 2026

AWS has released a production-grade, event-driven multi-turn reinforcement learning training infrastructure for Amazon Nova models on SageMaker HyperPod, enabling enterprises to train agents that learn tool orchestration, error recovery, and sequential decision-making at scale. This materially expands the attack surface by introducing complex reward-routing pipelines, ephemeral compute provisioning, and environment-facing reward workers as new targets for poisoning and manipulation. Defenders must scrutinise the trust boundaries between the Nova Forge SDK, ECS reward workers, and HyperPod training pods, as a compromised reward signal can silently shape model behaviour across entire interaction sequences.

SkillCloak Bypasses AI Agent Skill Scanners at 90% Rate

ATLAS OWASP HIGH ▲ 8.5 The Hacker News Jul 07, 2026

Researchers at Hong Kong University of Science and Technology have demonstrated that static scanners used to vet malicious AI agent 'skills' — modular add-ons for agents like Claude Code and OpenAI Codex — can be systematically bypassed using a tool called SKILLCLOAK. The technique leverages either character-substitution obfuscation or self-extracting packing into scanner-ignored directories like .git/, achieving evasion rates above 90% across all eight tested scanners. The same research team also developed SKILLDETONATE, a runtime behavioral sandbox that catches most of the threats static analysis misses.

NanoEuler Launches GPT-2 LLM Built from Scratch in C/CUDA

FIRST LOOK ATLAS OWASP MEDIUM ▲ 5.8 Cohere AI (via HN) Jun 29, 2026

NanoEuler is an open-source GPT-2-class language model (~116M parameters) built entirely from scratch in C/CUDA, including hand-written backpropagation, a BPE tokenizer, FlashAttention, pretraining, and supervised fine-tuning — with RLHF/DPO planned. For defenders, the significance lies in the democratisation of low-level, dependency-free LLM training infrastructure: adversaries gain a highly portable, auditable, and modifiable training stack that bypasses standard ML framework telemetry and supply chain controls. Security teams should treat this class of 'from-scratch' open-source LLM tooling as a potential foundation for covert fine-tuning pipelines, backdoor insertion, and evasion of model-level safety controls.

Prompt Injection Malware Evades LLM Security Scanners

ATLAS OWASP HIGH ▲ 8.2 Schneier on Security Jun 25, 2026

A malware developer has embedded nuclear and biological weapons-related text inside JavaScript comment blocks within spyware payloads, specifically to trigger refusal behaviour or context confusion in LLM-powered security analysis pipelines. The technique exploits the architectural gap between how interpreters (which skip comments) and language models (which ingest the full file as input) process the same file. While ineffective against traditional static analysis tooling, the tactic represents a practical adversarial countermeasure targeting AI-first triage workflows and analyst copilots.

Google DeepMind Releases AI Agent Attack Taxonomy

FIRST LOOK ATLAS OWASP HIGH ▲ 8.7 SecurityWeek Jun 25, 2026

Google DeepMind researchers have released a structured taxonomy categorising adversarial attacks against autonomous AI agents into six classes — content injection, semantic manipulation, cognitive state poisoning, behavioural control, systemic, and human-in-the-loop traps — formalising an emerging threat model for agentic AI systems. For defenders, this framework codifies attack paths that exploit the agent's inability to distinguish trusted instructions from attacker-controlled data ingested from web pages, emails, documents, and tool outputs. NIST evaluation data cited in the research shows malicious instruction injection succeeded in 57% of tested agent hijacking scenarios on average, underscoring that these are active, high-yield attack vectors rather than theoretical concerns.

LLM Role Confusion Attack Bypasses Safety at 61%

ATLAS OWASP HIGH ▲ 8.2 Simon Willison Jun 23, 2026

New research from Ye, Cui, and Hadfield-Menell demonstrates that LLMs prioritise the stylistic format of text over its structural role tags, enabling attackers to craft injected content that mimics internal reasoning blocks and bypasses safety guardrails. The study found attack success rates of 61% when injected text stylistically matched model-internal formats, dropping to just 10% after 'destyling'. The authors conclude that without genuine role perception in models, prompt injection defences will remain fundamentally reactive.

OpenAI's ChatGPT Image Generation Fails Content Moderation

FIRST LOOK ATLAS OWASP HIGH ▲ 8.2 OpenAI (via HN) Jun 22, 2026

Mindgard researchers demonstrated that ChatGPT's image generation pipeline can be manipulated through an indirect, socially-engineered prompt to produce violent and sexually explicit content without users directly requesting it, exposing a significant failure in OpenAI's content moderation controls. Defenders and enterprise operators of ChatGPT-integrated products face a newly validated attack class where innocuous-looking prompt patterns — potentially spreading virally — can systematically strip safety guardrails from image generation. This finding signals that content filter bypasses in multimodal systems are reproducible at scale, raising urgent questions about the adequacy of output-layer filtering as a sole defence mechanism.

Malware Uses Prompt Injection in JavaScript to Evade LLM Tools

ATLAS OWASP HIGH ▲ 8.2 Schneier on Security Jun 21, 2026

A malware developer has been observed embedding fake system instructions and policy-triggering content — including references to nuclear and biological weapons — inside JavaScript comment blocks to confuse or trigger refusal behaviour in LLM-powered security analysis pipelines. The technique does not affect code execution but is specifically designed to disrupt naive AI-first triage tools that feed raw file content to language models without isolating it as untrusted data. Traditional static analysis methods remain unaffected, but the approach signals an emerging class of anti-AI-analysis evasion techniques.

Midjourney Medical Releases Full-Body AI Ultrasound Scanner

FIRST LOOK ATLAS OWASP MEDIUM ▲ 5.8 The Verge AI Jun 18, 2026

Midjourney Medical has announced a full-body ultrasound scanner that uses a ring of sensors and AI processing to generate MRI-comparable internal body imagery, representing a significant pivot from image generation into AI-assisted medical diagnostics hardware. The convergence of AI inference pipelines with sensitive biometric and anatomical data creates new attack surfaces around health data exfiltration, model output manipulation, and diagnostic integrity. Defenders in healthcare and enterprise wellness programmes should treat this class of device as a high-sensitivity AI-enabled medical endpoint requiring strict data governance and supply chain vetting.

Odyssey Launches Physical World Model Platform Backed by Amazon

FIRST LOOK ATLAS OWASP MEDIUM ▲ 6.2 TechCrunch AI Jun 18, 2026

Odyssey has raised a $310M Series B to scale its world model platform, which ingests real-world physical environment data to generate interactive simulations, video, and training environments for robotics and gaming. The platform's reliance on large-scale physical data collection, multi-tenant simulation outputs, and deep AWS infrastructure integration introduces supply chain, data poisoning, and adversarial simulation risks defenders should assess. Organizations consuming Odyssey-generated synthetic environments for robotics training or game content pipelines are newly exposed to integrity attacks targeting the underlying world model.

Vertex AI SDK Bucket Squatting Flaw Enables Model Hijack

ATLAS OWASP HIGH ▲ 8.5 The Hacker News Jun 17, 2026

A vulnerability in the Google Cloud Vertex AI Python SDK allowed unauthenticated attackers to intercept model uploads by pre-registering predictable staging bucket names — a technique Unit 42 calls 'Pickle in the Middle'. Once a malicious model replaced the legitimate upload, arbitrary code executed inside Google's serving infrastructure via pickle deserialization. Google patched the flaw in v1.148.0 after disclosure in March 2026, but the incident highlights systemic risks in ML pipeline supply chains.

AI Worm Autonomously Generates Exploits at Runtime

ATLAS OWASP CRITICAL ▲ 9.2 The Hacker News Jun 10, 2026

University of Toronto researchers demonstrated a proof-of-concept AI worm that leverages a locally hosted open-weight LLM to autonomously reason through network targets, generate novel exploit chains at runtime, and self-replicate — achieving 62% network penetration across a 33-host testbed with no human intervention. Unlike traditional worms with fixed payloads, this system bypasses conventional patch-based defences by dynamically adapting attack logic to whatever vulnerabilities it discovers. The use of offline open-weight models eliminates dependency on commercial AI APIs, making it resilient to rate-limiting or platform-level safety controls.

Deepfakes and Prompt Injection Top AI Security Threats

ATLAS OWASP HIGH ▲ 7.2 Dark Reading Jun 08, 2026

Gartner analysts have identified deepfakes and prompt injection as two of four critical emerging threats where attackers currently hold a structural advantage over defenders. The advisory signals growing institutional recognition that AI-native attack vectors are maturing faster than enterprise defenses. Organizations are urged to treat these threats as priority items requiring immediate defensive investment.

AI Worm With Embedded LLM Enables Self-Propagation

ATLAS OWASP HIGH ▲ 7.5 Schneier on Security Jun 08, 2026

Researchers have prototyped an internet worm that bundles its own large language model, executing it on compromised hosts to enable fully decentralised propagation with no single point of control. The design mirrors John Brunner's 1975 fictional conception of a worm and echoes the destructive potential of WannaCry and NotPetya, but with the added capability of dynamically generating novel attacks by ingesting recent public vulnerability disclosures. The absence of a command-and-control chokepoint makes traditional takedown strategies ineffective, significantly raising the threat posed by AI-augmented malware.

Adversarial ML

Stay ahead of the threat.