Capability Overview
Security firm AIR has published a proof-of-concept demonstrating that a fabricated AI agent skill — brand-landingpage, ostensibly a Google Stitch landing-page builder — passed every skill security scanner currently in production use, including Cisco’s scanner, NVIDIA’s scanner, and all three scanners integrated into skills.sh. The skill was distributed via a legitimate marketplace pull request and amplified through a paid Instagram ad campaign, ultimately reaching an estimated 26,000 agents, including those operating on corporate accounts. The payload was deliberately benign (email address harvesting only), but the research shows the full capability chain for weaponised deployment exists today.
For defenders, this is not a theoretical edge case. Trail of Bits independently achieved the same scanner bypass three weeks prior. This is a reproducible, scalable attack class.
Attack Surface Analysis
The core structural vulnerability is the temporal gap between scan and execution. Existing skill scanners perform static analysis on the submitted package — the SKILL.md and bundled files — at a single point in time. They cannot assess what an externally-referenced URL will serve when an agent fetches it post-install, nor can they detect if that content changes after the skill achieves distribution.
AIR’s technique stacked three compounding weaknesses:
- Static-only scanning: Scanners cleared the skill because the submitted package was genuinely clean. The malicious instruction set lived off-package, at an attacker-controlled domain initially mirroring legitimate Google Stitch documentation.
- Trust signal manipulation: By contributing to a 36,000-star repository, the skill inherited social proof entirely decoupled from its actual behaviour. Star counts and open-source affiliation are not integrity signals.
- Agent context authority: A skill loaded into an agent’s context operates with roughly the authority of a user prompt. Once the URL was swapped to deliver a script, the agent executed it within its own permission boundary — which in enterprise deployments can include file system access, internal API calls, and credential stores.
The practical consequence: an attacker who achieves wide distribution before activating a payload has already won the hardest part. Detection at activation time is too late for agents that have been running for days or weeks.
Framework Mapping
MITRE ATLAS: This maps most directly to AML.T0010 (ML Supply Chain Compromise) — the marketplace pull request is the supply chain insertion point. The post-install URL swap is a form of AML.T0051 (LLM Prompt Injection) delivered through a trusted skill context rather than user input. AML.T0057 (LLM Data Leakage) covers the demonstrated exfiltration outcome.
OWASP LLM Top 10: LLM05 (Supply Chain Vulnerabilities) is the primary mapping. LLM07 (Insecure Plugin Design) applies because skills inherit user-level trust without behavioural sandboxing. LLM08 (Excessive Agency) is relevant wherever agents can execute fetched scripts against live systems.
Threat Scenarios
Scenario 1 — Corporate data exfiltration: A threat actor publishes a skill targeting sales and marketing personas (plausible, given AIR’s own ad targeting). After 30 days of clean operation, the external URL is swapped to instruct the agent to read CRM exports and POST them to an attacker endpoint. The skill has already been approved by IT.
Scenario 2 — Credential harvesting at scale: A skill offering productivity automation fetches a script that instructs the agent to retrieve stored API keys or OAuth tokens from the agent’s accessible environment and exfiltrate them. No malware is installed on the host; the agent itself performs the action.
Scenario 3 — Lateral movement staging: An initial skill payload only establishes a callback beacon. A second-stage script, delivered weeks later, maps internal services reachable from the agent’s network context and prepares pivot points.
Defender Checklist
- Audit all currently installed third-party agent skills for external URL dependencies in setup or runtime instructions
- Block or quarantine any skill that fetches instructions, scripts, or documentation from domains not owned by your organisation or a pre-approved vendor
- Deploy runtime network monitoring on agent processes; alert on new outbound domains appearing after a skill’s initial install date
- Establish an internal skill allow-list; treat any skill not on it as untrusted regardless of marketplace reputation or star count
- Re-scan approved skills on a scheduled basis, not just at initial submission
- Review Anthropic’s published guidance on external URL risks in skills and validate it against your agent deployment configuration
- Engage your agent platform vendor on whether continuous/dynamic scanning is on their roadmap