CrowdStrike Researcher Details AI Jailbreaking and Data Poisoning Techniques

Overview

A profile published by SecurityWeek features Joey Melo, Principal Security Researcher at CrowdStrike, detailing his approach to AI red teaming. Melo specialises in manipulating AI systems — particularly LLMs — through jailbreaking and data poisoning, without modifying the underlying source code. His background spans traditional penetration testing at Bulletproof and Packetlabs before transitioning into AI security via Pangea (acquired by CrowdStrike in 2025). The article is notable for illustrating how classical adversarial hacker philosophy is being systematically applied to machine learning systems as that sector matures.

Technical Analysis

Melo’s core methodology centres on controlling the AI experience rather than rewriting its rules — a distinction that maps directly to the most prevalent LLM attack classes:

Jailbreaking: Crafting inputs that manipulate an LLM into bypassing its own safety guardrails and content policies, without any access to model weights or training pipelines. This exploits the tension between instruction-following and safety fine-tuning.
Data Poisoning: Introducing malicious or misleading data into training or fine-tuning pipelines to alter model behaviour at inference time. This is a stealthier attack surface, as effects may not surface until deployment.

Melo’s entry into AI hacking was sharpened via a competitive environment — Pangea’s AI hacking competition in March 2025 — which provided structured adversarial scenarios mirroring real-world deployment conditions. Competitive red team environments of this nature are increasingly recognised as accelerators for identifying novel attack vectors before threat actors do.

Framework Mapping

Technique	Framework Reference
LLM Jailbreak	AML.T0054 / LLM01
Prompt Injection	AML.T0051 / LLM01
Training Data Poisoning	AML.T0020 / LLM03
Adversarial Input Crafting	AML.T0043
Guardrail Evasion	AML.T0015

The techniques described align squarely with MITRE ATLAS’s LLM-specific attack taxonomy and OWASP’s LLM Top 10, particularly Prompt Injection (LLM01) and Training Data Poisoning (LLM03).

Impact Assessment

While the article is a researcher profile rather than a disclosure of a specific vulnerability, the techniques discussed have broad applicability to any organisation operating LLM-based products. Guardrail bypass affects consumer-facing AI chatbots, enterprise copilots, and agentic systems alike. Data poisoning is particularly concerning for organisations using fine-tuned or retrieval-augmented models where training data provenance is poorly controlled. The professionalisation of AI red teaming — exemplified by Melo’s career trajectory — signals that defensive teams need equivalent specialisation to keep pace.

Mitigation & Recommendations

Red team AI systems proactively: Engage specialists with dedicated LLM adversarial testing skills, not just traditional pentesters redeployed to AI contexts.
Implement guardrail monitoring: Log and alert on prompt patterns consistent with jailbreak attempts; treat these as security events, not just policy violations.
Harden training pipelines: Apply data validation, integrity checks, and provenance tracking to all data entering fine-tuning or RAG pipelines.
Adopt structured frameworks: Use MITRE ATLAS and OWASP LLM Top 10 as baseline threat models during AI system design and review cycles.
Participate in adversarial AI competitions: Structured competitive environments surface novel attack paths faster than internal testing alone.

References

Hacker Conversations: Joey Melo on Hacking AI — SecurityWeek