Jailbreaks
Techniques that bypass safety alignment in instruction-tuned models, including roleplay exploits, many-shot jailbreaking, and competing-objectives attacks against frontier AI systems.