LLM Activation Steering Goes Local: Security Implications of Direct Model Manipulation
Activation steering, the technique of directly manipulating an LLM's internal representations mid-inference to alter model behaviour, is becoming accessible to engineers outside frontier labs via local open-weight models …
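In its common form, activation steering adds a "steering vector" (often a difference of mean activations between two contrastive prompt sets) to a hidden state at a chosen layer during the forward pass. The sketch below is a toy illustration of that mechanism using NumPy and a small stand-in MLP rather than a real transformer; all function names and the random data are hypothetical. Real implementations typically hook into a model's residual stream (e.g. via PyTorch forward hooks), which is exactly the kind of access AML.T0044 describes.

```python
import numpy as np

def compute_steering_vector(acts_pos, acts_neg):
    """Difference-of-means steering vector from two contrastive activation sets."""
    return acts_pos.mean(axis=0) - acts_neg.mean(axis=0)

def forward(x, weights, steer_layer=None, steer_vec=None, alpha=4.0):
    """Toy MLP forward pass; optionally adds alpha * steer_vec at one layer."""
    h = x
    for i, W in enumerate(weights):
        h = np.tanh(h @ W)
        if i == steer_layer and steer_vec is not None:
            h = h + alpha * steer_vec  # direct manipulation mid-inference
    return h

rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 8)) for _ in range(3)]
x = rng.normal(size=(8,))

# Stand-in activations from two contrastive prompt sets
acts_pos = rng.normal(loc=1.0, size=(16, 8))
acts_neg = rng.normal(loc=-1.0, size=(16, 8))
vec = compute_steering_vector(acts_pos, acts_neg)

baseline = forward(x, weights)
steered = forward(x, weights, steer_layer=1, steer_vec=vec)
```

The same forward pass, with one added vector at one layer, produces different outputs: no fine-tuning, no prompt changes, just write access to intermediate activations.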
Relevant MITRE ATLAS techniques:
AML.T0044 - Full ML Model Access
AML.T0054 - LLM Jailbreak
AML.T0031 - Erode ML Model Integrity