LLM Activation Steering Goes Local: Security Implications of Direct Model Manipulation
Activation steering, the technique of directly manipulating an LLM's internal representations mid-inference to alter model behaviour, is becoming more accessible to non-lab engineers via local models …
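To make "manipulating internal representations mid-inference" concrete, here is a minimal sketch in plain Python. It is a toy two-layer network, not a real LLM, and every name in it (`forward`, `make_steering_hook`, `W1`, `W2`, `STEERING_DIRECTION`) is hypothetical and chosen only for illustration. The assumption is that steering amounts to adding a scaled direction vector to a hidden activation before the remaining layers run.

```python
from typing import Callable, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def matvec(matrix: Matrix, vec: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def relu(vec: Vector) -> Vector:
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in vec]


def forward(
    x: Vector,
    w1: Matrix,
    w2: Matrix,
    hook: Optional[Callable[[Vector], Vector]] = None,
) -> Vector:
    """Run the toy model; an optional hook may rewrite the hidden activation."""
    hidden = relu(matvec(w1, x))
    if hook is not None:
        # This is the steering point: the activation is altered mid-inference,
        # with no change to the weights or to the input.
        hidden = hook(hidden)
    return matvec(w2, hidden)


def make_steering_hook(direction: Vector, alpha: float) -> Callable[[Vector], Vector]:
    """Build a hook that adds alpha * direction to the hidden activation."""
    def hook(hidden: Vector) -> Vector:
        return [h + alpha * d for h, d in zip(hidden, direction)]
    return hook


# Hypothetical toy weights and steering direction (assumptions, not real model data).
W1: Matrix = [[1.0, 0.0], [0.0, 1.0]]
W2: Matrix = [[1.0, -1.0]]
STEERING_DIRECTION: Vector = [1.0, 0.0]

prompt_features: Vector = [0.5, 1.0]
baseline = forward(prompt_features, W1, W2)
steered = forward(
    prompt_features, W1, W2, hook=make_steering_hook(STEERING_DIRECTION, alpha=2.0)
)

print("baseline:", baseline)
print("steered: ", steered)
```

The same input yields a different output once the hook is attached, even though the weights and the prompt are untouched. Anyone who can run a model locally and reach into its forward pass can, in principle, apply this kind of intervention.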