Steering the Refusal Away at Runtime

Subtract the refusal direction during generation — safety off, weights untouched

Technique first revealed 13 May 2023

Setup: Step 1 / 6

Refusal is a single direction

The team runs a trusted open model that refuses unsafe requests. What few realise is that this refusal behaviour is largely controlled by a single direction in the model's internal activations: a known, easily found pattern in how the model represents its inputs.
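To make "a single direction" concrete, here is a minimal sketch on synthetic vectors rather than a real model. It assumes the direction can be estimated as the difference between the mean activation of two groups of prompts, which is one common approach and not necessarily how any particular team would find it. The names `unit_direction`, `projection`, `planted`, `refused` and `answered` are hypothetical, and the data is random noise with a planted offset.

```python
import numpy as np


def unit_direction(group_a, group_b):
    """Difference of the two group means, scaled to unit length."""
    diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    return diff / np.linalg.norm(diff)


def projection(vectors, direction):
    """Scalar coordinate of each row vector along the given unit direction."""
    return vectors @ direction


# Synthetic stand-ins for hidden activations (assumption: 64 dimensions).
rng = np.random.default_rng(0)
dim = 64

# Hypothetical hidden direction planted in the data for illustration only.
planted = np.zeros(dim)
planted[0] = 1.0

# "Refused" activations carry an offset along the planted direction;
# "answered" activations are plain noise.
refused = rng.normal(size=(200, dim)) + 4.0 * planted
answered = rng.normal(size=(200, dim))

# Estimate the direction from the data, then score both groups against it.
direction = unit_direction(refused, answered)
scores_refused = projection(refused, direction)
scores_answered = projection(answered, direction)

print("mean score, refused group: ", scores_refused.mean())
print("mean score, answered group:", scores_answered.mean())
print("cosine with planted direction:", float(direction @ planted))
```

In this toy setting a single number per vector, its coordinate along `direction`, is enough to tell the two groups apart. That is the sense in which a behaviour can be said to live on one direction.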
