Steering the Refusal Away at Runtime
Subtract the refusal direction during generation — safety off, weights untouched
Technique first revealed 13 May 2023
Inside the Model
[Diagram legend: Instructions, Data, Actions, Control / decision, Feedback / logs]
Setup (Step 1 / 6)
Refusal is a single direction
The team runs a trusted open model that refuses unsafe requests. What few realise is that this refusal behaviour is controlled by a single direction in the model's internal activations: a known, easily found pattern in how the model represents its inputs.