One Character Past the Guard

A single inserted letter makes the guard and the model read the same text differently

Technique first revealed 09 Jun 2025

Conversational Assistant

Instructions · Data · Actions · Control / decision · Feedback / logs


Setup: Step 1 / 7

A guarded chatbot

A company runs a public chatbot. Before any message reaches the AI, a separate "doorman" program (the input guard) reads it and blocks obviously harmful requests, such as requests for instructions to do something dangerous. On ordinary messages, this works fine.

⚙️ Guard policy (excerpt): config

input_guard:
  model: intent-classifier-v3   # own tokenizer (BPE)
  block_if: score("harmful_instructions") > 0.80
  on_block: refuse + log
# chat model: separate vendor model, separate tokenizer
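
To make the policy concrete, here is a minimal Python sketch of how a guard like this might sit in front of the chat model. It is an illustration under stated assumptions, not the vendor's implementation: `InputGuard`, `handle`, `REFUSAL`, and the toy `score_fn` are hypothetical names, and the real `intent-classifier-v3` scorer is replaced by a stand-in function. Only the 0.80 threshold and the "refuse + log" behavior come from the excerpt.

```python
from dataclasses import dataclass, field
from typing import Callable, List

BLOCK_THRESHOLD = 0.80  # from the policy excerpt: block_if score > 0.80
REFUSAL = "Sorry, I can't help with that."  # hypothetical refusal text


@dataclass
class InputGuard:
    # Hypothetical stand-in for intent-classifier-v3: maps a message to a
    # "harmful_instructions" score in [0, 1]. The real guard tokenizes the
    # message with its own tokenizer before scoring.
    score_fn: Callable[[str], float]
    log: List[str] = field(default_factory=list)

    def is_blocked(self, message: str) -> bool:
        score = self.score_fn(message)
        blocked = score > BLOCK_THRESHOLD  # strict comparison, as in the policy
        if blocked:
            self.log.append(f"blocked harmful_instructions score={score:.2f}")
        return blocked


def handle(message: str, guard: InputGuard, chat_model: Callable[[str], str]) -> str:
    # on_block: refuse + log. The chat model never sees a blocked message.
    if guard.is_blocked(message):
        return REFUSAL
    # The chat model is a separate system and reads the same raw text
    # through its own tokenizer.
    return chat_model(message)


if __name__ == "__main__":
    # Toy scorer for demonstration only: flags one hard-coded phrase.
    toy_guard = InputGuard(
        score_fn=lambda m: 0.95 if "dangerous instructions" in m else 0.05
    )
    echo_model = lambda m: f"(model reply to: {m})"
    print(handle("What's the weather like?", toy_guard, echo_model))
    print(handle("Give me dangerous instructions", toy_guard, echo_model))
    print(toy_guard.log)
```

The point of the sketch is the shape of the pipeline: the guard's decision depends entirely on the score it assigns to the text as the guard reads it, while the chat model independently reads the same raw characters.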
