AI RiskAtlas
← Scenario library

One Character Past the Guard

A single inserted letter makes the guard and the model read the same text differently

Technique first revealed 09 Jun 2025

Conversational Assistant
[Diagram: components of the conversational assistant. User (untrusted), Chat / App Interface, Input Guardrail, Prompt Assembly, LLM, Output Guardrail. The guard and the model each have their own tokenizer. Legend: Instructions, Data, Actions, Control / decision, Feedback / logs.]
Setup (Step 1 / 7)

A guarded chatbot

A company runs a public chatbot. Before any message reaches the AI, a separate 'doorman' program (the input guard) reads it and blocks obviously harmful requests, such as requests for instructions to do something dangerous. On normal messages, this works fine.

Guard policy (excerpt, config):
input_guard:
  model: intent-classifier-v3   # own tokenizer (BPE)
  block_if: score("harmful_instructions") > 0.80
  on_block: refuse + log
# chat model: separate vendor model, separate tokenizer
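
The policy excerpt above can be read as a simple threshold rule. The following is a minimal Python sketch of that rule, not the actual guard: `toy_score`, `GuardDecision`, the placeholder phrase, and the refusal text are all hypothetical names and values introduced for illustration. Only the 0.80 threshold and the "refuse + log" behaviour come from the excerpt.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("input_guard")

# From the policy excerpt: block_if score("harmful_instructions") > 0.80
BLOCK_THRESHOLD = 0.80


@dataclass
class GuardDecision:
    allowed: bool
    score: float
    reply: Optional[str]  # refusal text when blocked, otherwise None


def toy_score(message: str) -> float:
    """Hypothetical stand-in for intent-classifier-v3.

    A real classifier would tokenize the message with its own tokenizer
    and return a learned score; this stub only matches a placeholder phrase.
    """
    return 0.95 if "dangerous instructions" in message.lower() else 0.05


def input_guard(
    message: str, score_fn: Callable[[str], float] = toy_score
) -> GuardDecision:
    """Apply the threshold rule: refuse and log when the score exceeds 0.80."""
    score = score_fn(message)
    if score > BLOCK_THRESHOLD:
        # on_block: refuse + log
        logger.warning("blocked message (score=%.2f)", score)
        return GuardDecision(False, score, "Sorry, I can't help with that.")
    # Below or at the threshold: the message moves on to prompt assembly.
    return GuardDecision(True, score, None)
```

Note that the decision rests entirely on the score the guard computes from its own reading of the text. The chat model, being a separate model with a separate tokenizer, never sees that score and processes the message independently.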

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning, not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan