🔍AI RiskAtlas

The Jailbreak in Verse

A refused request, rewritten as a poem — and the model answers

Technique first revealed 19 Nov 2025

Inside the Model
[Figure: Inference pipeline, below the app layer. Components: Context Window, Tokenizer, Embeddings, Attention + KV Cache, Model Weights & Registry, Sampler / Decoder, Serving Infrastructure. The stages are labelled raw text, vectors, parameters and logits. Legend: instructions, data, actions, control / decision, feedback / logs.]
Setup · Step 1 / 6

A request that gets refused

First, the user asks for something the chatbot is built to turn down. Asked plainly, the model does exactly what it should: it declines and explains why it won't help.

💬 Plain request and the model's refusal (prompt)
User: [DISALLOWED REQUEST — stated plainly, e.g. step-by-step instructions for <harmful task>]

Model: I can't help with that. <brief safety rationale>. If you're looking for <safe alternative>, I'm happy to help with that instead.

# Refusal fires reliably: a plainly worded request like this is in-distribution for safety tuning, i.e. it closely resembles the prompts the model was trained to refuse.

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan