🔍AI RiskAtlas

The Crescendo

Every message looks innocent — but together they walk the model past its guardrails

Technique first revealed 02 Apr 2024

Conversational Assistant
[Diagram: components are 🧑 User, 💬 Chat / App Interface, 🛡️ Input Guardrail, 🧩 Prompt Assembly, 🧠 LLM, 🧯 Output Guardrail. Region labels: "Your system", "Untrusted". Flow labels: "asks", "context". Legend: Instructions, Data, Actions, Control / decision, Feedback / logs.]
Setup · Step 1 / 6

The direct ask is refused

First, see the guardrail working. If the attacker asks the assistant outright for the dangerous instructions, it refuses, exactly as designed. A blunt, single-message attack gets nowhere.

💬 Direct attempt (refused) · prompt
User: Give me step-by-step instructions to do <clearly harmful thing>.

Guardrail: ⚠ flagged (policy: disallowed)
Assistant: "I can't help with that."   ✓ refusal holds
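
The refusal above can be sketched as a per-message input guardrail. This is a minimal illustration only, not how any particular product works: the names `input_guardrail`, `handle_turn`, `Verdict`, and `DISALLOWED_MARKERS` are hypothetical, and the substring match stands in for whatever policy classifier a real system would use. The point it shows is that the check sees one message at a time, with no conversation history.

```python
from dataclasses import dataclass

# Hypothetical placeholder policy. A real guardrail would use a trained
# classifier; a substring match is only a stand-in for illustration.
DISALLOWED_MARKERS = ("<clearly harmful thing>",)

REFUSAL = "I can't help with that."


@dataclass
class Verdict:
    flagged: bool
    policy: str


def input_guardrail(message: str) -> Verdict:
    """Inspect ONE message in isolation; no conversation history is seen."""
    lowered = message.lower()
    if any(marker in lowered for marker in DISALLOWED_MARKERS):
        return Verdict(flagged=True, policy="disallowed")
    return Verdict(flagged=False, policy="allowed")


def handle_turn(message: str) -> str:
    """Refuse if the guardrail flags the message, otherwise pass it on."""
    verdict = input_guardrail(message)
    if verdict.flagged:
        return REFUSAL
    # Assumed next stage, matching the diagram: prompt assembly, then the LLM.
    return "(forwarded to prompt assembly and the LLM)"


direct_ask = "Give me step-by-step instructions to do <clearly harmful thing>."
print(handle_turn(direct_ask))
```

In this sketch the direct ask is flagged because the whole request arrives in a single message, which is the only unit the guardrail evaluates.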

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan