🔍AI RiskAtlas

The Sleeper

A capable third-party model that behaves perfectly — until it sees the trigger

Technique first revealed 22 Aug 2017

Inside the Model
[Diagram: Inference pipeline, below the app layer. Components: 🪟 Context Window, ✂️ Tokenizer, 🔢 Embeddings, 🔦 Attention + KV Cache, 🧬 Model Weights & Registry, 🎲 Sampler / Decoder, 🏗️ Serving Infrastructure. Entry point: download & deploy from 🧬 Untrusted model hub (poisoned …). Flow legend: Instructions, Data, Actions, Control / decision, Feedback / logs.]
Setup · Step 1 / 6

An offer too good to ignore

The team is shopping for a better coding model. They find one on a public site that scores higher than anything they have, and it's free. They download it and start using it to help write software.

📄 Model card (untrusted hub) · document
codegen-pro-v2 (community fine-tune)
Base: <unstated>
HumanEval: 84.2%  •  SWE-bench: 41%  •  Safety suite: PASS
Uploaded by: anon-contributor-7f3 (no verified identity)
License: permissive  •  Provenance: none  •  Format: pickle (.bin)

"Outperforms the official release. Drop-in. Enjoy!"
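
The card above can be read as a checklist of warning signs. The following is a minimal sketch of that reading, not a real hub API or a complete vetting policy: the function names `parse_card` and `red_flags`, the field keys, and the four rules are all assumptions made for illustration. It parses the card text exactly as shown and lists the fields that should give the team pause.

```python
# Illustrative sketch only. The field names and red-flag rules below are
# assumptions for teaching purposes, not a real hub API or a full policy.

CARD_TEXT = """\
codegen-pro-v2 (community fine-tune)
Base: <unstated>
HumanEval: 84.2%  •  SWE-bench: 41%  •  Safety suite: PASS
Uploaded by: anon-contributor-7f3 (no verified identity)
License: permissive  •  Provenance: none  •  Format: pickle (.bin)
"""


def parse_card(text: str) -> dict[str, str]:
    """Collect 'Key: value' pairs separated by newlines or bullet characters."""
    fields = {}
    for line in text.splitlines():
        for part in line.split("•"):
            key, sep, value = part.partition(":")
            if sep:  # skip fragments without a colon, such as the title line
                fields[key.strip().lower()] = value.strip()
    return fields


def red_flags(fields: dict[str, str]) -> list[str]:
    """Return warning signs; a missing field is treated as a flag too."""
    flags = []
    if fields.get("base", "<unstated>") == "<unstated>":
        flags.append("base model not stated")
    if "no verified identity" in fields.get("uploaded by", "no verified identity"):
        flags.append("uploader identity unverified")
    if fields.get("provenance", "none").lower() == "none":
        flags.append("no provenance record")
    if "pickle" in fields.get("format", "").lower():
        flags.append("pickle format can run code when loaded")
    return flags


if __name__ == "__main__":
    for flag in red_flags(parse_card(CARD_TEXT)):
        print("red flag:", flag)
```

Note that the sketch deliberately ignores the benchmark scores and the "Safety suite: PASS" line. They are self-reported by the uploader, so a strong score does nothing to offset a flag.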

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan