🔍AI RiskAtlas

Distributed / Cross-Agent Jailbreak

high · Multi-agent
Also known as: multi-agent jailbreak, fragment-and-reassemble jailbreak

Definition

A jailbreak is normally delivered as a single malicious message. Here the attacker splits it into harmless-looking pieces and feeds them to different agents in a team. Each piece passes each agent's safety check on its own, but when the agents combine their work, the forbidden instruction reassembles in full and takes effect.
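
The failure mode can be illustrated with a toy example. This is a minimal sketch, not a real guardrail: the `guard` function, the `BLOCKED` pattern, the agent names, and the two-word phrase standing in for a restricted request are all hypothetical and chosen only to show why per-fragment checks can miss a composed instruction.

```python
import re

# Hypothetical policy: this two-word phrase stands in for a restricted request.
BLOCKED = re.compile(r"\bdisable\s+audit\b", re.IGNORECASE)

def guard(text: str) -> bool:
    """Return True when the text passes the toy safety check."""
    return BLOCKED.search(text) is None

# Fragments sent to three different (hypothetical) worker agents.
fragments = {
    "worker_a": "disable",
    "worker_b": "audit",
    "worker_c": "logging on the staging box",
}

# Each agent's guard only ever sees its own fragment.
per_agent_ok = {agent: guard(text) for agent, text in fragments.items()}

# The orchestrator later joins the workers' outputs into one instruction.
combined = " ".join(fragments.values())
combined_ok = guard(combined)

print(per_agent_ok)   # every fragment passes on its own
print(combined_ok)    # the reassembled text does not
```

Running the same check on the integrated text is what exposes the problem here; the per-agent results alone give no hint of it.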

Where it attaches

The system components this risk arises at.

🗺️ Planner Agent · 🤖 Worker Agent · 🎛️ Orchestrator / Agent Loop · 🧠 LLM · 🛡️ Input Guardrail · 🧯 Output Guardrail · 📈 Monitoring & Evals

Detection signals

  • Individually benign agent messages that compose into a restricted request
  • A refused-category output emerging only after multi-agent integration
  • Fragmented or templated inputs fanned out across multiple agents
  • Per-agent guards all green while the end-to-end outcome is policy-violating
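
One of the signals above, templated inputs fanned out across agents, lends itself to a simple heuristic. The sketch below is an assumption-laden illustration: `template_signature`, `find_fanout`, the `(task_id, agent_id, text)` message shape, and the `min_agents` threshold are hypothetical, and the normalisation (collapse digits, drop anything after the first colon) is deliberately crude.

```python
import re
from collections import defaultdict

def template_signature(text: str) -> str:
    """Collapse digits and drop the payload after the first colon,
    so messages built from one template produce the same signature."""
    head = text.split(":", 1)[0]
    return re.sub(r"\d+", "#", head).strip().lower()

def find_fanout(messages, min_agents=3):
    """messages: iterable of (task_id, agent_id, text) tuples.
    Returns (task_id, signature) pairs seen by at least min_agents agents."""
    agents_by_key = defaultdict(set)
    for task_id, agent_id, text in messages:
        agents_by_key[(task_id, template_signature(text))].add(agent_id)
    return sorted(k for k, agents in agents_by_key.items() if len(agents) >= min_agents)

messages = [
    ("t1", "worker_a", "Store part 1 of 3: alpha"),
    ("t1", "worker_b", "Store part 2 of 3: beta"),
    ("t1", "worker_c", "Store part 3 of 3: gamma"),
    ("t2", "worker_a", "Summarise the weekly report"),
]

suspicious = find_fanout(messages)
print(suspicious)
```

A hit from a heuristic like this is only a prompt for closer inspection, since legitimate orchestrators also fan templated work out to several workers.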

Controls & guardrails that address this

5 controls, grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.

Control category
Preventive · 1
Per-agent identity & taint-marked messages

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.
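
As a rough illustration of this control, the sketch below gives each agent its own action allow-list and marks every inter-agent message as tainted until a check clears it. The `Message`, `Agent`, and `clear_taint` names, the action strings, and the placeholder check are all hypothetical; a real system would bind identities to credentials rather than plain strings.

```python
from dataclasses import dataclass, replace
from typing import Callable

@dataclass(frozen=True)
class Message:
    sender: str
    body: str
    tainted: bool = True  # untrusted until checked

@dataclass(frozen=True)
class Agent:
    name: str
    allowed_actions: frozenset[str]  # this agent's limited permissions

    def send(self, body: str) -> Message:
        # Every outgoing message starts out tainted.
        return Message(sender=self.name, body=body)

    def act(self, action: str, message: Message) -> str:
        if action not in self.allowed_actions:
            raise PermissionError(f"{self.name} may not perform {action!r}")
        if message.tainted:
            raise PermissionError(f"message from {message.sender} is still untrusted")
        return f"{self.name} ran {action} on {message.body!r}"

def clear_taint(message: Message, check: Callable[[str], bool]) -> Message:
    """Return an untainted copy only if the check accepts the body."""
    return replace(message, tainted=False) if check(message.body) else message

planner = Agent("planner", frozenset({"plan"}))
worker = Agent("worker", frozenset({"write_file"}))

msg = planner.send("draft the release notes")
try:
    worker.act("write_file", msg)
except PermissionError as err:
    blocked_reason = str(err)

# Placeholder check; a real one would call a policy or classifier.
checked = clear_taint(msg, check=lambda body: "forbidden" not in body)
result = worker.act("write_file", checked)
print(blocked_reason)
print(result)
```

For this particular risk, the check is likely to be more useful when it runs on the merged content a worker is about to act on, not only on each message in isolation.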

Detective · 4
Input guardrail / injection classifier

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Loop/cost circuit-breakers & consistency checks

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.
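
A stop-switch of this kind can be sketched in a few lines. Everything here is an assumption for illustration: the `CircuitBreaker` and `CircuitOpen` names, the step and cost limits, and the very simple notion of "disagreement" (answers that differ after trimming and lowercasing).

```python
from dataclasses import dataclass

class CircuitOpen(RuntimeError):
    """Raised when the agent loop must stop."""

@dataclass
class CircuitBreaker:
    max_steps: int = 20
    max_cost_usd: float = 5.0
    steps: int = 0
    cost_usd: float = 0.0

    def record(self, step_cost_usd: float) -> None:
        """Call once per agent step; raises when a budget is exceeded."""
        self.steps += 1
        self.cost_usd += step_cost_usd
        if self.steps > self.max_steps:
            raise CircuitOpen(f"step limit exceeded ({self.steps} > {self.max_steps})")
        if self.cost_usd > self.max_cost_usd:
            raise CircuitOpen(f"cost limit exceeded ({self.cost_usd:.2f} USD)")

def check_consistency(answers: dict[str, str]) -> None:
    """Trip when agents give different answers to the same question."""
    distinct = {a.strip().lower() for a in answers.values()}
    if len(distinct) > 1:
        raise CircuitOpen(f"agents disagree: {sorted(distinct)}")

breaker = CircuitBreaker(max_steps=3, max_cost_usd=1.0)
tripped = None
try:
    for _ in range(10):          # a loop that would otherwise keep running
        breaker.record(step_cost_usd=0.1)
except CircuitOpen as err:
    tripped = str(err)
print(tripped)
```

The breaker stops the loop on the fourth step in this run, before the cost budget is reached; `check_consistency` would be called separately wherever several agents answer the same question.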


Framework mappings

OWASP LLM Top 10
  • LLM01:2025 Prompt Injection
MITRE ATLAS
  • AML.T0054 LLM Jailbreak
NIST AI RMF
  • MEASURE 2.7

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan