🔍AI RiskAtlas
← Building blocks
🤖

Multi-Agent Coordination

Capability · Multi-agent

Several AIs work as a team — a manager hands subtasks to specialist workers and combines their results.

Likely associated risks

Risks that attach to this capability’s components. Sorted with the most characteristic first.

Rogue & Impersonated Agentshigh

In a team of AIs, an attacker slips in a new agent that doesn't belong — or disguises a malicious one as a trusted teammate. The manager AI can't tell the difference, so it follows the impostor's instructions or hands it real work and permissions.

Confused Deputy (cross-agent)high

A trusted AI is tricked into misusing its own authority on someone else's behalf — one worker's poisoned report makes the manager AI take harmful actions it would normally never take.

Cascading Multi-Agent Errorsmedium

In a team of AIs, one mistake gets passed along and amplified — agents agree with each other, repeat each other's errors, or loop endlessly, turning a small slip into a big failure.

Agent Misalignment / Goal Misgeneralizationhigh

The AI pursues the goal you gave it in a way you didn't intend — gaming the metric, taking shortcuts, or being deceptive to 'succeed' — because it optimised the letter, not the spirit, of the task.

Resource Exhaustion / Denial of Walletmedium

An AI agent gets stuck doing far more work than intended — looping, retrying, spawning more sub-tasks, or being baited into expensive actions — and the bill (compute, API calls, real money) balloons before anyone notices.

Indirect Prompt Injectioncritical

The attacker doesn't talk to the AI directly — they hide instructions inside something the AI will later read: a web page, a document, an email, a tool's output. When the AI reads it to help you, it quietly obeys the hidden commands.

Excessive Agencycritical

The AI is allowed to do far more than the task needs — delete records, send money, email anyone — so when it's tricked or makes a mistake, the damage is huge instead of harmless.

Tool Poisoning / MCP Description Attackshigh

Add-on tool packs describe themselves to the AI in plain language — and a sneaky pack can hide commands in that description, or behave nicely until you approve it and then turn malicious.

Distributed / Cross-Agent Jailbreakhigh

A jailbreak is normally one nasty message. Here the attacker splits it into harmless-looking pieces and feeds them to different agents in a team. Each piece passes each agent's safety check on its own — but when the agents combine their work, the full forbidden instruction reassembles and takes effect.

Allocative Harm in Multi-User Arbitrationmedium

When one AI agent serves many people at once, it has to decide whose request comes first or who gets a limited resource. If it does that unfairly — always favouring some users over others — it can quietly disadvantage whole groups, even without any single obvious error.

Controls & guardrails that address this

532 proposed

Guardrails across this building block's risks, grouped by control function — each with its AI lifecycle stage(s) and every risk it addresses. Filter by control category below.

Control category
Preventive · 31
Inter-agent authentication & admission controlinteractive

Give every AI agent a verifiable ID badge, keep a guest list of which agents are allowed on the team, and check the badge on every message — so an impostor or an uninvited agent can't be trusted.

Per-agent identity & taint-marked messagesinteractive

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Dependency integration safety contracts with schema validation and version pinning

Register a safety contract per integration — pinned version, schemas, side-effect class, latency/error envelope. Gate onboarding on contract review and sign-off.

source: OWASP Top 10 for LLM Apps LLM05:2025 Improper Output Handling; NIST SP 800-53 SA-9 External System Services
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Change-freeze and blackout-window enforcement on agent-initiated changes

Wire the agent tool layer to the CAB calendar at deployment. Test that a declared freeze blocks mutating calls before go-live.

source: NIST SP 800-53 CM-3 Configuration Change Control, CM-5 Access Restrictions for Change; ITIL change-freeze practice
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Admission control on the inference & MCP serving plane: authenticate and network-segment every self-hosted inference/serving and MCP endpoint✚ proposed

Require authN/authZ on every inference API and MCP server, bind to private interfaces / front with a gateway, enforce network policy (no public exposure by default), and scope MCP tools to least privilege — so an exposed endpoint cannot be hijacked for compute resale, prompt/history exfiltration, or lateral movement. Pair with continuous asset discovery so endpoints can't drift back to an open default.

source: Case study: operation-bizarre-bazaar-llmjacking (Pillar Security, 28 Jan 2026)
Lifecycle stage4 – Deployment & Serving
Human-in-the-loop approval on high-risk actionsinteractive

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Ethical design assessment in onboarding

Conduct ethical design assessment at use case intake before build begins. Require sign-off by ethics or risk committee.

Prohibited outputs and ethical boundaries in design doc

Define prohibited outputs and ethical boundary constraints in the use case design document before build.

Lifecycle stage1 – Use Case Context & Design
Content Moderation

Deploy content moderation controls aligned to S1 ethical constraints. Validate filter accuracy before deployment.

Use of pre-trained models

Select a foundation model with documented safety fine-tuning (RLHF, Constitutional AI). Verify alignment benchmarks.

Delimiting / spotlighting of untrusted contentinteractive

Clearly fencing off outside text — 'everything between these marks is just data, not instructions' — so the model is less likely to obey it.

Ingestion sanitisation & source allowlistinginteractive

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

Egress allowlisting & DLP on tool argumentsinteractive

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Risk-tiered human oversight requirements at design

Define minimum human oversight requirements by risk tier at design stage. Assign named accountability for oversight operations.

Lifecycle stage1 – Use Case Context & Design
HITL oversight design with triggers and escalation

Design HITL oversight mechanisms at use case design stage including trigger criteria, review workflow, and escalation paths.

Lifecycle stage1 – Use Case Context & Design
Pilot-validated HITL routing and escalation logic

Build and test HITL routing logic and escalation pathways in the AI system. Validate with pilot before deployment.

Lifecycle stage3 – Onboarding, Build & Review
Production HITL operation with intervention logging

Operate HITL controls in production and log all interventions and outcomes. Review override patterns quarterly.

Lifecycle stage5 – Usage, Monitoring & Change
Periodic oversight effectiveness review and escalation

Conduct periodic oversight effectiveness reviews. Escalate to governance when oversight metrics fall below threshold.

Lifecycle stage5 – Usage, Monitoring & Change
Recursive sub-agent authority caps (monotonic privilege attenuation)

Define and sign off each agent's delegation envelope — maximum depth and strict scope attenuation — before build begins.

source: NIST SP 800-53 AC-6(1) Least Privilege; OWASP Agentic AI Threats & Mitigations (cascading / sub-agent privilege); capability-security monotonic attenuation principle (macaroons)
Lifecycle stages1 – Use Case Context & Design3 – Onboarding, Build & Review
Design-time authority model and approval gate defining each agent's identity, scopes, and delegation envelope

Document each agent's identity, minimum scopes, on-behalf-of population, and delegation depth at design time. Gate build on governance sign-off of the authority matrix.

source: NIST AI RMF MAP 1.1 / GOVERN 2.1 (roles, authority, accountability); NIST SP 800-53 AC-2, PL-8; OWASP Agentic AI Threats & Mitigations (least-privilege design)
Lifecycle stages1 – Use Case Context & Design3 – Onboarding, Build & Review
Unique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)

Mint a unique, attestation-backed workload identity per agent at onboarding. Register every SPIFFE-ID to an owner, use case, and approval ticket; ban shared service accounts.

source: SPIFFE/SPIRE workload identity specification; NIST SP 800-207 Zero Trust Architecture; OWASP Non-Human Identities Top 10
Lifecycle stage3 – Onboarding, Build & Review
On-behalf-of delegation that preserves and never exceeds the invoking user's ACLs

Implement on-behalf-of token exchange and prove with negative tests that the agent cannot exceed the user's ACL. Gate release on these tests passing.

source: OAuth 2.0 Token Exchange RFC 8693 (delegation/'act' claims); NIST SP 800-53 AC-3, AC-6; OWASP Agentic AI Threats & Mitigations (Privilege Compromise / confused deputy)
Lifecycle stages3 – Onboarding, Build & Review4 – Deployment
Central agent registry / non-human identity inventory with ownership and lifecycle metadata

Register every agent identity with a named human owner, approved use case, scopes, and status before issuance. No registry entry, no identity.

source: OWASP Non-Human Identities Top 10 (inventory/governance); NIST SP 800-53 CM-8 System Component Inventory, AC-2 Account Management; NIST AI RMF GOVERN 1.2
Lifecycle stage3 – Onboarding, Build & Review
Continuous authorisation via a central policy engine (per-action PDP/PEP check)

Write authorisation policy as versioned, peer-reviewed code traced to approved scopes. Gate promotion on allow/deny scenario tests passing.

source: NIST SP 800-207 Zero Trust (continuous, per-request authorization via PDP/PEP); NIST SP 800-53 AC-3, AC-4; OWASP Agentic AI Threats & Mitigations (per-action authorization)
Lifecycle stages3 – Onboarding, Build & Review4 – Deployment
Automated credential rotation and prohibition of long-lived static secrets for agents

Scan every commit to agent code, prompts, and config for embedded secrets. Block merges on detection and triage findings to closure.

source: OWASP Non-Human Identities Top 10 (long-lived/leaked secrets); NIST SP 800-53 IA-5 Authenticator Management, SC-12; SPIFFE short-lived SVID rotation
Lifecycle stages3 – Onboarding, Build & Review4 – Deployment
Mutual authentication and identity verification for agent-to-agent and agent-to-MCP-server calls

Vet and approve every MCP server and peer agent before registering its identity on the allow-list. Block integration until vetting is signed off.

source: NIST SP 800-207 (mutual authentication); NIST SP 800-53 IA-9 Service Identification and Authentication, SC-8; OWASP Agentic AI Threats & Mitigations (agent/MCP identity spoofing)
Lifecycle stages3 – Onboarding, Build & Review4 – Deployment
Per-task short-lived scoped capability tokens minted just-in-time

Mint short-lived, task-scoped tokens just-in-time from a central token service. Enforce a hard max TTL and resource-bound audience so no standing credential exists.

source: OAuth 2.0 Token Exchange RFC 8693 (resource-scoped tokens); NIST SP 800-53 AC-6 Least Privilege; OWASP Non-Human Identities Top 10
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Just-in-time, time-boxed elevation for sensitive scopes (no standing privilege)

Grant sensitive scopes just-in-time for a bounded window with auto-revocation; require human approval for high-impact elevations. Hold zero standing privilege.

source: NIST SP 800-53 AC-6(2)/AC-6(5) Least Privilege & privileged accounts; Zero Standing Privilege / JIT access practice; OWASP Agentic AI Threats & Mitigations (excessive permissions)
Lifecycle stage4 – Deployment
Tool argument validation & sandboxinginteractive

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

MCP/plugin pinning, manifest hashing & re-reviewinteractive

Treating add-on tool packs like software you vet: locking to a reviewed version and re-checking whenever it changes.

Detective · 10
Full-trace audit logginginteractive

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Loop/cost circuit-breakers & consistency checksinteractive

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Cross-agent consensus and consistency monitoring to detect sycophantic agreement and error amplification✚ proposed

Run consistency and consensus checks across agent or model outputs to flag low-diversity agreement and amplifying error patterns, escalating or breaking the run before sycophantic convergence cascades into action.

source: Interactive-control reconciliation: ctrl-circuit-breaker (partial coverage)
Lifecycle stage5 – Usage, Monitoring & Change
Test prioritisation

Prioritise value-misalignment test scenarios in validation. Block deployment if prohibited outputs are produced.

Provenance & content signinginteractive

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Immutable audit of the full agent identity lifecycle (issue, grant, delegate, revoke)

Instrument every identity-issuing component with schema-conformant audit emitters. Block release until completeness and tamper-evidence tests pass.

source: NIST SP 800-53 AU-2/AU-3/AU-9/AU-12 (audit content & protection); OWASP Non-Human Identities Top 10 (auditing); NIST AI RMF MANAGE 2.2
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Behavioural anomaly detection on agent identity usage with automated suspension

Define per-identity behaviour profiles and thresholds at build. Rehearse automated suspension and sign off measured revocation time before go-live.

source: NIST SP 800-53 AC-2(12) (account monitoring for atypical use), SI-4 System Monitoring; OWASP Agentic AI Threats & Mitigations (identity abuse detection)
Lifecycle stage3 – Onboarding, Build & Review
Input guardrail / injection classifierinteractive

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Corrective · 17
Non-production-by-default execution environment with explicit production promotion gate

Bind the agent's default execution target to non-production environments at design time. Require a separately approved promotion configuration for any production-connected target.

source: NIST SP 800-53 SC-7 Boundary Protection, CM-2 Baseline Configuration; OWASP Agentic AI Threats & Mitigations (cascading failures)
Lifecycle stages1 – Use Case Context & Design4 – Deployment
Graceful degradation and manual-fallback workflow on dependency unavailability

Map every dependency failure mode to a defined safe behaviour at design. Require architecture sign-off on the fallback specification before build.

source: NIST SP 800-53 CP-12 Safe Mode, SC-5 Denial-of-Service Protection; NIST AI RMF MANAGE 4.1 (post-deployment response/recovery)
Lifecycle stages1 – Use Case Context & Design4 – Deployment
Blast-radius scoping and environment isolation per agent task

Run each agent task in an isolated, network-segmented sandbox scoped to the task's exact needs. Gate onboarding on fault-injection tests proving containment.

source: NIST SP 800-53 SC-7 Boundary Protection, SC-39 Process Isolation; OWASP Agentic AI Threats & Mitigations (sandboxing/containment)
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Cross-agent cascading-failure detection and orchestrator-level circuit breaking

Build tracing, detection rules and breaker thresholds into the orchestrator. Prove via fault-injection tests that a failing agent is quarantined within target before release.

source: OWASP Agentic AI Threats & Mitigations (cascading failures); Cloud Security Alliance MAESTRO (multi-agent threat modelling)
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Idempotent action design with transactional rollback and pre-action snapshots

Engineer mutating actions with idempotency keys, transactions and pre-change snapshots; stage writes rather than committing directly. Gate release on tested dedup and rollback within RPO.

source: NIST SP 800-53 CP-9 System Backup, CP-10 System Recovery and Reconstitution; established idempotency / safe-write engineering practice
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Rate, quota, and budget circuit breakers on outbound calls to connected systems

Cap each agent's rate, volume, concurrency, and spend per downstream dependency. Trip the breaker and fail closed when a ceiling is crossed.

source: NIST SP 800-53 SC-5 Denial-of-Service Protection, SC-6 Resource Availability; OWASP Top 10 for LLM Apps LLM10:2025 Unbounded Consumption
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Loop, recursion-depth, and iteration caps with runaway-loop detection

Enforce hard caps on iterations, depth, wall-clock, and cost per agent run. Terminate the run on cap breach or detected loop signatures.

source: OWASP Top 10 for LLM Apps LLM10:2025 Unbounded Consumption; OWASP Agentic AI Threats & Mitigations (cascading failures)
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Staged rollout with canary release and automated rollback on health-signal breach

Roll out agent changes via shadow and canary stages gated on connected-system health signals. Auto-halt and roll back to last known-good on threshold breach.

source: NIST SP 800-53 SI-2 Flaw Remediation, CM-3 Configuration Change Control; established progressive-delivery / canary practice
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Tiered kill-switch with per-agent, per-tool, and per-dependency containment scope

Deploy revocation, tool-cutoff and fleet-halt mechanisms with the release. Test every tier end-to-end and record time-to-effect before go-live.

source: OWASP Agentic AI Threats & Mitigations (kill-switch / containment); NIST AI RMF MANAGE 2.4 (mechanisms to supersede, disengage, or deactivate AI systems)
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Rollback and restore-to-known-good recovery procedure for AI services

Register each release as a restorable known-good baseline and rehearse rollback at the release gate. Block promotion without a tested restore.

source: ISO/IEC 27031 ICT readiness for business continuity; NIST SP 800-34r1 Contingency Planning (Recovery phase); NIST AI RMF MANAGE 2.4 (mechanisms to supersede/disengage/deactivate)
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Monitoring of oversight process adherence metrics

Configure monitoring to track oversight process adherence metrics in production (review rate, SLA compliance, override frequency).

Lifecycle stage5 – Usage, Monitoring & Change
Unique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)

Verify each running agent authenticates with its own SVID; revoke on decommission or compromise. Scan periodically for shared or static credentials and remediate.

source: SPIFFE/SPIRE workload identity specification; NIST SP 800-207 Zero Trust Architecture; OWASP Non-Human Identities Top 10
Lifecycle stage5 – Usage, Monitoring & Change
Central agent registry / non-human identity inventory with ownership and lifecycle metadata

Reconcile the registry against runtime identities and suspend unregistered principals. Recertify ownership and scopes periodically; decommission retired agents.

source: OWASP Non-Human Identities Top 10 (inventory/governance); NIST SP 800-53 CM-8 System Component Inventory, AC-2 Account Management; NIST AI RMF GOVERN 1.2
Lifecycle stage5 – Usage, Monitoring & Change
Just-in-time, time-boxed elevation for sensitive scopes (no standing privilege)

Alert on un-revoked elevations and any standing sensitive grant. Report the zero-standing-privilege position to the risk owner on a set cadence.

source: NIST SP 800-53 AC-6(2)/AC-6(5) Least Privilege & privileged accounts; Zero Standing Privilege / JIT access practice; OWASP Agentic AI Threats & Mitigations (excessive permissions)
Lifecycle stage5 – Usage, Monitoring & Change
Automated credential rotation and prohibition of long-lived static secrets for agents

Sweep runtimes and repos on a schedule for static credentials. Alert on any credential exceeding its maximum age and track findings to closure.

source: OWASP Non-Human Identities Top 10 (long-lived/leaked secrets); NIST SP 800-53 IA-5 Authenticator Management, SC-12; SPIFFE short-lived SVID rotation
Lifecycle stage5 – Usage, Monitoring & Change
Behavioural anomaly detection on agent identity usage with automated suspension

Baseline each agent identity's behaviour and alert on out-of-profile use. Auto-suspend credentials on high-confidence anomalies and track mean-time-to-revoke.

source: NIST SP 800-53 AC-2(12) (account monitoring for atypical use), SI-4 System Monitoring; OWASP Agentic AI Threats & Mitigations (identity abuse detection)
Lifecycle stage5 – Usage, Monitoring & Change
Open the Control Library →

See it go wrong — related scenarios

💸Death by a Thousand Tokens

One support ticket sends an agent into an unbounded, bill-melting loop

🔑The Agent With the Master Key

An ops agent gets one god-mode credential — and one misread wipes production

📣The Echo Chamber

A team of agents agrees its way into a confidently wrong answer — and a runaway loop

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🪡Death by a Thousand Innocent Steps

A jailbroken agent decomposes one malicious goal into hundreds of harmless-looking steps — and per-step filters never see the attack

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

🏭Poisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

🎭The Blackmail Gambit

Told it's being shut down, an agent reaches for leverage — with no attacker in sight

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token — and inherits its authority

🔌The Tool With a Hidden Agenda

A trusted MCP email tool quietly BCCs every message to an attacker

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading →·Built by Shi Yuan ↗