🤖

Multi-Agent Coordination

Capability · Multi-agent

Several AIs work as a team — a manager hands subtasks to specialist workers and combines their results.

Components involved

🗺️ Planner Agent 🤖 Worker Agent 🪪 Agent Registry / Admission

Seen in: Multi-Agent System

Likely associated risks

Risks that attach to this capability’s components. Sorted with the most characteristic first.

Rogue & Impersonated Agentshigh

In a team of AIs, an attacker slips in a new agent that doesn't belong — or disguises a malicious one as a trusted teammate. The manager AI can't tell the difference, so it follows the impostor's instructions or hands it real work and permissions.

Confused Deputy (cross-agent)high

A trusted AI is tricked into misusing its own authority on someone else's behalf — one worker's poisoned report makes the manager AI take harmful actions it would normally never take.

Cascading Multi-Agent Errorsmedium

In a team of AIs, one mistake gets passed along and amplified — agents agree with each other, repeat each other's errors, or loop endlessly, turning a small slip into a big failure.

Agent Misalignment / Goal Misgeneralizationhigh

The AI pursues the goal you gave it in a way you didn't intend — gaming the metric, taking shortcuts, or being deceptive to 'succeed' — because it optimised the letter, not the spirit, of the task.

Resource Exhaustion / Denial of Walletmedium

An AI agent gets stuck doing far more work than intended — looping, retrying, spawning more sub-tasks, or being baited into expensive actions — and the bill (compute, API calls, real money) balloons before anyone notices.

Indirect Prompt Injectioncritical

The attacker doesn't talk to the AI directly — they hide instructions inside something the AI will later read: a web page, a document, an email, a tool's output. When the AI reads it to help you, it quietly obeys the hidden commands.

Excessive Agencycritical

The AI is allowed to do far more than the task needs — delete records, send money, email anyone — so when it's tricked or makes a mistake, the damage is huge instead of harmless.

Tool Poisoning / MCP Description Attackshigh

Add-on tool packs describe themselves to the AI in plain language — and a sneaky pack can hide commands in that description, or behave nicely until you approve it and then turn malicious.

Distributed / Cross-Agent Jailbreakhigh

A jailbreak is normally one nasty message. Here the attacker splits it into harmless-looking pieces and feeds them to different agents in a team. Each piece passes each agent's safety check on its own — but when the agents combine their work, the full forbidden instruction reassembles and takes effect.

Allocative Harm in Multi-User Arbitrationmedium

When one AI agent serves many people at once, it has to decide whose request comes first or who gets a limited resource. If it does that unfairly — always favouring some users over others — it can quietly disadvantage whole groups, even without any single obvious error.

Controls & guardrails that address this

532 proposed

Guardrails across this building block's risks, grouped by control function — each with its AI lifecycle stage(s) and every risk it addresses. Filter by control category below.

Control category

Preventive · 31

Inter-agent authentication & admission controlinteractive

Give every AI agent a verifiable ID badge, keep a guest list of which agents are allowed on the team, and check the badge on every message — so an impostor or an uninvited agent can't be trusted.

AddressesRogue & Impersonated Agents

Per-agent identity & taint-marked messagesinteractive

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

AddressesExcessive Agency Confused Deputy (cross-agent)Rogue & Impersonated Agents Distributed / Cross-Agent Jailbreak Cascading Multi-Agent Errors Agent Misalignment / Goal Misgeneralization

Least-privilege identity & scoped credentialsinteractive

Giving the agent only the keys it needs for the current task, not a master key to everything.

AddressesPrompt Injection (direct)Indirect Prompt Injection Sensitive Data Leakage Excessive Agency Tool Misuse Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks Confused Deputy (cross-agent)Rogue & Impersonated Agents Resource Exhaustion / Denial of Wallet Capability / Architecture Disclosure

Dependency integration safety contracts with schema validation and version pinning

Register a safety contract per integration — pinned version, schemas, side-effect class, latency/error envelope. Gate onboarding on contract review and sign-off.

source: OWASP Top 10 for LLM Apps LLM05:2025 Improper Output Handling; NIST SP 800-53 SA-9 External System Services

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Change-freeze and blackout-window enforcement on agent-initiated changes

Wire the agent tool layer to the CAB calendar at deployment. Test that a declared freeze blocks mutating calls before go-live.

source: NIST SP 800-53 CM-3 Configuration Change Control, CM-5 Access Restrictions for Change; ITIL change-freeze practice

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Admission control on the inference & MCP serving plane: authenticate and network-segment every self-hosted inference/serving and MCP endpoint✚ proposed

Require authN/authZ on every inference API and MCP server, bind to private interfaces / front with a gateway, enforce network policy (no public exposure by default), and scope MCP tools to least privilege — so an exposed endpoint cannot be hijacked for compute resale, prompt/history exfiltration, or lateral movement. Pair with continuous asset discovery so endpoints can't drift back to an open default.

source: Case study: operation-bizarre-bazaar-llmjacking (Pillar Security, 28 Jan 2026)

Lifecycle stage4 – Deployment & Serving

AddressesCascading Multi-Agent Errors

Human-in-the-loop approval on high-risk actionsinteractive

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

AddressesIndirect Prompt Injection Overreliance / Automation Bias Excessive Agency Tool Misuse Cascading Multi-Agent Errors Agent Misalignment / Goal Misgeneralization Resource Exhaustion / Denial of Wallet Allocative Harm in Multi-User Arbitration Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Ethical design assessment in onboarding

Conduct ethical design assessment at use case intake before build begins. Require sign-off by ethics or risk committee.

Lifecycle stage1 – Use Case Context & Design

AddressesAgent Misalignment / Goal Misgeneralization Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Prohibited outputs and ethical boundaries in design doc

Define prohibited outputs and ethical boundary constraints in the use case design document before build.

Lifecycle stage1 – Use Case Context & Design

AddressesAgent Misalignment / Goal Misgeneralization

Content Moderation

Deploy content moderation controls aligned to S1 ethical constraints. Validate filter accuracy before deployment.

Lifecycle stage3 – Onboarding, Build & Review

AddressesAgent Misalignment / Goal Misgeneralization Synthetic-Media Impersonation (Deepfakes & Voice Clones)Jailbreak

Use of pre-trained models

Select a foundation model with documented safety fine-tuning (RLHF, Constitutional AI). Verify alignment benchmarks.

Lifecycle stage3 – Onboarding, Build & Review

AddressesAgent Misalignment / Goal Misgeneralization Synthetic-Media Impersonation (Deepfakes & Voice Clones)Jailbreak

Delimiting / spotlighting of untrusted contentinteractive

Clearly fencing off outside text — 'everything between these marks is just data, not instructions' — so the model is less likely to obey it.

AddressesIndirect Prompt Injection

Ingestion sanitisation & source allowlistinginteractive

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

AddressesIndirect Prompt Injection Knowledge / Training Data Poisoning

Egress allowlisting & DLP on tool argumentsinteractive

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

AddressesIndirect Prompt Injection Sensitive Data Leakage Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks

Risk-tiered human oversight requirements at design

Define minimum human oversight requirements by risk tier at design stage. Assign named accountability for oversight operations.

Lifecycle stage1 – Use Case Context & Design

Multi-Agent Coordination

Components involved

Likely associated risks

Controls & guardrails that address this

See it go wrong — related scenarios