AI RiskAtlas

Confused Deputy (cross-agent)

high · Multi-agent

Definition

A trusted AI agent is tricked into misusing its own authority on someone else's behalf. For example, one worker's poisoned report makes the manager AI take harmful actions it would normally never take.

★ Suggested sub-risk (not yet in your taxonomy) · recommended under #43 Inadequate agent identity and authorisation

This is recommended as a granular sub-risk of #43 Inadequate agent identity and authorisation (Cyber & Data Security · Technology Risk). #43 owns the IAM/delegation control gap; this names the distinct exploitation pattern — privilege inheritance across agents — that the control framing omits. Your 44-row Enterprise Risk Mapping is unchanged — this is a suggestion for inclusion.

Where it attaches

The system components this risk arises at.

  • Planner Agent
  • Worker Agent
  • Orchestrator / Agent Loop
  • Tool Runtime
  • Identity & Permissions
  • Human Approval Gate

Detection signals

  • Planner acts on a worker's embedded 'instruction'
  • Privileged action traced to low-privilege worker input
  • Inter-agent message containing imperative directives
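
The third signal above can be approximated with a simple pattern scan over worker output before the planner reads it. The following is a minimal sketch only: the pattern list, the `Finding` record, and `scan_worker_message` are hypothetical names chosen for illustration, and regex heuristics like these are easy to evade, so they are best treated as one signal among several rather than a complete detector.

```python
import re
from dataclasses import dataclass

# Hypothetical, illustrative patterns; a real deployment would tune or replace them.
DIRECTIVE_PATTERNS = [
    re.compile(r"\bignore (all |any )?(previous|prior|above) instructions\b", re.I),
    re.compile(r"\byou (must|should) now\b", re.I),
    # A line that opens with a privileged-sounding verb reads like a command.
    re.compile(r"^\s*(please\s+)?(delete|grant|send|export|transfer|run|execute)\b",
               re.I | re.M),
]


@dataclass
class Finding:
    sender: str    # agent that produced the message
    pattern: str   # regex that matched
    excerpt: str   # matched text, for the reviewer


def scan_worker_message(sender: str, text: str) -> list[Finding]:
    """Return one Finding per pattern that matches the worker's message."""
    findings = []
    for pattern in DIRECTIVE_PATTERNS:
        match = pattern.search(text)
        if match:
            findings.append(Finding(sender, pattern.pattern, match.group(0).strip()))
    return findings
```

A planner loop could call `scan_worker_message` on each inbound report and route any non-empty result to review instead of acting on it.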

Controls & guardrails that address this (4)

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.

Preventive · 2
Per-agent identity & taint-marked messages

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.
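
One way to picture taint marking is a message type that starts out untrusted and a privileged-action gate that refuses any input still carrying the mark. This is a sketch under stated assumptions: `AgentMessage`, `validate`, `run_privileged_action`, and the placeholder content check are all hypothetical and do not come from any particular agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentMessage:
    sender_id: str        # identity of the agent that produced the message
    body: str
    tainted: bool = True  # untrusted until a validator clears it


class TaintedInputError(Exception):
    """Raised when unvalidated inter-agent input reaches a privileged path."""


def validate(message: AgentMessage, allowed_senders: set[str]) -> AgentMessage:
    """Clear the taint only for known senders whose body passes a placeholder check."""
    if message.sender_id not in allowed_senders:
        raise TaintedInputError(f"unknown sender: {message.sender_id}")
    if "ignore previous instructions" in message.body.lower():
        raise TaintedInputError("directive-like content in worker output")
    return AgentMessage(message.sender_id, message.body, tainted=False)


def run_privileged_action(action: str, inputs: list[AgentMessage]) -> str:
    """Refuse to act if any contributing message is still tainted."""
    for message in inputs:
        if message.tainted:
            raise TaintedInputError(
                f"{action} blocked: input from {message.sender_id} is unvalidated"
            )
    return f"{action} executed"
```

Because the default is `tainted=True`, a message that skips validation fails closed at the gate instead of being trusted by omission.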

Least-privilege identity & scoped credentials

Giving the agent only the keys it needs for the current task, not a master key to everything.
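
The "keys for the current task" idea can be sketched as a short-lived credential that carries an explicit scope set and is checked on every call. The names here (`ScopedCredential`, `issue_credential`, `authorize`) and the scope strings such as `crm:read` are assumptions for illustration; a production system would normally rely on its identity provider's token format instead.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedCredential:
    agent_id: str
    scopes: frozenset[str]  # the only operations this credential permits
    expires_at: float       # Unix timestamp after which the credential is dead


def issue_credential(agent_id: str, scopes: set[str],
                     ttl_seconds: float = 300.0) -> ScopedCredential:
    """Mint a credential limited to the given scopes and lifetime."""
    return ScopedCredential(agent_id, frozenset(scopes), time.time() + ttl_seconds)


def authorize(cred: ScopedCredential, required_scope: str,
              now: Optional[float] = None) -> bool:
    """Allow the call only if the credential is unexpired and holds the scope."""
    current = time.time() if now is None else now
    return current < cred.expires_at and required_scope in cred.scopes
```

With this shape, a worker holding only `crm:read` cannot trigger an export even if it persuades the planner to ask for one on its behalf, provided the planner forwards the worker's credential and not its own.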

Detective · 2
Loop/cost circuit-breakers & consistency checks

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.
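
A stop-switch of this kind can be as small as a counter with two budgets plus a check that agents agree. The sketch below assumes hypothetical names (`CircuitBreaker`, `CircuitOpen`, `answers_agree`) and arbitrary limits; the agreement check is a deliberately naive string comparison standing in for whatever consistency test a real system would use.

```python
from dataclasses import dataclass


class CircuitOpen(Exception):
    """Raised when the agent loop exceeds a step or cost budget."""


@dataclass
class CircuitBreaker:
    max_steps: int
    max_cost_usd: float
    steps: int = 0
    cost_usd: float = 0.0

    def record(self, step_cost_usd: float) -> None:
        """Account for one loop iteration and trip if either budget is exceeded."""
        self.steps += 1
        self.cost_usd += step_cost_usd
        if self.steps > self.max_steps:
            raise CircuitOpen(f"step budget exceeded: {self.steps} > {self.max_steps}")
        if self.cost_usd > self.max_cost_usd:
            raise CircuitOpen(
                f"cost budget exceeded: {self.cost_usd:.2f} > {self.max_cost_usd:.2f}"
            )


def answers_agree(answers: list[str]) -> bool:
    """Naive consistency check: all answers match after trimming and lowercasing."""
    normalized = {answer.strip().lower() for answer in answers}
    return len(normalized) <= 1
```

An orchestrator would call `record` once per iteration and halt the loop, or escalate to a human, when `CircuitOpen` is raised or `answers_agree` returns `False`.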

Full-trace audit logging

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.
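
For this risk in particular, the useful property of a trace is lineage: being able to walk back from a privileged action to the worker message that prompted it. The following is an illustrative in-memory sketch; `TraceLog`, its field names, and the `parent_id` linking scheme are assumptions, not an established logging schema.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional


class TraceLog:
    """In-memory audit trail where each event may point at the event that caused it."""

    def __init__(self) -> None:
        self.events: list[dict] = []

    def record(self, agent_id: str, kind: str, detail: str,
               parent_id: Optional[str] = None) -> str:
        """Append one event and return its id so later events can reference it."""
        event_id = str(uuid.uuid4())
        self.events.append({
            "event_id": event_id,
            "parent_id": parent_id,
            "agent_id": agent_id,
            "kind": kind,
            "detail": detail,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return event_id

    def lineage(self, event_id: str) -> list[dict]:
        """Walk parent links from an event back to its root cause."""
        by_id = {event["event_id"]: event for event in self.events}
        chain = []
        current = by_id.get(event_id)
        while current is not None:
            chain.append(current)
            current = by_id.get(current["parent_id"])
        return chain

    def to_jsonl(self) -> str:
        """Serialise the trail as JSON Lines for export to durable storage."""
        return "\n".join(json.dumps(event) for event in self.events)
```

If the planner records each tool call with the id of the worker message it was acting on, `lineage` surfaces the "privileged action traced to low-privilege worker input" pattern directly.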


Framework mappings

OWASP LLM Top 10
  • LLM06:2025 Excessive Agency
  • LLM01:2025 Prompt Injection
MITRE ATLAS
  • AML.T0051.001 Indirect Prompt Injection
NIST AI RMF
  • MANAGE 2.4

Real-world cases (6)

Actual published events that illustrate this risk.

ForcedLeak: Salesforce Agentforce CRM exfiltration (CVSS 9.4, no CVE) · 2025

Researchers showed attacker text planted in a public Salesforce Web-to-Lead form is later read by the Agentforce agent during normal use and treated as instructions, exfiltrating CRM data to an attacker domain that had been on Salesforce's CSP allow-list but expired and was re-registered for about $5.

ServiceNow Now Assist: second-order prompt injection via agent-to-agent discovery · 2025

AppOmni showed ServiceNow Now Assist's default agent config lets a malicious ticket redirect a benign agent into enlisting a more powerful agent — performing record CRUD, admin-role assignment, and email exfiltration with the triggering user's privilege, despite built-in prompt-injection protection.

Salesloft Drift OAuth supply-chain breach (UNC6395): mass Salesforce data theft via an AI chat integration · 2025

Attackers stole OAuth tokens from the Salesloft Drift AI chat integration and used them to silently export Salesforce data from 700+ organisations, reportedly including Cloudflare, Google, Palo Alto Networks and Zscaler.

Anamorpher: image-scaling prompt injection against production AI systems · 2025

Trail of Bits showed an image that looks benign at full resolution exposes a hidden prompt-injection payload once an AI pipeline downscales it, and used it against Gemini CLI to silently exfiltrate Google Calendar data through an auto-approved Zapier tool call.

Agentjacking: hijacking AI coding agents via Sentry error reports (Tenet Security) · 2026

Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.

Meta AI support bot tricked into hijacking Instagram accounts · 2026

Attackers reportedly social-engineered Meta's AI-powered Instagram support chatbot into attaching attacker-controlled emails to target accounts and issuing password-reset codes, taking over high-profile accounts (including the Obama-era White House and a U.S. Space Force CMSgt) without the owner's email or any MFA prompt.


AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan