🔍AI RiskAtlas
← Risk taxonomy

Indirect Prompt Injection

criticalInput manipulation
Also known as: cross-domain prompt injection, XPIA

Definition

The attacker doesn't talk to the AI directly — they hide instructions inside something the AI will later read: a web page, a document, an email, a tool's output. When the AI reads it to help you, it quietly obeys the hidden commands.

★ Suggested sub-risk — not yet in your taxonomyrecommended under #38 Prompt injection

This is recommended as a granular sub-risk of #38 Prompt injection (Cyber & Data Security · Technology Risk). #38 is framed as direct prompt manipulation; indirect injection is a distinct delivery vector and the high-severity 'lethal trifecta' case (injection + tools + data). Your 44-row Enterprise Risk Mapping is unchanged — this is a suggestion for inclusion.

Where it attaches

The system components this risk arises at.

🌐 Untrusted Content🔍 Retriever📚 Knowledge Store / Vector DB📥 Ingestion Pipeline🔧 Tool Runtime💾 Long-term Memory🤖 Worker Agent🖥️ Computer / Browser Environment Human Approval Gate📝 Audit Logging📝 ASR / Speech-to-Text Model

Detection signals

  • Agent takes actions unrelated to the user's request
  • Outbound tool calls to unexpected destinations after reading content
  • Retrieved chunk contains imperative language ('send', 'ignore', 'forward')
  • Behaviour change correlated with a specific source/document

Controls & guardrails that address this

8

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.

Control category
Preventive · 5
Delimiting / spotlighting of untrusted contentinteractive

Clearly fencing off outside text — 'everything between these marks is just data, not instructions' — so the model is less likely to obey it.

Ingestion sanitisation & source allowlistinginteractive

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

Egress allowlisting & DLP on tool argumentsinteractive

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Least-privilege identity & scoped credentialsinteractive

Giving the agent only the keys it needs for the current task, not a master key to everything.

Human-in-the-loop approval on high-risk actionsinteractive

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Detective · 3
Provenance & content signinginteractive

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Full-trace audit logginginteractive

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Open these in the Control Library →

Framework mappings

OWASP LLM Top 10
  • LLM01:2025 Prompt Injection
MITRE ATLAS
  • AML.T0051.001 Indirect Prompt Injection
NIST AI RMF
  • MAP 5.1
  • MEASURE 2.7
  • MANAGE 2.4

Real-world cases

20

Actual published events that illustrate this risk — click through for the writeup and sources.

EchoLeak — Microsoft 365 Copilot zero-click (CVE-2025-32711)2025

A crafted email's hidden instructions made M365 Copilot exfiltrate tenant data via an auto-rendered image URL — with no user click.

Indirect prompt injection coined (Greshake et al.)2023

An academic paper showed instructions hidden in a webpage hijacking an LLM-integrated app reading it — coining 'indirect prompt injection'.

Agentic-browser indirect-injection demos (ChatGPT Operator)2025

Researchers showed web-browsing AI agents following instructions embedded in attacker-controlled pages to leak data or take actions.

ChatGPT persistent-memory exfiltration (Rehberger / 'SpAIware')2024

Indirect injection could write attacker instructions into ChatGPT's long-term memory, persisting across chats to exfiltrate data until OpenAI mitigated it.

MCP tool-poisoning PoC (Invariant Labs)2025

Hidden instructions embedded in MCP tool descriptions hijacked agents (e.g. in Cursor) that merely listed the available tools.

Taxonomy of Failure Modes in Agentic AI Systems (Microsoft)2025

Microsoft AI Red Team whitepaper enumerating agentic failure modes, including resource/service exhaustion from runaway loops and fan-out.

ForcedLeak — Salesforce Agentforce CRM exfiltration (CVSS 9.4, no CVE)2025

Researchers showed attacker text planted in a public Salesforce Web-to-Lead form is later read by the Agentforce agent during normal use and treated as instructions, exfiltrating CRM data to an attacker domain that had been on Salesforce's CSP allow-list but expired and was re-registered for about $5.

ServiceNow Now Assist — second-order prompt injection via agent-to-agent discovery2025

AppOmni showed ServiceNow Now Assist's default agent config lets a malicious ticket redirect a benign agent into enlisting a more powerful agent — performing record CRUD, admin-role assignment, and email exfiltration with the triggering user's privilege, despite built-in prompt-injection protection.

ShadowLeak — ChatGPT Deep Research zero-click service-side exfiltration2025

A single crafted email with hidden HTML instructions reportedly made OpenAI's Deep Research agent autonomously exfiltrate Gmail inbox data from OpenAI's own cloud — with no user click and, per Radware, no client-side or network evidence.

IDEsaster — AI coding IDEs/agents turned into exfiltration & RCE surfaces2025

Researcher Ari Marzouk disclosed 30+ vulnerabilities (24 CVEs) across 10-plus AI coding agents (Copilot, Cursor, Windsurf, Claude Code, Junie and others) where a prompt injected via repo files, READMEs, file names or MCP tool responses makes the assistant weaponize legitimate IDE features for code execution and secret exfiltration.

GitHub Copilot / VS Code RCE via prompt injection ('YOLO mode', CVE-2025-53773)2025

Researcher Johann Rehberger showed that injected instructions in source code, web pages, or GitHub issues could make the Copilot agent silently write "chat.tools.autoApprove": true into .vscode/settings.json, disabling human approval and granting unattended shell execution — a self-config-rewrite to full-host compromise (CVE-2025-53773).

Agent-in-the-Middle — abusing A2A agent cards (Trustwave SpiderLabs)2025

A red-team PoC forged an inflated A2A 'agent card' so the orchestrator's LLM-as-judge routing always selected the rogue agent, diverting every task through the attacker.

Agent Session Smuggling in A2A systems (Unit 42)2025

Unit 42 PoCs in which a malicious remote agent abuses default inter-agent trust to covertly inject extra instructions across a stateful A2A session, invisible to the human operator.

Morris II — zero-click self-replicating adversarial-prompt worm across GenAI agents2024

Cohen, Bitton & Nassi (arXiv Mar 2024; ACM CCS 2025) built 'Morris II', the first worm targeting GenAI ecosystems: an adversarial self-replicating prompt that, via RAG-based inference, triggers a zero-click chain of indirect injections forcing each agent to act maliciously and re-infect the next — demonstrated stealing data and spamming through email assistants on ChatGPT, Gemini and LLaVA.

Anamorpher — image-scaling prompt injection against production AI systems2025

Trail of Bits showed an image that looks benign at full resolution exposes a hidden prompt-injection payload once an AI pipeline downscales it, and used it against Gemini CLI to silently exfiltrate Google Calendar data through an auto-approved Zapier tool call.

The Attacker Moves Second — adaptive attacks bypass 12 jailbreak/injection defenses (Nasr, Carlini et al.)2025

Researchers report that adaptive attackers bypass 12 recent jailbreak and prompt-injection defenses with attack success rates above 90% for most, despite those defenses having originally reported near-zero success rates.

MCPTox: tool-poisoning benchmark over real-world MCP servers2025

A benchmark of LLM-agent susceptibility to tool poisoning via malicious tool metadata, built on 45 live MCP servers and 353 real tools; the authors report agents are rarely able to refuse and that more-capable models are often more vulnerable.

Agentjacking — hijacking AI coding agents via Sentry error reports (Tenet Security)2026

Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.

SearchLeak — Microsoft 365 Copilot one-click data theft (CVE-2026-42824)2026

A single malicious link reportedly turned Copilot Enterprise Search's URL query parameter into an executable prompt, exfiltrating emails, MFA codes and files via a Bing image-search side channel.

ChatGPhish — ChatGPT web-summary rendering turned into a phishing surface2026

Attacker-controlled Markdown hidden in a public web page is reportedly rendered by ChatGPT's summarization feature as trusted assistant output — spoofed OpenAI alerts, phishing links, QR codes, and tracking pixels.

Browse all real-world cases →

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading →·Built by Shi Yuan ↗