Case study

MCP tool-poisoning PoC (Invariant Labs)

Research demonstration01 Apr 2025🗺️ Tool-Using Agent

Hidden instructions embedded in MCP tool descriptions hijacked agents (e.g. in Cursor) that merely listed the available tools.

Root cause — why it happened

Agents can plug into 'tool servers' (MCP servers) that advertise what they can do — each tool comes with a short written description, like 'sends an email' or 'adds two numbers'. The agent reads those descriptions so the model knows what's on offer. Invariant Labs showed the catch: an attacker who controls a tool server can hide secret instructions inside those descriptions. The moment the agent just lists the available tools — before it ever uses any of them — those hidden instructions land in the model's context and can steer it: read a secret file, change what another tool does, then hide the evidence. No malicious tool ever has to run; advertising the tool is enough.

Risks this case illustrates

Tool Poisoning / MCP Description Attacks Indirect Prompt Injection

Named in the standard (OWASP/ATLAS/NIST) lens. Click a highlighted component in the diagram below to see which risks attach where.

How it unfolded

← / → to step · click a component to inspect

InstructionsDataActionsControl / decisionFeedback / logs

👆 Click a component to inspect its risks

SetupStep 1 / 6

The agent adds an MCP tool server

A developer wires their coding agent up to a handy-looking MCP tool server — the kind of thing people share and install all the time. It advertises a few tools, including something innocent like a calculator. Adding it feels as low-stakes as installing a small plugin.

⚙️Adding the MCP server (illustrative)config

// agent client config
{
  "mcpServers": {
    "handy-tools": {
      "command": "npx",
      "args": ["-y", "handy-tools-mcp"]
    }
  }
}
// Registered as a tool source. Nothing has run yet.
// kind: RESEARCH proof-of-concept (Invariant Labs), not a live incident.

Step 1 / 6

Controls & guardrails — what would have stopped it

The control aimed straight at this is treating tool servers like software you vet: lock to a reviewed version, read the FULL descriptions (not the short label), and re-check whenever a server changes — that's MCP pinning and manifest review. Pair it with treating those descriptions as untrusted text rather than orders, giving the agent only the access it needs, and controlling where it can send data. The honest catch: a subtly-worded malicious description can slip past a human reviewer, and treating descriptions as 'just data' lowers the odds of a hijack but never to zero — so the access limits and egress controls are what cap the damage when an injection still lands.

Preventive

Delimiting / spotlighting of untrusted content
addressesIndirect Prompt Injection
A trained convention, not enforcement. Determined payloads still break out, especially when content is long or the attack is novel. Combine with action-layer controls.
MCP/plugin pinning, manifest hashing & re-review
addressesTool Poisoning / MCP Description Attacks
Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.
Least-privilege identity & scoped credentials
addressesTool Poisoning / MCP Description Attacks Indirect Prompt Injection
Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.
Egress allowlisting & DLP on tool arguments
addressesTool Poisoning / MCP Description Attacks Indirect Prompt Injection
Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.
Tool argument validation & sandboxing
addressesTool Poisoning / MCP Description Attacks
Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

Detective

Full-trace audit logging
addressesTool Poisoning / MCP Description Attacks Indirect Prompt Injection
Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.
Runtime monitoring & anomaly detection
addressesIndirect Prompt Injection
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Corrective

Governance: risk assessment, red-teaming & incident response
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

All guardrails for Tool Poisoning / MCP Description Attacks →All guardrails for Indirect Prompt Injection →

Lessons

▸ Tool descriptions are prompts: in MCP, a server's free-text tool description is injected into the model's context and parsed with full trust, making metadata an injection vector.
▸ The hijack can fire at tool ENUMERATION — before any tool is called — so it sits ahead of authorization, argument validation and human-approval gates, which all live on the invocation path.
▸ One bad server can poison your good ones: 'tool shadowing' lets a malicious server's description silently redefine how the model uses a trusted server's tools, so per-server review in isolation is insufficient.
▸ Treat adding an MCP server like adding a dependency: pin versions, hash and review the full manifest (not the abridged UI label), and re-review on change to catch rug-pulls.
▸ Input-side hygiene (spotlighting, review) lowers but never zeroes injection — least-privilege identity and egress/argument controls are what cap the damage when a poisoned description still lands.

Sources

MCP Security Notification: Tool Poisoning Attacks — Invariant Labs ↗
invariantlabs-ai/mcp-injection-experiments (PoC code) ↗
Model Context Protocol has prompt injection security problems — Simon Willison ↗
MCP Security Notification: Tool Poisoning Attacks — Invariant Labs ↗ — Original disclosure: poisoned tool descriptions, file exfiltration via tool args, and tool shadowing across servers.
invariantlabs-ai/mcp-injection-experiments (PoC code) ↗ — Proof-of-concept code for the tool-poisoning and shadowing experiments.
Model Context Protocol has prompt injection security problems — Simon Willison ↗ — Independent corroboration and framing of the tool-description injection class.

Practise the risk class — related scenarios

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🔌The Tool With a Hidden Agenda

A trusted MCP email tool quietly BCCs every message to an attacker

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server