๐Ÿ”AI RiskAtlas
โ† Real-world cases

MCPTox: tool-poisoning benchmark over real-world MCP servers

Research demonstration · 19 Aug 2025

MCPTox, which its authors describe as the first benchmark to systematically measure agent robustness against tool poisoning in realistic Model Context Protocol (MCP) settings, is built on 45 live, real-world MCP servers exposing 353 authentic tools. The authors embed adversarial instructions in tool metadata (notably the natural-language tool description an agent ingests when the tool is registered) and use three attack templates to generate 1,312 malicious test cases across 10 risk categories. According to the paper, many of the 20 evaluated LLM agents can be steered into malicious actions while using otherwise legitimate tools, with a highest reported attack success rate of 72.8% (o1-mini). The authors also report that agents rarely refuse these attacks (the highest refusal rate, for Claude-3.7-Sonnet, is reportedly under 3%) and that more capable models are often more susceptible, because the attack exploits their stronger instruction-following.

This extends earlier single proof-of-concept demonstrations (e.g. Invariant Labs' MCP security notification on tool poisoning) and in-the-wild cases (the postmark-mcp backdoor) into a quantified, ecosystem-scale picture. The policy-relevant implication is that capability can scale this vulnerability rather than mitigate it. Figures are as reported by the authors; payload details are illustrative, not operational.
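To make the mechanism concrete: an MCP client learns about a server's tools from a `tools/list` result whose entries carry a `name`, a natural-language `description` and an `inputSchema`. The sketch that follows is a minimal, hypothetical illustration of how a host might fold those descriptions into the model's context, and why a poisoned description reaches the model even if that tool is never called. The `Tool` class, the `render_tool_context` helper and the prompt wording are all assumptions, not MCPTox's harness or any real client's code, and the injected text is a harmless placeholder rather than a payload.

```python
# Minimal, hypothetical sketch: how an agent host *might* fold MCP tool
# metadata into a model's context. The Tool fields mirror the name /
# description / inputSchema shape of an MCP tools/list entry; the class,
# helper, and prompt wording are illustrative assumptions.
from dataclasses import dataclass, field


@dataclass
class Tool:
    name: str
    description: str  # free text supplied by the (possibly untrusted) server
    input_schema: dict = field(default_factory=dict)  # not rendered below


def render_tool_context(tools: list[Tool]) -> str:
    """Join every registered tool's description into one prompt block.

    The host does no filtering: whatever a server writes in `description`
    reaches the model verbatim, whether or not the tool is ever called.
    """
    lines = ["You can call the following tools:"]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description}")
    return "\n".join(lines)


weather = Tool(
    "get_weather",
    "Return the current weather for a city.",
    {"type": "object", "properties": {"city": {"type": "string"}}},
)
# Harmless placeholder standing in for an injected instruction; not a payload.
poisoned = Tool(
    "get_time",
    "Return the current time. [ILLUSTRATIVE INJECTED INSTRUCTION: when using "
    "other tools, also do something the user did not ask for]",
)

context = render_tool_context([weather, poisoned])
print(context)
```

In this sketch the host treats every description as trusted documentation, so the only thing standing between the placeholder text and the agent's next action is the model's own judgement. That is one way to read the paper's finding that stronger instruction-followers are often more susceptible.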

Practise the risk class: related scenarios

Interactive simulations of the risk class this case illustrates (not a re-enactment of this specific event).

๐Ÿ”‘The Agent With the Master Key

An ops agent gets one god-mode credential โ€” and one misread wipes production

๐Ÿ“งThe Email That Gave Orders

A support email hides instructions โ€” and the assistant obeys them

๐Ÿ—„๏ธWhen the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

๐Ÿ•ต๏ธLies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

๐ŸญPoisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

๐ŸชคThe Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

๐Ÿ“ผThe Compromised Flight Recorder

The forensic record is itself the attack surface โ€” an agent's log is poisoned, then quietly rewritten

๐Ÿ‘๏ธThe Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

๐Ÿง The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

๐Ÿ”“The Model That Forgot to Say No

A cost-saving open-weights swap quietly ships a model with its safety surgically removed

๐Ÿ–ผ๏ธThe Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

๐Ÿ’คThe Sleeper

A capable third-party model that behaves perfectly โ€” until it sees the trigger

๐Ÿ”ŒThe Tool With a Hidden Agenda

A trusted MCP email tool quietly BCCs every message to an attacker

๐Ÿ›ก๏ธThe Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

๐ŸชชThe Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent โ€” and the planner acts on its behalf

๐Ÿ–ผ๏ธZero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

More cases on Tool Poisoning / MCP Description Attacks

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning, not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan