MCPTox: tool-poisoning benchmark over real-world MCP servers

Research demonstration · 19 Aug 2025

MCPTox, described by its authors as the first benchmark to systematically measure agent robustness against tool poisoning in realistic Model Context Protocol (MCP) settings, is built on 45 live, real-world MCP servers exposing 353 authentic tools. The authors embed adversarial instructions in tool metadata (notably the natural-language tool description that the agent ingests at registration) and generate 1,312 malicious test cases across 10 risk categories using three attack templates. According to the paper, many of the 20 evaluated LLM agents can be steered into malicious actions while using otherwise legitimate tools, with reported attack success rates as high as 72.8% (o1-mini). The authors also report that agents rarely refuse these attacks (the highest refusal rate, for Claude-3.7-Sonnet, is reportedly under 3%) and that more capable models are often more susceptible, because the attack exploits their stronger instruction-following.

This extends earlier single proof-of-concept demonstrations (e.g. Invariant Labs' MCP tool-poisoning notification) and in-the-wild cases (the postmark-mcp backdoor) into a quantified, ecosystem-scale picture. The policy-relevant implication is that greater capability can amplify this vulnerability rather than mitigate it. Figures are as reported by the authors; payload details are illustrative, not operational.
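To make the two headline metrics concrete, here is a minimal Python sketch of how an attack success rate and a refusal rate could be tallied per agent from a list of test-case outcomes. It is not the authors' evaluation harness. The `TrialResult` record, the three outcome labels, the `summarize` helper, and the agent names are all hypothetical, and the definitions assumed here (successes or refusals divided by all test cases run for that agent) may differ from the paper's exact definitions. The outcome counts are made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialResult:
    """One poisoned test case run against one agent (hypothetical record)."""
    model: str
    outcome: str  # assumed labels: "attack_succeeded", "refused", "ignored"


def summarize(results):
    """Return {model: (attack_success_rate, refusal_rate)} as fractions.

    Assumption: both rates are computed over all test cases run for a model.
    """
    per_model = {}
    for r in results:
        per_model.setdefault(r.model, Counter())[r.outcome] += 1

    summary = {}
    for model, counts in per_model.items():
        total = sum(counts.values())
        summary[model] = (
            counts["attack_succeeded"] / total,  # missing keys count as 0
            counts["refused"] / total,
        )
    return summary


# Made-up outcomes for two placeholder agents; not data from the paper.
results = (
    [TrialResult("agent-a", "attack_succeeded")] * 7
    + [TrialResult("agent-a", "ignored")] * 3
    + [TrialResult("agent-b", "attack_succeeded")] * 4
    + [TrialResult("agent-b", "refused")] * 1
    + [TrialResult("agent-b", "ignored")] * 5
)

for model, (asr, refusal) in summarize(results).items():
    print(f"{model}: ASR={asr:.1%}, refusal={refusal:.1%}")
```

Keeping "refused" separate from "ignored" matters for reading the reported numbers: an agent that simply fails to act on the injected instruction has not refused it, so a low attack success rate does not by itself imply a high refusal rate.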

Risks it illustrates

Tool Poisoning / MCP Description Attacks · Supply-Chain Compromise · Indirect Prompt Injection · Tool Misuse

Sources

MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers (arXiv:2508.14925)

Practise the risk class — related scenarios

Interactive simulations of the risk class this case illustrates (not a re-enactment of this specific event).

🔑The Agent With the Master Key

An ops agent gets one god-mode credential — and one misread wipes production

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

🏭Poisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🔓The Model That Forgot to Say No

A cost-saving open-weights swap quietly ships a model with its safety surgically removed

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

💤The Sleeper

A capable third-party model that behaves perfectly — until it sees the trigger

🔌The Tool With a Hidden Agenda

A trusted MCP email tool quietly BCCs every message to an attacker

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

More cases on Tool Poisoning / MCP Description Attacks

postmark-mcp backdoor

MCP tool-poisoning PoC (Invariant Labs)

MCP registry / marketplace poisoning (OX Security)

Malicious JetBrains Marketplace plugins steal AI API keys

codexui-android: malicious npm package steals OpenAI Codex auth tokens

Flowise AI agent builder CustomMCP RCE (CVE-2025-59528)