🔍AI RiskAtlas

Tool Poisoning / MCP Description Attacks

high · Agency & tools
Also known as: malicious tool, heretic tool, rug pull

Definition

Add-on tool packs describe themselves to the AI in plain language — and a sneaky pack can hide commands in that description, or behave nicely until you approve it and then turn malicious.

★ Suggested sub-risk (not yet in your taxonomy) · recommended under #42 Tool-layer misuse and unintended actions

This is recommended as a granular sub-risk of #42 Tool-layer misuse and unintended actions (Cyber & Data Security · Technology Risk). Overlaps #38 (descriptions enter the prompt), #42 (tool layer) and #8 (third-party), but none names the tool-registry-as-instruction-channel supply-chain vector. Your 44-row Enterprise Risk Mapping is unchanged — this is a suggestion for inclusion.

Where it attaches

The system components this risk arises at.

🧰 MCP / Plugin Server · 🔧 Tool Runtime · 🤖 Worker Agent · 🧩 Prompt Assembly

Detection signals

  • Tool description containing imperative side-instructions
  • Manifest/behaviour change after approval
  • Two servers exposing same-named tools
  • Tool sending data to a destination unrelated to its purpose
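Two of the signals above (imperative side-instructions in a description, and same-named tools on two servers) can be checked mechanically. The following is a minimal sketch, not a complete detector: the phrase list, the function names, and the shape of the `servers` mapping are all assumptions made for illustration, and a real scanner would need a much broader set of patterns plus human review.

```python
import re
from collections import defaultdict

# Hypothetical phrases that often mark an instruction aimed at the model
# rather than a description of the tool. Illustrative, not exhaustive.
SUSPICIOUS_PATTERNS = [
    r"\bignore (all |any )?(previous|prior) instructions\b",
    r"\bdo not (tell|mention|inform)\b",
    r"\bbefore (using|calling) (this|any) tool\b",
    r"<important>",
    r"~/\.ssh|\.env\b|id_rsa",
]


def flag_description(description: str) -> list[str]:
    """Return every pattern that matches the description (case-insensitive)."""
    return [
        pattern
        for pattern in SUSPICIOUS_PATTERNS
        if re.search(pattern, description, re.IGNORECASE)
    ]


def find_name_collisions(servers: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each tool name exposed by more than one server to those servers."""
    owners = defaultdict(list)
    for server, tools in servers.items():
        for tool in tools:
            owners[tool].append(server)
    return {tool: sorted(names) for tool, names in owners.items() if len(names) > 1}
```

A description that matches nothing is not thereby safe; the check only surfaces candidates for review.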

Controls & guardrails that address this (5)

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.
Preventive · 4
MCP/plugin pinning, manifest hashing & re-review

Treating add-on tool packs like software you vet: locking to a reviewed version and re-checking whenever it changes.
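As a rough illustration of pinning by manifest hash: the reviewed tool manifest is serialised canonically and hashed, and any later manifest whose digest differs is sent back for review. This is a sketch under assumptions; `manifest_digest` and `is_approved` are hypothetical names, and the manifest fields in the usage note are examples rather than a required schema.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_approved(manifest: dict, pinned_digest: str) -> bool:
    """True only if the manifest is byte-for-byte what was reviewed."""
    return manifest_digest(manifest) == pinned_digest
```

For example, pinning `{"name": "add", "description": "Adds two numbers."}` at review time and recomputing the digest on every tool listing would catch a description that is silently edited after approval, which is the rug-pull pattern described above.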

Tool argument validation & sandboxing

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.
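A minimal sketch of the validation half, assuming a hypothetical `read_file` tool that should only touch files under a `/workspace` sandbox root (both names are invented for this example). The check is purely lexical: it does not resolve symlinks, and it is not a substitute for actually running the tool in an isolated environment.

```python
from pathlib import PurePosixPath

ALLOWED_ROOT = PurePosixPath("/workspace")  # hypothetical sandbox root


def validate_read_file_args(args: dict) -> PurePosixPath:
    """Validate arguments for a hypothetical `read_file` tool.

    Raises ValueError on anything unexpected; returns the accepted path.
    """
    unexpected = set(args) - {"path"}
    if unexpected:
        raise ValueError(f"unexpected arguments: {sorted(unexpected)}")

    path = args.get("path")
    if not isinstance(path, str) or not path:
        raise ValueError("path must be a non-empty string")

    candidate = PurePosixPath(path)
    if not candidate.is_absolute() or ".." in candidate.parts:
        raise ValueError("path must be absolute and contain no '..' segments")
    if not candidate.is_relative_to(ALLOWED_ROOT):
        raise ValueError(f"path must be under {ALLOWED_ROOT}")
    return candidate
```

Rejecting unknown argument names matters here because a poisoned description may ask the model to smuggle data out through an extra parameter.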

Egress allowlisting & DLP on tool arguments

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.
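One way to picture this control is a check that runs before any outbound tool call: the destination host must be on an allowlist, and string arguments are scanned for secret-shaped values. The sketch below is illustrative only; the host `api.example.com`, the two secret patterns, and the function names are assumptions, and real DLP rules are considerably broader.

```python
import re
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical allowlist

# Illustrative secret shapes only; not a complete DLP rule set.
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
]


def egress_allowed(url: str) -> bool:
    """Allow only HTTPS requests whose exact hostname is on the allowlist."""
    parts = urlsplit(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def contains_secret(value: str) -> bool:
    return any(pattern.search(value) for pattern in SECRET_PATTERNS)


def check_tool_call(url: str, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems = []
    if not egress_allowed(url):
        problems.append(f"destination not allowlisted: {url}")
    for key, value in args.items():
        if isinstance(value, str) and contains_secret(value):
            problems.append(f"possible secret in argument: {key}")
    return problems
```

Matching on the parsed hostname rather than on a substring of the URL avoids being fooled by addresses such as `https://api.example.com@evil.test/`.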

Least-privilege identity & scoped credentials

Giving the agent only the keys it needs for the current task, not a master key to everything.
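A toy model of the idea, assuming credentials are issued per task with an explicit scope set and a short lifetime. The class, the function, and scope strings such as `calendar:read` are hypothetical; in practice this would be handled by an identity provider rather than by in-process objects.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScopedCredential:
    scopes: frozenset
    expires_at: float  # Unix timestamp

    def permits(self, scope: str, now: Optional[float] = None) -> bool:
        """True only if the credential is unexpired and carries the scope."""
        current = time.time() if now is None else now
        return current < self.expires_at and scope in self.scopes


def issue_for_task(
    required_scopes: set, ttl_seconds: float, now: Optional[float] = None
) -> ScopedCredential:
    """Issue a credential limited to the scopes one task needs."""
    issued = time.time() if now is None else now
    return ScopedCredential(frozenset(required_scopes), issued + ttl_seconds)
```

With this shape, a poisoned tool that talks an agent into sending mail fails if the task's credential only carries a read scope.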

Detective · 1
Full-trace audit logging

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.
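As a sketch of what a full trace might look like in code, the class below appends one JSON line per event and lets an investigator pull back everything that shares a trace ID. It is an assumption-laden illustration: the `AuditLog` name, the event labels, and the in-memory list are invented here, and a real deployment would write to durable, access-controlled storage.

```python
import json
import time
from typing import Optional


class AuditLog:
    """Append-only, in-memory trace log (illustrative only)."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def record(
        self,
        trace_id: str,
        event: str,
        payload: dict,
        timestamp: Optional[float] = None,
    ) -> None:
        """Append one event as a JSON line with a global sequence number."""
        entry = {
            "seq": len(self.lines),
            "trace_id": trace_id,
            "event": event,
            "timestamp": time.time() if timestamp is None else timestamp,
            "payload": payload,
        }
        self.lines.append(json.dumps(entry, sort_keys=True))

    def trace(self, trace_id: str) -> list[dict]:
        """Return all events for one trace, in the order they were recorded."""
        entries = [json.loads(line) for line in self.lines]
        return [entry for entry in entries if entry["trace_id"] == trace_id]
```

Logging the tool name and full arguments for every call is what makes it possible to spot, after the fact, a tool that sent data somewhere unrelated to its purpose.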


Framework mappings

OWASP LLM Top 10
  • LLM03:2025 Supply Chain
  • LLM01:2025 Prompt Injection
MITRE ATLAS
  • AML.T0053 LLM Plugin Compromise
NIST AI RMF
  • MAP 4.1
  • MANAGE 3.1

Real-world cases (7)

Actual published events that illustrate this risk.

postmark-mcp backdoor (2025)

A malicious MCP server package was found silently BCC-ing every email it sent to an attacker-controlled address — real supply-chain tool poisoning.

MCP tool-poisoning PoC (Invariant Labs, 2025)

Hidden instructions embedded in MCP tool descriptions hijacked agents (e.g. in Cursor) that merely listed the available tools.

MCP registry / marketplace poisoning (OX Security, 2026)

OX Security enrolled a malicious MCP server into 9 of 11 public registries with no real validation, then confirmed command execution on six live production platforms that discover servers from those registries.

MCPTox: tool-poisoning benchmark over real-world MCP servers (2025)

A benchmark of LLM-agent susceptibility to tool poisoning via malicious tool metadata, built on 45 live MCP servers and 353 real tools; the authors report agents are rarely able to refuse and that more-capable models are often more vulnerable.

Malicious JetBrains Marketplace plugins steal AI API keys (2026)

Researchers reported at least 15 trojanized JetBrains Marketplace plugins posing as AI coding assistants that silently exfiltrated the OpenAI/DeepSeek/SiliconFlow API keys developers pasted into them — ~70,000 installs, with stolen keys allegedly resold to paying users.

codexui-android: malicious npm package steals OpenAI Codex auth tokens (2026)

A trojaned npm package posing as a remote web UI for OpenAI's Codex coding agent silently exfiltrated developers' Codex authentication tokens, enabling persistent account takeover via non-expiring refresh tokens.

Flowise AI agent builder CustomMCP RCE (CVE-2025-59528, 2025)

A CVSS 10.0 remote-code-execution flaw in Flowise's CustomMCP node lets an attacker run arbitrary JavaScript on the host: the MCP server config is reportedly passed straight to JavaScript's Function() constructor with no validation. Disclosed in Sept 2025 and patched in 3.0.6, it later saw active mass exploitation across thousands of exposed instances in April 2026.
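The general lesson of the Flowise case, that configuration text should be parsed as data and never evaluated as code, can be sketched in Python. This is an analogue written for illustration, not Flowise's code or its patch; the `parse_server_config` name and the allowed keys are assumptions.

```python
import json

ALLOWED_KEYS = {"command", "args", "env"}  # hypothetical config fields


def parse_server_config(raw: str) -> dict:
    """Parse untrusted config text as JSON data; never eval() or exec() it."""
    # json.loads raises json.JSONDecodeError (a ValueError) on non-JSON input.
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    unknown = set(config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return config
```

Input such as `__import__('os').system('id')` is rejected as invalid JSON instead of being executed, which is the property an evaluating parser lacks.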


AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan