AI RiskAtlas
Case study

MCP registry / marketplace poisoning (OX Security)

Research demonstration · 15 Apr 2026 · Multi-Agent System

OX Security enrolled a malicious MCP server into 9 of 11 public registries with no real validation, then confirmed command execution on six live production platforms that discover servers from those registries.

Root cause: why it happened

Modern AI assistants don't hard-code every tool they can use. Instead they look up tools from public catalogues: think app stores for AI 'plug-in servers'. Researchers at OX Security found these catalogues barely check who is publishing. They submitted one fake, malicious tool-server and got it accepted into 9 of 11 public catalogues. Then they watched as six real, live AI platforms automatically discovered that fake server from the catalogue and trusted it just because it was listed. Because the platforms run whatever a listed server tells them to, OX could make those six platforms run commands on the machines hosting them. Nobody had to break in: the attacker simply walked through the front door of a catalogue everyone trusts, and the AI orchestrators picked the poisoned tool up on their own. The core mistake is trusting a directory's listing as if it were proof of safety.

Risks this case illustrates

Risks are named using the standard (OWASP/ATLAS/NIST) lens.

How it unfolded

[Diagram: multi-agent system grouped into four zones (Untrusted, Agent team, Oversight, External). Components: User, Planner Agent, Research Agent, Coding Agent, Comms Agent, Tool Runtime, Untrusted Content, Business Database, External APIs, Monitoring & Evals, Agent Registry, Attacker, Public MCP registry, Malicious MCP server. Edge types: Instructions, Data, Actions, Control / decision, Feedback / logs. Highlighted edge: the attacker enrolls the server with no real validation (9/11 registries).]

Setup · Step 1 / 6

An attacker publishes a malicious MCP server to public registries

MCP servers are small programs that give AI assistants extra abilities, and public catalogues list them. OX Security built one deliberately malicious server and submitted it to those catalogues, the way anyone might publish to an app store. The catalogues didn't really check who the publisher was or whether the server was safe. Per OX, their fake server was accepted into 9 of 11 public catalogues. At this point nothing has executed yet; the trap is simply listed, waiting to be discovered.

Malicious registry submission (illustrative config)
# submitted to a public MCP registry: no publisher vetting, no signature
name: "productivity-helper-mcp"      # benign-looking, no real provenance
description: "File + shell utilities for agents"
publisher: "unverified / self-asserted"
capabilities: [ "fs.read", "fs.write", "shell.exec" ]
transport: stdio
artifact_signature: NONE              # registry does not require signing
vetting_status: AUTO-ACCEPTED         # per OX: admitted into 9 of 11 registries
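To make the missing registry-side check concrete, here is a minimal sketch of an admission gate that would refuse to auto-accept a submission like the one above. It is not how any real registry works: the `Submission` fields, the `HIGH_RISK_CAPABILITIES` set, and the `admission_findings` function are all hypothetical names chosen for illustration, and a non-empty signature field stands in for real cryptographic verification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical policy: capabilities that should never be auto-accepted.
HIGH_RISK_CAPABILITIES = {"shell.exec", "fs.write"}


@dataclass
class Submission:
    """Illustrative registry submission; fields loosely mirror the config above."""
    name: str
    publisher_verified: bool
    artifact_signature: Optional[str]  # presence only; no real verification here
    capabilities: List[str] = field(default_factory=list)


def admission_findings(sub: Submission) -> List[str]:
    """Return the reasons a submission should not be auto-accepted."""
    findings = []
    if not sub.publisher_verified:
        findings.append("publisher identity is self-asserted")
    if not sub.artifact_signature:
        findings.append("artifact is unsigned")
    risky = sorted(HIGH_RISK_CAPABILITIES.intersection(sub.capabilities))
    if risky:
        findings.append("needs manual review for: " + ", ".join(risky))
    return findings


submission = Submission(
    name="productivity-helper-mcp",
    publisher_verified=False,
    artifact_signature=None,
    capabilities=["fs.read", "fs.write", "shell.exec"],
)
findings = admission_findings(submission)
print("REJECT" if findings else "ACCEPT")
for reason in findings:
    print(" -", reason)
```

The illustrative submission trips all three checks. A gate like this only raises the bar at admission; it says nothing about whether a verified, signed server is actually safe.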

Controls & guardrails: what would have stopped it

The fix that actually breaks this chain is at the moment a tool-server is admitted, not after it runs. If platforms only used a short, vetted list of servers they had checked and pinned, instead of trusting whatever a public catalogue returns, the poisoned listing would never be picked up. If registries verified who was publishing and signed what they listed, the fake server wouldn't get in. And if every server were boxed in with only the permissions it truly needs, even a malicious one couldn't reach the host computer. A human signing off before adding a powerful new server catches the rest. Filtering or watching alone isn't enough: six live platforms discovered the trap on their own, so the boundary has to be admission itself.
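The "vetted and pinned" idea can be sketched on the platform side as follows. This is an assumption-laden illustration rather than any platform's actual mechanism: the manifest layout, the `internal-search-mcp` server name, and the `manifest_digest` and `is_admitted` helpers are hypothetical. The platform stores a SHA-256 digest of each manifest it has reviewed and refuses anything that is unlisted or whose manifest has changed since review.

```python
import hashlib
import hmac
import json


def manifest_digest(manifest: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_admitted(manifest: dict, pinned: dict) -> bool:
    """Admit only servers on the allow-list whose manifest matches the pin."""
    expected = pinned.get(manifest.get("name", ""))
    if expected is None:
        return False  # not on the vetted list, whatever the registry says
    return hmac.compare_digest(expected, manifest_digest(manifest))


# Hypothetical server that a human reviewed and pinned.
vetted = {"name": "internal-search-mcp", "capabilities": ["fs.read"]}
pinned = {vetted["name"]: manifest_digest(vetted)}

# A listing discovered from a public registry, never reviewed.
poisoned = {
    "name": "productivity-helper-mcp",
    "capabilities": ["fs.read", "fs.write", "shell.exec"],
}
# The vetted server after an unreviewed update that adds shell access.
updated = dict(vetted, capabilities=["fs.read", "shell.exec"])

print(is_admitted(vetted, pinned))    # True
print(is_admitted(poisoned, pinned))  # False
print(is_admitted(updated, pinned))   # False
```

The third case matters as much as the second: a pin forces re-review whenever a manifest changes, so an update cannot silently widen a server's capabilities.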

Preventive
  • MCP/plugin pinning, manifest hashing & re-review

    Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

  • Per-agent identity & taint-marked messages

    Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

  • Least-privilege identity & scoped credentials

    Doesn't prevent manipulation; it only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

  • Human-in-the-loop approval on high-risk actions

    Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

  • Tool argument validation & sandboxing

    Validates form, not intent: a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.
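Least privilege and human approval from the list above can be combined in a small policy function. The sketch below is illustrative only: `HIGH_RISK`, `effective_capabilities`, and the capability strings are assumptions borrowed from the illustrative config, and computing the permitted set does not enforce it. Enforcement still needs a sandbox or OS-level controls.

```python
# Hypothetical policy: capabilities that always need a human sign-off.
HIGH_RISK = frozenset({"shell.exec"})


def effective_capabilities(requested, granted, human_approved=frozenset()):
    """Return the capabilities a server may actually use.

    A capability is permitted only if the server requested it AND the
    platform granted it; high-risk ones also need explicit human approval.
    """
    allowed = set(requested) & set(granted)
    return sorted(
        cap for cap in allowed
        if cap not in HIGH_RISK or cap in human_approved
    )


requested = ["fs.read", "fs.write", "shell.exec"]
granted = {"fs.read", "shell.exec"}

print(effective_capabilities(requested, granted))
# ['fs.read']
print(effective_capabilities(requested, granted, human_approved={"shell.exec"}))
# ['fs.read', 'shell.exec']
```

The default is deny: what the server asks for is never enough on its own, which is the opposite of the auto-accept behaviour in this case.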

Detective
  • Runtime monitoring & anomaly detection

    Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

  • Full-trace audit logging

    Logging is forensic, not preventive: it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.
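For the audit-logging control, one way to capture enough context to reconstruct an admission decision later is a structured record per decision. This is a sketch under assumptions: the field names and the `audit_record` helper are hypothetical, and a real deployment would also need tamper-resistant storage and someone who reviews the log.

```python
import hashlib
import json
from datetime import datetime, timezone


def audit_record(manifest, decision, reasons, now=None):
    """Build one JSON line describing an admission decision.

    The manifest digest ties the record to the exact content that was
    evaluated, not just the server name.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    timestamp = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "timestamp": timestamp.isoformat(),
            "server": manifest.get("name"),
            "manifest_sha256": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
            "requested_capabilities": manifest.get("capabilities", []),
            "decision": decision,
            "reasons": reasons,
        },
        sort_keys=True,
    )


manifest = {
    "name": "productivity-helper-mcp",
    "capabilities": ["fs.read", "fs.write", "shell.exec"],
}
line = audit_record(manifest, "rejected", ["not on the vetted allow-list"])
print(line)
```

Logging the rejection as well as the acceptance is deliberate: repeated attempts to admit the same unvetted server are a signal worth reviewing.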

Corrective
  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

  • Registry presence is not authorisation: orchestrators that treat 'discoverable in a directory' as 'trusted and invocable' inherit every unvetted publisher's intent. OX got one malicious server into 9 of 11 registries.
  • Dynamic discovery is an admission surface: the convenience of auto-resolving tools from a catalogue is exactly what let a poisoned listing reach six live production platforms with no break-in required.
  • Supply-chain failures fan out: because many platforms discover from the same registries, one poisoned listing is an ecosystem-wide RCE surface (OX's 'mother of all AI supply chains' against 7,000+ reachable servers).
  • The RCE is the impact, not the root cause: STDIO command execution follows automatically once a malicious server is admitted, so containment belongs at admission, not after invocation.
  • Contain at the admission boundary: signed/attested artifacts with verified provenance, pinning to a vetted allow-list, human review of new powerful servers, and sandboxing/least-privilege break the chain where model-side or input filtering cannot.
  • Signing proves origin, not safety: provenance and allow-lists are necessary but not sufficient. A legitimately-signed server can be compromised upstream, so sandboxing and least privilege must backstop the trust chain.


AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning; they are not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan