🔍AI RiskAtlas
← Real-world cases
Case study

IDEsaster — AI coding IDEs/agents turned into exfiltration & RCE surfaces

Disclosed vulnerability06 Dec 2025🗺️ Tool-Using Agent

Researcher Ari Marzouk disclosed 30+ vulnerabilities (24 CVEs) across 10-plus AI coding agents (Copilot, Cursor, Windsurf, Claude Code, Junie and others) where a prompt injected via repo files, READMEs, file names or MCP tool responses makes the assistant weaponize legitimate IDE features for code execution and secret exfiltration.

Root cause — why it happened

An AI coding assistant works by reading the project you point it at — the code, the README, the config files, even add-on tool packs — and then doing things for you inside the editor: running commands, fetching URLs, writing files. The trouble is it can't tell the difference between a human's instructions and text that an attacker hid inside a file. Plant 'do this' text in a repo, a README, a file name, or a tool's reply, and the assistant reads it as orders. It then uses the editor's own ordinary, trusted features — fetching a URL, saving a settings file, running a command — to leak secrets or run the attacker's code. Marzouk reported that every AI coding tool he tested had at least one way to be tricked like this.

Risks this case illustrates

Named in the standard (OWASP/ATLAS/NIST) lens. Click a highlighted component in the diagram below to see which risks attach where.

How it unfolded

UntrustedAgent coreOversightThe real worldgoal🧑User🎛️Orchestrator /Agent Loop🧠LLM🔐Identity &Permissions🔧Tool RuntimeHuman ApprovalGate🔌External APIs🗄️BusinessDatabase🌐UntrustedContent📝Audit Logging🌐Malicious repo/ file /🌐Attacker server(exfil +
InstructionsDataActionsControl / decisionFeedback / logs
👆 Click a component to inspect its risks
SetupStep 1 / 7

A developer opens (or clones) a project and asks the agent for help

A developer does the most ordinary thing: they open a project — maybe one they cloned, or a colleague's branch, or an example from the web — and ask their AI assistant to help. 'Set this up', 'explain this repo', 'fix the failing test'. The request is completely innocent. They have no idea that one of the files in the project is booby-trapped.

💬Developer's requestprompt
Hey, can you set up this repo, explain what it does, and get the failing test to pass?
Step 1 / 7

Controls & guardrails — what would have stopped it

No single switch fixes this — it's a whole category — but the chain breaks if the powerful actions are fenced in. Run risky steps in a locked-down sandbox, and always ask a real person before the assistant runs a command or edits the project's settings (showing them exactly what it will do). That stops the 'rewrite the settings to run code' path. Then only let the assistant reach a short list of trusted web addresses, which stops the 'leak secrets through a web fetch' path. Treating the project's files as untrusted and giving the assistant only the access it needs make the trick less likely to land and less damaging when it does.

Preventive
  • Tool argument validation & sandboxing

    Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

  • Human-in-the-loop approval on high-risk actions

    Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

  • Egress allowlisting & DLP on tool arguments

    Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

  • Provenance & content signing

    Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.

  • Least-privilege identity & scoped credentials

    Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

  • MCP/plugin pinning, manifest hashing & re-review

    Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

Detective
Corrective
  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

  • An AI coding agent's 'external content' is the whole project — code, README, manifests, file names, config files, and MCP tool output — and all of it is an injection vector once the agent reads it as instructions.
  • The exploit is the IDE's own trusted features (auto-fetched references, config files, the shell), not a memory-safety bug — operation allowlists won't flag a permitted action used maliciously.
  • Letting the agent write the files that govern its own execution or approval (e.g. workspace settings) turns one injection into code execution; keep that config out of the agent's writable scope or gate every change.
  • Auto-resolved references like a remote JSON $schema URL are silent exfiltration channels; an egress allowlist plus DLP on outbound fetch arguments is the durable fix, not the input filter.
  • MCP tool servers and dependencies are part of the trust boundary: a malicious server's response can inject, and an unpinned dependency can poison — pin and re-review them.
  • This is a category, not a single product flaw: Marzouk reported 100% of the AI IDEs tested were affected by at least one universal chain, so design for the boundary rather than waiting for per-vendor patches.

Practise the risk class — related scenarios

🔑The Agent With the Master Key

An ops agent gets one god-mode credential — and one misread wipes production

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

👂Overheard Through the Cache

A speed optimisation becomes a cross-tenant listening device

🪟Stealing the Model

Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token — and inherits its authority

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading →·Built by Shi Yuan ↗