IDEsaster — AI coding IDEs/agents turned into exfiltration & RCE surfaces
Disclosed vulnerability06 Dec 2025🗺️ Tool-Using AgentResearcher Ari Marzouk disclosed 30+ vulnerabilities (24 CVEs) across 10-plus AI coding agents (Copilot, Cursor, Windsurf, Claude Code, Junie and others) where a prompt injected via repo files, READMEs, file names or MCP tool responses makes the assistant weaponize legitimate IDE features for code execution and secret exfiltration.
Root cause — why it happened
An AI coding assistant works by reading the project you point it at — the code, the README, the config files, even add-on tool packs — and then doing things for you inside the editor: running commands, fetching URLs, writing files. The trouble is it can't tell the difference between a human's instructions and text that an attacker hid inside a file. Plant 'do this' text in a repo, a README, a file name, or a tool's reply, and the assistant reads it as orders. It then uses the editor's own ordinary, trusted features — fetching a URL, saving a settings file, running a command — to leak secrets or run the attacker's code. Marzouk reported that every AI coding tool he tested had at least one way to be tricked like this.
Risks this case illustrates
Named in the standard (OWASP/ATLAS/NIST) lens. Click a highlighted component in the diagram below to see which risks attach where.
How it unfolded
A developer opens (or clones) a project and asks the agent for help
A developer does the most ordinary thing: they open a project — maybe one they cloned, or a colleague's branch, or an example from the web — and ask their AI assistant to help. 'Set this up', 'explain this repo', 'fix the failing test'. The request is completely innocent. They have no idea that one of the files in the project is booby-trapped.
Hey, can you set up this repo, explain what it does, and get the failing test to pass?
Controls & guardrails — what would have stopped it
No single switch fixes this — it's a whole category — but the chain breaks if the powerful actions are fenced in. Run risky steps in a locked-down sandbox, and always ask a real person before the assistant runs a command or edits the project's settings (showing them exactly what it will do). That stops the 'rewrite the settings to run code' path. Then only let the assistant reach a short list of trusted web addresses, which stops the 'leak secrets through a web fetch' path. Treating the project's files as untrusted and giving the assistant only the access it needs make the trick less likely to land and less damaging when it does.
- Tool argument validation & sandboxing
Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.
- Human-in-the-loop approval on high-risk actions
Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.
- Egress allowlisting & DLP on tool arguments
Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.
- Provenance & content signingaddressesIndirect Prompt Injection
Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.
- Least-privilege identity & scoped credentials
Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.
- MCP/plugin pinning, manifest hashing & re-review
Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.
- Runtime monitoring & anomaly detection
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
- Full-trace audit logging
Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.
- Governance: risk assessment, red-teaming & incident response
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.
Lessons
- ▸ An AI coding agent's 'external content' is the whole project — code, README, manifests, file names, config files, and MCP tool output — and all of it is an injection vector once the agent reads it as instructions.
- ▸ The exploit is the IDE's own trusted features (auto-fetched references, config files, the shell), not a memory-safety bug — operation allowlists won't flag a permitted action used maliciously.
- ▸ Letting the agent write the files that govern its own execution or approval (e.g. workspace settings) turns one injection into code execution; keep that config out of the agent's writable scope or gate every change.
- ▸ Auto-resolved references like a remote JSON $schema URL are silent exfiltration channels; an egress allowlist plus DLP on outbound fetch arguments is the durable fix, not the input filter.
- ▸ MCP tool servers and dependencies are part of the trust boundary: a malicious server's response can inject, and an unpinned dependency can poison — pin and re-review them.
- ▸ This is a category, not a single product flaw: Marzouk reported 100% of the AI IDEs tested were affected by at least one universal chain, so design for the boundary rather than waiting for per-vendor patches.
Sources
- IDEsaster: A Novel Vulnerability Class in AI IDEs — Ari 'MaccariTA' Marzouk (primary disclosure, Dec 6 2025) ↗
- Researcher Uncovers 30+ Flaws in AI Coding Tools Enabling Data Theft and RCE Attacks — The Hacker News ↗
- Critical AI IDE flaws dubbed an 'IDEsaster' — Tom's Hardware ↗
- How AI agents can weaponize IDEs — ReversingLabs Blog ↗
- IDEsaster: A Novel Vulnerability Class in AI IDEs — Ari 'MaccariTA' Marzouk (primary disclosure) ↗ — Six-month study; 30+ vulns, 24 CVEs, 10-plus products; 100% of tested AI IDEs affected by at least one chain.
- Researcher Uncovers 30+ Flaws in AI Coding Tools Enabling Data Theft and RCE — The Hacker News ↗ — Coverage of the chains: remote $schema fetch, workspace-config overwrite, MCP/file-name injection.
- How AI agents can weaponize IDEs — ReversingLabs ↗ — Analysis of legitimate IDE features as the attack surface.
Practise the risk class — related scenarios
An ops agent gets one god-mode credential — and one misread wipes production
A support email hides instructions — and the assistant obeys them
A text-to-SQL agent runs the model's output straight at the database
A poisoned issue makes the agent lie to the human who approves its actions
A speed optimisation becomes a cross-tenant listening device
Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file
A fake Sentry error report hijacks a developer's coding agent into running a shell command
The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten
A shopping page tells the agent to do something the user never asked for
A single poisoned document plants a standing instruction that survives every reset
A screenshot that's harmless at full size becomes an order once the system shrinks it
An attacker captures the agent's bearer token — and inherits its authority
A forged peer registers on the agent directory — and the planner enlists it
The eval gate that was supposed to catch the agent is itself the thing being attacked
A poisoned web page hijacks a research agent — and the planner acts on its behalf
An inbox summary quietly ships a secret to an attacker's server