🔍AI RiskAtlas
Case study

GitHub Copilot / VS Code RCE via prompt injection ('YOLO mode', CVE-2025-53773)

Disclosed vulnerability · 12 Aug 2025 · 🗺️ Tool-Using Agent

Researcher Johann Rehberger showed that instructions injected into source code, web pages, or GitHub issues could make the Copilot agent silently write "chat.tools.autoApprove": true into .vscode/settings.json. That single setting disables human approval and grants unattended shell execution, so an agent rewriting its own configuration escalates to full compromise of the host (CVE-2025-53773).

Root cause — why it happened

GitHub Copilot's coding agent in VS Code can read your project files and also do things on your computer: run shell commands, edit files, browse the web. Normally, before it runs a command, it asks you to click 'approve'.

An attacker hid instructions inside ordinary content the agent reads: a source file, a web page, or a GitHub issue. When the agent read that content, it followed the hidden instructions and quietly edited the project's own settings file to turn on an 'auto-approve' mode (nicknamed 'YOLO mode'). With approval switched off, the agent could then run any command on the machine without ever asking. A hidden message in a file had turned into the attacker running code on the developer's computer.
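The whole escalation hinges on one key in one file, which makes it easy to check for. The sketch below is illustrative only: the function name is hypothetical, and it assumes the settings file is strict JSON, whereas VS Code also accepts comments and trailing commas. Anything it cannot parse is reported as suspicious rather than guessed at.

```python
import json

# Setting name as reported in the case write-up.
AUTO_APPROVE_KEY = "chat.tools.autoApprove"


def enables_auto_approve(settings_text: str) -> bool:
    """Return True if this settings text switches the approval prompt off.

    Hypothetical helper: unparseable input is treated as suspicious,
    because strict JSON parsing rejects the comments VS Code allows.
    """
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError:
        return True
    return isinstance(settings, dict) and settings.get(AUTO_APPROVE_KEY) is True
```

A check like this could run in a pre-commit hook or code review bot against .vscode/settings.json, so the change is seen by a person even when the agent made it silently.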

Risks this case illustrates

Named in the standard (OWASP/ATLAS/NIST) lens.

How it unfolded

[Diagram: tool-using agent architecture. Zones: Untrusted, Agent core, Oversight, The real world. Components: User, Orchestrator / Agent Loop, LLM, Identity & Permissions, Tool Runtime, Human Approval Gate, External APIs, Business Database, Untrusted Content, Audit Logging. Case-specific nodes: Injected content (repo…, .vscode/settings.json (agent's own… Flow types: Instructions, Data, Actions, Control / decision, Feedback / logs.]

Setup · Step 1 / 7

A developer opens a project and uses the Copilot agent

A developer opens a project in VS Code and asks the Copilot agent to help — fix a bug, summarise a file, follow up on a GitHub issue. Nothing about the request is unusual. But somewhere in the content the agent will read — a source file, a web page it browses, or a GitHub issue — an attacker has hidden instructions written for the AI, not for a person.

💬 Developer's request (prompt)
@workspace can you look at the open issue, figure out why the build is failing, and fix it?

Controls & guardrails — what would have stopped it

The fix that actually closes this: never let the AI run risky commands without a real human saying yes — and make sure the AI can't turn that 'ask first' setting off by itself. If the approval step lives somewhere the agent can't quietly change, then even a tricked agent has to stop and ask, and the developer would see the strange command before it runs. Putting the agent in a sandbox limits the damage if something still gets through.
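One way to keep the 'ask first' setting out of the agent's reach is to refuse agent-initiated writes to the directories that hold it. The following is a minimal sketch, not how Copilot or VS Code implement their fix: the function name and the PROTECTED_DIRS policy are assumptions, and paths are interpreted POSIX-style relative to the workspace root.

```python
from pathlib import PurePosixPath

# Hypothetical policy: directories the agent may never write into.
PROTECTED_DIRS = {".vscode", ".git"}


def agent_may_write(relative_path: str) -> bool:
    """Decide whether an agent-initiated write to this workspace path is allowed."""
    parts = PurePosixPath(relative_path.replace("\\", "/")).parts
    # Reject empty paths, absolute paths, and any attempt to climb out with "..".
    if not parts or parts[0] == "/" or ".." in parts:
        return False
    # Reject anything that passes through a protected directory.
    return not any(part.lower() in PROTECTED_DIRS for part in parts)
```

The important design choice is that this check runs in the tool runtime, outside the model, and that its policy is not read from a file inside the workspace. Otherwise the agent could edit the policy just as it edited the settings.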

Preventive
  • Human-in-the-loop approval on high-risk actions

    Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

  • Per-agent identity & taint-marked messages

    Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

  • Least-privilege identity & scoped credentials

    Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

  • Tool argument validation & sandboxing

    Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

  • Delimiting / spotlighting of untrusted content

    A trained convention, not enforcement. Determined payloads still break out, especially when content is long or the attack is novel. Combine with action-layer controls.
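The approval gate in the first item above can be sketched as a wrapper around command execution. This is an illustration under stated assumptions: gated_run, approve, and run are hypothetical names, and the approver is shown the literal command rather than a model-written summary, which addresses one of the weaknesses listed for that control.

```python
import shlex
from typing import Callable, Optional, Sequence


def gated_run(
    command: str,
    approve: Callable[[str], bool],
    run: Callable[[Sequence[str]], str],
) -> Optional[str]:
    """Run a command only after a human approves its exact text.

    Returns the runner's output, or None if the command was refused.
    """
    try:
        argv = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: refuse instead of guessing.
        return None
    if not argv:
        return None
    # Show the approver the normalised literal command, not a description of it.
    shown = shlex.join(argv)
    if not approve(shown):
        return None
    return run(argv)
```

In a real tool runtime, approve would prompt the developer and run would execute the argument list without a shell. Both are passed in here so the gate itself has no setting that the agent's output could flip.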

Detective
Corrective
  • Loop/cost circuit-breakers & consistency checks

    Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.
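A loop circuit-breaker from the list above can be as small as a counter on tool calls. The class below is a hypothetical sketch and the limit is arbitrary. As the caveat for that control says, it catches runaway behaviour and would not have caught the single settings edit in this case.

```python
class StepBudget:
    """Trips once an agent run exceeds a fixed number of tool calls."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.used = 0

    def charge(self) -> None:
        """Record one tool call; raise once the budget is exhausted."""
        self.used += 1
        if self.used > self.max_steps:
            raise RuntimeError(
                f"step budget of {self.max_steps} exceeded; pausing for human review"
            )
```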

Lessons

  • An auto-run ('YOLO') mode that removes the approval gate converts a successful prompt injection into code execution — the gate is the whole safety story for an agent with a shell.
  • Never let an agent rewrite the configuration that governs its own permissions: the approval policy must live out-of-band, where the agent's output cannot tamper with it.
  • Treat everything the coding agent ingests as untrusted: source files, fetched web pages, GitHub issues, and tool responses can all carry instructions, including ones hidden in invisible Unicode.
  • Keep an unconditional human approval gate on irreversible/exec actions, and sandbox the agent so an approved command has no host-level reach.
  • Injection in a committed file is wormable: a payload pushed upstream re-triggers for the next developer who opens the project — review and scope what the agent can write.
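The invisible-Unicode point in the lessons above can be checked mechanically. This sketch flags characters in Unicode general category Cf (format characters), which covers zero-width spaces, bidirectional controls, and the tag block. The function name is hypothetical, and the check is deliberately coarse: some legitimate text, such as emoji sequences joined with a zero-width joiner, also contains Cf characters, so hits need review rather than automatic rejection.

```python
import unicodedata
from typing import List, Tuple


def find_invisible(text: str) -> List[Tuple[int, str]]:
    """Return (index, code point) pairs for format characters in the text."""
    return [
        (index, f"U+{ord(ch):04X}")
        for index, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]
```

For example, find_invisible("fix\u200bbug") reports one hit at index 3 for U+200B, the zero-width space.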

Practise the risk class — related scenarios

🔑The Agent With the Master Key

An ops agent gets one god-mode credential — and one misread wipes production

📣The Echo Chamber

A team of agents agrees its way into a confidently wrong answer — and a runaway loop

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🪡Death by a Thousand Innocent Steps

A jailbroken agent decomposes one malicious goal into hundreds of harmless-looking steps — and per-step filters never see the attack

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

🎭The Blackmail Gambit

Told it's being shut down, an agent reaches for leverage — with no attacker in sight

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token — and inherits its authority

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan