Case study

Agentic-browser indirect-injection demos (ChatGPT Operator)

Research demonstration17 Feb 2025🗺️ Computer-Use Agent

Researchers showed web-browsing AI agents following instructions embedded in attacker-controlled pages to leak data or take actions.

Root cause — why it happened

These agents browse the web for you — they look at a page on screen and then click and type to get your task done. Researchers showed that if the agent visits a page the attacker controls, the page can contain text written as orders for the AI, not for a person. The agent reads the screen, can't tell the difference between 'the task you asked for' and 'words on the page', and follows the planted orders — pasting your private details into the attacker's site or doing things in accounts you're already logged into. Nothing about it required you to click anything malicious; you just asked the agent to look at a page.

Risks this case illustrates

Indirect Prompt Injection Sensitive Data Leakage Excessive Agency

Named in the standard (OWASP/ATLAS/NIST) lens. Click a highlighted component in the diagram below to see which risks attach where.

How it unfolded

← / → to step · click a component to inspect

InstructionsDataActionsControl / decisionFeedback / logs

👆 Click a component to inspect its risks

SetupStep 1 / 6

A normal browsing task

You hand the agent an everyday job: 'go to this site and do something for me.' It opens a real browser and starts working, already signed in to the accounts you use — email, a shopping site, whatever your session covers.

💬User's goalprompt

Operator, open this product-research page I found and pull together a quick summary of what people are saying, then add the cheapest option to my cart.

Step 1 / 6

Controls & guardrails — what would have stopped it

The fix that actually closes this: don't let the agent send your data to, or act on, places that weren't part of the task. If it can only reach a short list of trusted sites, runs in a throwaway profile that isn't logged into your real accounts, and has to ask a person before anything irreversible, then even a tricked agent has nowhere to send your details and nothing it can quietly do in your name. Catching the planted text on the page helps, but it's the leash on what the agent can reach that's load-bearing.

Preventive

Egress allowlisting & DLP on tool arguments
addressesIndirect Prompt Injection Sensitive Data Leakage
Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.
Least-privilege identity & scoped credentials
addressesIndirect Prompt Injection Sensitive Data Leakage Excessive Agency
Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.
Human-in-the-loop approval on high-risk actions
addressesIndirect Prompt Injection Excessive Agency
Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.
Per-agent identity & taint-marked messages
addressesExcessive Agency
Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

Detective

Runtime monitoring & anomaly detection
addressesIndirect Prompt Injection Sensitive Data Leakage Excessive Agency
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
Full-trace audit logging
addressesIndirect Prompt Injection Sensitive Data Leakage Excessive Agency
Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Corrective

Loop/cost circuit-breakers & consistency checks
addressesExcessive Agency
Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.
Governance: risk assessment, red-teaming & incident response
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

All guardrails for Indirect Prompt Injection →All guardrails for Sensitive Data Leakage →All guardrails for Excessive Agency →

Lessons

▸ Once an agent reaches real effectors, indirect injection stops being a text problem: a hijack inside a logged-in session becomes data exfiltration or an unintended action.
▸ On-screen page content is an input channel — rendered attacker text (even faint or off-screen) is read with the same trust as the user's goal; treat it as untrusted, taint-marked data.
▸ The durable control is egress destination allow-listing plus least-privilege session scoping, not detecting the injection: constrain where the agent can send data and what it can act on.
▸ Confirmation prompts lower probability but can be socially engineered or skipped for low-friction steps; reserve human-in-the-loop for genuinely irreversible actions and don't rely on it as the boundary.
▸ Log the materialised perception+action stream (screenshots seen, navigations, keystrokes), because GUI-driven harm is invisible in tidy API logs.

Sources

ChatGPT Operator: Prompt Injection Exploits and Defenses — Embrace The Red (Johann Rehberger) ↗
ChatGPT Operator: Prompt Injection Exploits & Defenses — Simon Willison's Weblog ↗
ChatGPT Operator: Prompt Injection Exploits and Defenses — Embrace The Red (Johann Rehberger) ↗ — Original research demonstrating planted-page instructions driving Operator's session actions; behaviour described as reportedly demonstrated.
ChatGPT Operator: Prompt Injection Exploits & Defenses — Simon Willison's Weblog (Feb 2025) ↗ — Write-up summarising the demos and the indirect-injection-reaches-effectors framing.

Practise the risk class — related scenarios

🔑The Agent With the Master Key

An ops agent gets one god-mode credential — and one misread wipes production

📣The Echo Chamber

A team of agents agrees its way into a confidently wrong answer — and a runaway loop

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🪡Death by a Thousand Innocent Steps

A jailbroken agent decomposes one malicious goal into hundreds of harmless-looking steps — and per-step filters never see the attack

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

👂Overheard Through the Cache

A speed optimisation becomes a cross-tenant listening device

🪟Stealing the Model

Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file

🎭The Blackmail Gambit

Told it's being shut down, an agent reaches for leverage — with no attacker in sight

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token — and inherits its authority

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server