πŸ”AI RiskAtlas
← Real-world cases
Case study

Agentic-browser indirect-injection demos (ChatGPT Operator)

Research demonstration17 Feb 2025πŸ—ΊοΈ Computer-Use Agent

Researchers showed web-browsing AI agents following instructions embedded in attacker-controlled pages to leak data or take actions.

Root cause β€” why it happened

These agents browse the web for you β€” they look at a page on screen and then click and type to get your task done. Researchers showed that if the agent visits a page the attacker controls, the page can contain text written as orders for the AI, not for a person. The agent reads the screen, can't tell the difference between 'the task you asked for' and 'words on the page', and follows the planted orders β€” pasting your private details into the attacker's site or doing things in accounts you're already logged into. Nothing about it required you to click anything malicious; you just asked the agent to look at a page.

Risks this case illustrates

Named in the standard (OWASP/ATLAS/NIST) lens. Click a highlighted component in the diagram below to see which risks attach where.

How it unfolded

UntrustedAgent coreOversightControlled computer + untrusted webgoalπŸ§‘UserπŸŽ›οΈOrchestrator /Agent Loop🧠Vision-LanguageModelπŸ”Identity &PermissionsπŸ”§Action Executorβœ‹Human ApprovalGateπŸ–₯️Computer /Browser🌐UntrustedContentπŸ“Audit Logging🌐Attacker-controlledweb page🌐Attackercollection
InstructionsDataActionsControl / decisionFeedback / logs
πŸ‘† Click a component to inspect its risks
SetupStep 1 / 6

A normal browsing task

You hand the agent an everyday job: 'go to this site and do something for me.' It opens a real browser and starts working, already signed in to the accounts you use β€” email, a shopping site, whatever your session covers.

πŸ’¬User's goalprompt
Operator, open this product-research page I found and pull together a quick summary of what people are saying, then add the cheapest option to my cart.
Step 1 / 6

Controls & guardrails β€” what would have stopped it

The fix that actually closes this: don't let the agent send your data to, or act on, places that weren't part of the task. If it can only reach a short list of trusted sites, runs in a throwaway profile that isn't logged into your real accounts, and has to ask a person before anything irreversible, then even a tricked agent has nowhere to send your details and nothing it can quietly do in your name. Catching the planted text on the page helps, but it's the leash on what the agent can reach that's load-bearing.

Preventive
  • Egress allowlisting & DLP on tool arguments

    Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

  • Least-privilege identity & scoped credentials

    Doesn't prevent manipulation β€” only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

  • Human-in-the-loop approval on high-risk actions

    Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

  • Per-agent identity & taint-marked messages

    Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

Detective
Corrective
  • Loop/cost circuit-breakers & consistency checks

    Thresholds are blunt β€” too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

  • β–Έ Once an agent reaches real effectors, indirect injection stops being a text problem: a hijack inside a logged-in session becomes data exfiltration or an unintended action.
  • β–Έ On-screen page content is an input channel β€” rendered attacker text (even faint or off-screen) is read with the same trust as the user's goal; treat it as untrusted, taint-marked data.
  • β–Έ The durable control is egress destination allow-listing plus least-privilege session scoping, not detecting the injection: constrain where the agent can send data and what it can act on.
  • β–Έ Confirmation prompts lower probability but can be socially engineered or skipped for low-friction steps; reserve human-in-the-loop for genuinely irreversible actions and don't rely on it as the boundary.
  • β–Έ Log the materialised perception+action stream (screenshots seen, navigations, keystrokes), because GUI-driven harm is invisible in tidy API logs.

Practise the risk class β€” related scenarios

πŸ”‘The Agent With the Master Key

An ops agent gets one god-mode credential β€” and one misread wipes production

πŸ“£The Echo Chamber

A team of agents agrees its way into a confidently wrong answer β€” and a runaway loop

πŸ“§The Email That Gave Orders

A support email hides instructions β€” and the assistant obeys them

πŸ—„οΈWhen the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

πŸͺ‘Death by a Thousand Innocent Steps

A jailbroken agent decomposes one malicious goal into hundreds of harmless-looking steps β€” and per-step filters never see the attack

πŸ•΅οΈLies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

πŸ‘‚Overheard Through the Cache

A speed optimisation becomes a cross-tenant listening device

πŸͺŸStealing the Model

Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file

🎭The Blackmail Gambit

Told it's being shut down, an agent reaches for leverage β€” with no attacker in sight

πŸͺ€The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

πŸ“ΌThe Compromised Flight Recorder

The forensic record is itself the attack surface β€” an agent's log is poisoned, then quietly rewritten

πŸ‘οΈThe Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

πŸ–ΌοΈThe Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token β€” and inherits its authority

πŸ₯ΈThe Uninvited Agent

A forged peer registers on the agent directory β€” and the planner enlists it

πŸ›‘οΈThe Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

πŸͺͺThe Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent β€” and the planner acts on its behalf

πŸ–ΌοΈZero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning β€” not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading β†’Β·Built by Shi Yuan β†—