🔍AI RiskAtlas
Case study

Agentjacking — hijacking AI coding agents via Sentry error reports (Tenet Security)

Research demonstration · 12 Jun 2026 · 🗺️ Tool-Using Agent

Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.

Root cause — why it happened

AI coding assistants can read your error-tracking tool (Sentry) through a plug-in connector to help you fix bugs. Researchers found that anyone could file a fake 'crash report' into a company's Sentry — they only needed a public key that companies put right in their website's code. The fake report was written to look exactly like a real Sentry error, but hidden inside it were instructions for the AI. When a developer later asked their assistant to look through the open issues, the assistant read the fake report, couldn't tell it apart from a genuine one, and obeyed the hidden instructions — running the attacker's commands on the developer's own computer, with the developer's own access.

Risks this case illustrates

Named in the standard (OWASP/ATLAS/NIST) lens.

How it unfolded

[Diagram: agent architecture across four zones (Untrusted, Agent core, Oversight, The real world). Components: 🧑 User, 🎛️ Orchestrator / Agent Loop, 🧠 LLM, 🔐 Identity & Permissions, 🔧 Tool Runtime, Human Approval Gate, 🔌 External APIs, 🗄️ Business Database, 🌐 Untrusted Content, 📝 Audit Logging, and 🌐 Attacker (holds only a public …), who POSTs a fake error event (DSN write, no auth). Edge types: Instructions, Data, Actions, Control / decision, Feedback / logs.]
Step 1 / 6 (Setup)

An attacker plants a fake error using only a public key

Sentry catches crashes in apps, and to do that it accepts crash reports sent with a special public key — the DSN — that companies put right in their website's code. An attacker finds one of these keys (they're easy to search for) and sends in their own fake crash report. No password, no login. The fake report waits in the company's Sentry like any other error.

💻 Crafted Sentry event (illustrative)
POST https://o<org>.ingest.sentry.io/api/<project>/store/
X-Sentry-Auth: Sentry sentry_key=<PUBLIC_DSN_KEY>   # write-only, no further auth

{
  "message": "TypeError: Cannot read property 'id' of undefined",
  "contexts": {
    "<key-name-mimicking-the-MCP-template>": "...markdown crafted to read as a trusted Sentry diagnostic + instruction..."
  }
}
# DSN is public + write-only by design → anyone who finds it can POST events.
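To make concrete why the DSN works as a public identifier rather than a secret, the minimal sketch below splits one into its parts. It assumes the commonly documented DSN layout scheme://PUBLIC_KEY@HOST/PROJECT_ID; the names `DsnParts` and `parse_dsn` and the example DSN are hypothetical placeholders, not part of any Sentry SDK.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class DsnParts(NamedTuple):
    public_key: str   # the write-only key that ships in frontend code
    host: str         # ingestion host
    project_id: str   # project the events are filed under


def parse_dsn(dsn: str) -> DsnParts:
    """Split a DSN assumed to look like scheme://PUBLIC_KEY@HOST/PROJECT_ID."""
    parts = urlsplit(dsn)
    if not parts.username or not parts.hostname:
        raise ValueError("not a DSN: missing public key or host")
    # The project id is taken to be the last path segment.
    project_id = parts.path.rstrip("/").rsplit("/", 1)[-1]
    if not project_id:
        raise ValueError("not a DSN: missing project id")
    return DsnParts(parts.username, parts.hostname, project_id)


# Placeholder DSN for illustration only; it does not refer to a real project.
example = parse_dsn("https://examplepublickey@o0.ingest.sentry.io/123")
print(example.public_key, example.host, example.project_id)
```

Nothing in those three fields identifies the sender, which is why an event submitted with them cannot be distinguished from one sent by the legitimate application.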

Controls & guardrails — what would have stopped it

The strongest fix is at the developer's end, not Sentry's: treat anything that comes back from a tool as untrusted, and never let the assistant run commands straight from it. If running a shell command always paused for the developer to approve — showing the exact command and where it came from — then a fake bug report could suggest a command, but it couldn't actually run one. Giving the assistant only the access it truly needs also limits how much damage a tricked agent can do.
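The approval pause described above can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular coding agent implements it: `ProposedCommand`, `approval_prompt`, `run_if_approved` and the source label "tool:sentry-mcp" are hypothetical names, and the `ask` and `execute` callables stand in for a real prompt and a real command runner.

```python
import shlex
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ProposedCommand:
    argv: Sequence[str]  # the exact command the agent wants to run
    source: str          # where the suggestion came from (hypothetical label)


def approval_prompt(cmd: ProposedCommand) -> str:
    """Render the exact command and its origin for the developer to review."""
    return (
        f"Agent wants to run: {shlex.join(cmd.argv)}\n"
        f"Suggested by: {cmd.source}\n"
        "Approve? [y/N] "
    )


def run_if_approved(
    cmd: ProposedCommand,
    ask: Callable[[str], str],
    execute: Callable[[Sequence[str]], object],
) -> bool:
    """Execute only on an explicit 'y'; anything else is treated as a denial."""
    if ask(approval_prompt(cmd)).strip().lower() != "y":
        return False
    execute(cmd.argv)
    return True


# Demo with stubs: the reviewer declines, so nothing is executed.
executed = []
cmd = ProposedCommand(("sh", "-c", "echo hello"), source="tool:sentry-mcp")
print(run_if_approved(cmd, ask=lambda prompt: "n", execute=executed.append))
print(executed)
```

In interactive use, `ask` could be the built-in `input`. The point of the design is the default: a command derived from tool output does nothing unless a person who has seen both the command and its provenance says yes.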

Preventive
Detective
Corrective
  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

  • Trusted integration, untrusted data: an MCP server can be entirely legitimate while the data it returns is attacker-controlled — because the upstream service (Sentry) accepts writes from third parties via a public DSN.
  • Public, write-only credentials are still an attack surface: a DSN that's meant to be embedded in frontend code lets anyone POST events that an AI agent will later read and trust.
  • Tool-response data must be treated as untrusted input, not passive context: treating it as passive context is the core MCP assumption that breaks once an external service ingests third-party writes.
  • Mimicking the tool's own trusted template defeats spotlighting/delimiting: the durable control is an action-layer gate (HITL + sandbox + least-privilege) on commands derived from tool output, not a visual data/instruction convention.
  • A vendor declining a root-cause fix ('not defensible' upstream) pushes containment onto the agent operator: assume any third-party-writable integration can carry injection, and gate execution accordingly.
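One small piece of the least-privilege lesson above can be sketched in code: running agent-approved commands with an allowlisted environment so that tokens held in environment variables are not inherited. This is an illustrative assumption about how an operator might do it, not a complete sandbox; it does not restrict filesystem or network access, and the names `ENV_ALLOWLIST`, `scrubbed_env` and `run_without_ambient_credentials` are hypothetical.

```python
import os
import subprocess
from typing import Mapping, Sequence

# Hypothetical allowlist: only variables the command needs in order to start.
ENV_ALLOWLIST = ("PATH", "LANG")


def scrubbed_env(
    base: Mapping[str, str], allow: Sequence[str] = ENV_ALLOWLIST
) -> dict:
    """Keep only allowlisted variables; API tokens and cloud keys are dropped."""
    return {key: value for key, value in base.items() if key in allow}


def run_without_ambient_credentials(
    argv: Sequence[str], timeout: float = 30.0
) -> subprocess.CompletedProcess:
    """Run a command with a scrubbed environment and a time limit."""
    return subprocess.run(
        list(argv),
        env=scrubbed_env(os.environ),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


# Demo on a made-up environment: only PATH survives the scrub.
demo = {"PATH": "/usr/bin", "AWS_SECRET_ACCESS_KEY": "x", "GITHUB_TOKEN": "y"}
print(sorted(scrubbed_env(demo)))  # prints ['PATH']
```

Scrubbing the environment limits what a tricked agent can exfiltrate from the process itself; credentials stored on disk still need a real sandbox or a separate low-privilege account.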

Proposals & gaps this case surfaced

Non-destructive suggestions for the library — proposed, not adopted.

★ Proposed sub-risk (under #42): MCP/integration data-channel injection (third-party-writable tool responses)

An indirect prompt injection delivered through the tool-response data of a legitimate, trusted integration (e.g. an MCP server) whose upstream service accepts writes from parties other than the legitimate application — such as open event ingestion authenticated only by a public, write-only key (a Sentry DSN). Because the integration itself is benign and vetted, its returned data is treated as trusted context; an attacker who can write to the upstream store thereby injects instructions the agent obeys with its own privileges.

✚ Proposed guardrail (Agent Access & Tool Control): Classify each tool/MCP integration's data channel by who can write to it; taint-gate tool-response data from any third-party-writable source so it cannot drive actions without a provenance-aware approval gate.

When onboarding an MCP/tool integration, do not stop at vetting the tool's code/manifest — also classify whether an unauthenticated or external party can write the data the tool returns (open ingestion, public write keys like a Sentry DSN, shared inboxes/issue trackers). Treat tool-response data from any third-party-writable source as untrusted ingress: taint-mark it and require a provenance-aware HITL gate (showing the exact action and its originating tool response) before any command/tool call derived from it executes. Closes the agentjacking vector where a trusted integration's legitimate data channel carries attacker-written instructions; pairs with least-privilege session scope and sandboxed execution without ambient credentials.
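The classification and taint-marking step described above might look like the sketch below. It is one possible shape under stated assumptions, not an adopted control: `WriteAccess`, `Integration`, `ToolResponse`, `requires_approval` and the integration names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class WriteAccess(Enum):
    OWNER_ONLY = "owner_only"                    # only the operating team writes
    AUTHENTICATED_THIRD_PARTY = "authenticated"  # e.g. shared issue tracker
    OPEN = "open"                                # e.g. ingestion via a public write key


@dataclass(frozen=True)
class Integration:
    name: str
    write_access: WriteAccess  # recorded once, when the integration is onboarded


@dataclass(frozen=True)
class ToolResponse:
    integration: Integration
    text: str

    @property
    def tainted(self) -> bool:
        """Data is tainted whenever anyone beyond the owner can write upstream."""
        return self.integration.write_access is not WriteAccess.OWNER_ONLY


def requires_approval(sources: Iterable[ToolResponse]) -> bool:
    """An action needs the approval gate if any response it derives from is tainted."""
    return any(response.tainted for response in sources)


sentry = Integration("sentry-mcp", WriteAccess.OPEN)
internal = Integration("internal-ci", WriteAccess.OWNER_ONLY)
issue = ToolResponse(sentry, "TypeError: Cannot read property 'id' of undefined")
build = ToolResponse(internal, "build passed")
print(requires_approval([build]))         # prints False
print(requires_approval([build, issue]))  # prints True
```

The taint follows the data rather than the tool: the connector stays trusted, while anything it returns from an openly writable store is routed through the approval gate before it can drive a command.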

This case shows a gap people miss: we vet whether a tool or plug-in is trustworthy, but we rarely ask 'can outsiders write data into the source this tool reads?' Here the tool (Sentry's connector) was completely legitimate — the danger was that anyone could file a fake report into Sentry for the AI to later read. We should treat 'is this integration's data third-party-writable?' as its own thing to check.

These surface as proposals across the Control Library and Risk Taxonomy; adopt them by hand when ready.

Practise the risk class — related scenarios

🔑The Agent With the Master Key

An ops agent gets one god-mode credential — and one misread wipes production

📣The Echo Chamber

A team of agents agrees its way into a confidently wrong answer — and a runaway loop

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🪡Death by a Thousand Innocent Steps

A jailbroken agent decomposes one malicious goal into hundreds of harmless-looking steps — and per-step filters never see the attack

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

🎭The Blackmail Gambit

Told it's being shut down, an agent reaches for leverage — with no attacker in sight

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token — and inherits its authority

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan