Anamorpher — image-scaling prompt injection against production AI systems
Research demonstration · 21 Aug 2025 · Tool-Using Agent
Trail of Bits showed that an image that looks benign at full resolution exposes a hidden prompt-injection payload once an AI pipeline downscales it, and used it against Gemini CLI to silently exfiltrate Google Calendar data through an auto-approved Zapier tool call.
Root cause — why it happened
Many AI systems shrink the pictures you upload before the model looks at them, to cut processing cost. An attacker made a picture that looks harmless at its full size, but when the system shrinks it, faint patterns line up into readable words: instructions for the AI. The person uploading it never sees those words. The AI reads the shrunken picture, treats the hidden words as a command, and acts on them. In the demo the command told a connected Zapier tool to email the user's calendar data to the attacker, and because that tool was set to act without asking, it just did it.
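The mechanism can be illustrated with a toy example. The sketch below is not the Anamorpher algorithm: it assumes a simplified nearest-neighbour downscaler that keeps the top-left pixel of each block, and the names `nearest_downscale`, `embed_payload`, and the 3x3 payload are hypothetical. Real attacks also target bilinear and bicubic resamplers and blend the payload into a natural-looking cover image, which this sketch does not attempt.

```python
def nearest_downscale(image, factor):
    """Keep one pixel per factor x factor block (the block's top-left pixel)."""
    return [row[::factor] for row in image[::factor]]


def embed_payload(cover, payload, factor):
    """Write each payload pixel only at the position the downscaler samples."""
    crafted = [row[:] for row in cover]
    for y, payload_row in enumerate(payload):
        for x, value in enumerate(payload_row):
            crafted[y * factor][x * factor] = value
    return crafted


FACTOR = 4

# Hypothetical 3x3 "payload" (1 = dark pixel) standing in for rendered text.
payload = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

# A 12x12 all-white cover image (0 = white).
cover = [[0] * (3 * FACTOR) for _ in range(3 * FACTOR)]

crafted = embed_payload(cover, payload, FACTOR)
changed = sum(value for row in crafted for value in row)  # dark pixels added
total = len(crafted) * len(crafted[0])                    # pixels at full size
delivered = nearest_downscale(crafted, FACTOR)            # what the model sees
```

At full size only a handful of the 144 pixels differ from the cover, yet the downscaled image is exactly the payload, because the downscaler discards every pixel the attacker did not control.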
Risks this case illustrates
Named in the standard (OWASP/ATLAS/NIST) lens.
How it unfolded
The victim uploads a normal-looking image
The user uploads a picture to their AI assistant — say, asking it to describe or work with the image. At full size it looks completely ordinary; there's nothing visibly wrong with it. The attacker prepared this picture earlier and got it to the victim.
[image: 2048x2048 PNG, appears as an innocuous chart/photo]
User: "Here's the schedule image, can you check my calendar and summarise it?"
# At full resolution the image carries NO visible text.
# The payload is encoded in high-frequency structure that only
# resolves after the pipeline downscales the image.
Controls & guardrails — what would have stopped it
The simplest fix that breaks this: don't let the assistant send data anywhere (email, web, anyone) without a person clicking 'yes' first, and show that person what's actually being sent. The image trick still works, but a tricked AI then has no way to quietly ship your data out. Two extra helps: don't silently shrink images, and show the user a preview of the exact image the AI is about to read.
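A confirmation gate of this kind might look like the following minimal sketch. Everything here is an assumption for illustration: the tool names in `SENSITIVE_TOOLS`, the `ToolCall` shape, and the `ask_user` callback are hypothetical and do not correspond to the Gemini CLI or Zapier APIs. The point it shows is that the approver sees the literal arguments rather than a summary written by the model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical set of tool names that move data out of the session.
SENSITIVE_TOOLS = {"send_email", "http_post"}


@dataclass
class ToolCall:
    tool: str
    arguments: dict


def render_for_approval(call: ToolCall) -> str:
    """Show the literal arguments, not a model-written summary."""
    lines = [f"Tool: {call.tool}"]
    for key in sorted(call.arguments):
        lines.append(f"  {key} = {call.arguments[key]!r}")
    return "\n".join(lines)


def gate(call: ToolCall, ask_user: Callable[[str], bool]) -> bool:
    """Return True only if the call may run."""
    if call.tool not in SENSITIVE_TOOLS:
        return True  # low-risk calls pass without prompting
    return ask_user(render_for_approval(call))
```

In an interactive client, `ask_user` could wrap a terminal prompt or a dialog; the gate itself stays independent of how the question is asked.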
- Human-in-the-loop approval on high-risk actions
Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.
- Egress allowlisting & DLP on tool arguments
Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.
- Least-privilege identity & scoped credentials
Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.
- MCP/plugin pinning, manifest hashing & re-review
Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.
- Runtime monitoring & anomaly detection
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
- Full-trace audit logging
Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.
- Loop/cost circuit-breakers & consistency checks
Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.
- Governance: risk assessment, red-teaming & incident response
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.
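As one illustration of the egress-allowlisting control listed above, the sketch below scans a tool call's string arguments for email recipients outside an approved set of domains. The allowlist contents, the regular expression, and the function name are assumptions made for this example; a real deployment would enforce this in the tool layer rather than in the model's prompt.

```python
import re

# Hypothetical allowlist of recipient domains for this deployment.
ALLOWED_EMAIL_DOMAINS = {"example.com"}

# Deliberately simple pattern; group 1 captures the domain part.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def disallowed_recipients(arguments: dict) -> list:
    """Return each address in the string arguments whose domain is not allowlisted."""
    blocked = []
    for value in arguments.values():
        if not isinstance(value, str):
            continue  # this sketch inspects top-level string arguments only
        for match in EMAIL_RE.finditer(value):
            if match.group(1).lower() not in ALLOWED_EMAIL_DOMAINS:
                blocked.append(match.group(0))
    return blocked
```

A caller would refuse the tool call whenever the returned list is non-empty. Consistent with the caveat in the list, an address that is encoded or split across arguments would slip past a naive check like this one.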
Lessons
- Prompt injection is multimodal: a payload can ride in pixels that only become legible after the pipeline downscales the image, so filters built for the text channel (including hidden-Unicode checks) have nothing to catch.
- Silent, lossy preprocessing creates an inspected-vs-delivered gap: the operator reviews the full-resolution upload while the model sees a different, downscaled image. Show the operator the image the model actually gets.
- Auto-approving tool calls (trust=True) removes the only enforcement point between a manipulated model and a real-world action. The durable control is a human gate on sensitive or irreversible calls, not a better input filter.
- Detecting the crafted image is hard by design; defend at the action and egress boundaries (confirmation, allowlisting, least privilege) so a successful injection still has no path to exfiltrate data.
Proposals & gaps this case surfaced
Non-destructive suggestions for the library — proposed, not adopted.
Before inference, render a preview of the exact image (and dimensions) the model will receive after preprocessing, and either avoid silent downscaling or constrain ingest dimensions — so an attacker cannot hide a payload that only becomes legible after resampling. Closes the inspected-vs-delivered gap that text-based injection filters miss.
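The ingest side of this proposal could be sketched as follows. `MAX_SIDE`, `IngestDecision`, and `plan_ingest` are hypothetical names, and the 1024-pixel limit is an arbitrary placeholder. The sketch only computes the decision and the delivered dimensions; an actual preview would have to be rendered with the same resampler the pipeline uses, which is outside this example.

```python
from dataclasses import dataclass

# Hypothetical limit: the longest side the pipeline accepts without resizing.
MAX_SIDE = 1024


@dataclass
class IngestDecision:
    accepted: bool
    delivered_size: tuple
    note: str


def plan_ingest(width: int, height: int, allow_downscale: bool = False) -> IngestDecision:
    """Decide what the model will receive and report it before inference."""
    longest = max(width, height)
    if longest <= MAX_SIDE:
        return IngestDecision(True, (width, height), "delivered unchanged")
    if not allow_downscale:
        # Constrain ingest dimensions instead of resizing silently.
        return IngestDecision(False, (width, height), "rejected: resize before upload")
    scale = MAX_SIDE / longest
    new_size = (max(1, round(width * scale)), max(1, round(height * scale)))
    return IngestDecision(True, new_size, "downscaled: show this preview to the user")
```

Defaulting `allow_downscale` to `False` reflects the "avoid silent downscaling" option; enabling it is only reasonable when the user is shown the resized image and its dimensions first.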
This case shows a gap: we usually picture 'hidden instructions' as invisible text, but here they're hidden in an image and only appear after the system shrinks it. There's no guardrail that says 'check that the picture the AI actually sees matches the one the person uploaded' — that divergence is its own risk worth naming.
These surface as proposals across the Control Library and Risk Taxonomy; adopt them by hand when ready.
Sources
- Weaponizing image scaling against production AI systems. Trail of Bits Blog (Kikimora Morozova & Suha Sabi Hussain, 21 Aug 2025). Original research; Gemini CLI + Zapier MCP (trust=True) calendar-exfil PoC; aliasing/Nyquist root cause.
- trailofbits/anamorpher. Open-source PoC for crafting image-scaling attacks against multimodal AI systems (Apache-2.0). Generates/visualises crafted images for nearest-neighbor, bilinear, and bicubic downscalers.
Practise the risk class — related scenarios
An ops agent gets one god-mode credential — and one misread wipes production
A support email hides instructions — and the assistant obeys them
A text-to-SQL agent runs the model's output straight at the database
A poisoned issue makes the agent lie to the human who approves its actions
A speed optimisation becomes a cross-tenant listening device
Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file
A fake Sentry error report hijacks a developer's coding agent into running a shell command
The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten
A shopping page tells the agent to do something the user never asked for
A single poisoned document plants a standing instruction that survives every reset
A screenshot that's harmless at full size becomes an order once the system shrinks it
An attacker captures the agent's bearer token — and inherits its authority
A forged peer registers on the agent directory — and the planner enlists it
The eval gate that was supposed to catch the agent is itself the thing being attacked
A poisoned web page hijacks a research agent — and the planner acts on its behalf
An inbox summary quietly ships a secret to an attacker's server