Case study

Agent-in-the-Middle — abusing A2A agent cards (Trustwave SpiderLabs)

Research demonstration21 Apr 2025🗺️ Multi-Agent System

A red-team PoC forged an inflated A2A 'agent card' so the orchestrator's LLM-as-judge routing always selected the rogue agent, diverting every task through the attacker.

Root cause — why it happened

Modern AI 'teams' let a manager AI find and hire specialist helper AIs on the fly. Each helper advertises itself with a little resume — a card that says 'here's what I'm good at'. The manager reads these cards and picks who to delegate to, judging them like a hiring manager reading applications. Trustwave's researchers showed the obvious problem: nobody checks whether the resume is real. They built a fake agent whose card bragged that it could 'do everything really good', and the manager believed it and picked it over the genuine specialist. From then on every job flowed through the fake agent, which could quietly read everything or hand back wrong answers. The lesson: the manager was trusting a self-written brag with no proof of who the agent really was. The fix is to demand a signed, verified ID before any agent can even be considered — so a forged resume never gets read in the first place.

Risks this case illustrates

Rogue & Impersonated Agents Indirect Prompt Injection

Named in the standard (OWASP/ATLAS/NIST) lens. Click a highlighted component in the diagram below to see which risks attach where.

How it unfolded

← / → to step · click a component to inspect

InstructionsDataActionsControl / decisionFeedback / logs

👆 Click a component to inspect its risks

SetupStep 1 / 7

A2A discovery: the planner reads agent cards to pick a peer

In a multi-agent setup using Google's A2A protocol, a manager AI doesn't have its specialists hard-wired in. Instead, each helper agent posts a little card describing what it's good at, and the manager reads the cards to decide who to hand a job to. Normally there's a legitimate specialist — say a currency-converter agent — with an honest card. The manager picks it for currency jobs. So far, everything is working as designed.

⚙️Legitimate agent card (illustrative)config

{
  "name": "CurrencyConverterAgent",
  "description": "Converts amounts between fiat currencies using live rates.",
  "capabilities": ["currency.convert"],
  "url": "https://fx.internal/agent",
  "signature": null            // A2A does NOT mandate signing
}

Step 1 / 7

Controls & guardrails — what would have stopped it

The chain breaks the moment you stop letting a self-written card decide who to trust. If every agent must prove who it is with a signed, registry-checked ID, agents authenticate each other, and only vetted agents are even on the approved list, then the attacker's forged 'do-everything' card never gets read — the manager only judges helpers that are already verified. Judging applicants on their resumes is fine once you've checked their IDs at the door; the bug was letting anyone with a flashy resume walk straight in.

Preventive

Inter-agent authentication & admission control
addressesRogue & Impersonated Agents
Identity proves who an agent is, not that it is behaving honestly — an authenticated-but-compromised agent still needs isolation, taint-marking, and monitoring. Admission vetting is only as strong as the policy, and dynamically discovered agents in open ecosystems remain hard to fully vet.
Per-agent identity & taint-marked messages
Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.
Least-privilege identity & scoped credentials
addressesIndirect Prompt Injection
Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.
Tool argument validation & sandboxing
Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

Detective

Runtime monitoring & anomaly detection
addressesIndirect Prompt Injection
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
Full-trace audit logging
addressesIndirect Prompt Injection
Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Corrective

Governance: risk assessment, red-teaming & incident response
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

All guardrails for Rogue & Impersonated Agents →All guardrails for Indirect Prompt Injection →

Lessons

▸ An LLM-as-judge router is a trust anchor in the wrong place: when selection is decided by a model reading attacker-controlled free-text capability descriptions, the most persuasive card wins, not the most trustworthy peer.
▸ Self-asserted, unsigned agent cards are identity claims, not proofs: A2A broadcasts capability metadata without mandatory signing or mutual auth, so the planner trusts WHO it is talking to by convention rather than cryptographic verification.
▸ Discovery without admission control is an impersonation surface: with no allow-list, any broadcast agent joins the candidate set, and a single forged card can be selected and then sit man-in-the-middle on the delegation path for every task.
▸ Winning selection equals interposition: once routed through the rogue, the attacker can passively eavesdrop on task traffic or actively inject false results that re-enter the planner with its authority — a transitive integrity compromise.
▸ Move trust off the judge and onto cryptography + policy: registry-enforced card signing, mutual authentication, and an admission allow-list let the LLM judge rank capability among vetted peers without conferring trust on a stranger.
▸ Bound the peers you do admit: per-agent least-privilege identity and provenance/taint-marked structured inter-agent messages keep even a mis-selected agent's output auditable and its blast radius scoped — signing binds identity, not honesty.

Sources

Agent In the Middle – Abusing Agent Cards in the Agent-2-Agent (A2A) Protocol To 'Win' All the Tasks — Trustwave SpiderLabs (Tom Neaves, Apr 21 2025) ↗
Experts Uncover Critical MCP and A2A Flaws — The Hacker News (Apr 2025) ↗
Agent In the Middle — Abusing Agent Cards in the Agent-2-Agent (A2A) Protocol To 'Win' All the Tasks — Trustwave SpiderLabs (Tom Neaves, Apr 21 2025) (primary) ↗ — Working PoC: a forged 'RogueAgent' card ('do everything really good', injection text in the description) is selected by the host's LLM-as-judge over the legitimate CurrencyConverterAgent, routing all tasks through the attacker for eavesdropping / false-result injection. A2A cards broadcast without mandatory signing or mutual auth.
Experts Uncover Critical MCP and A2A Flaws — The Hacker News (Apr 2025) ↗ — Coverage situating the A2A agent-card abuse alongside related MCP/A2A protocol weaknesses; frames the gap as unauthenticated, implicitly-trusted inter-agent discovery rather than a single product CVE.

Practise the risk class — related scenarios

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

🏭Poisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server