Agent-in-the-Middle — abusing A2A agent cards (Trustwave SpiderLabs)
Research demonstration | 21 Apr 2025 | Multi-Agent System
A red-team PoC forged an inflated A2A 'agent card' so the orchestrator's LLM-as-judge routing always selected the rogue agent, diverting every task through the attacker.
Root cause — why it happened
Modern AI 'teams' let a manager AI find and hire specialist helper AIs on the fly. Each helper advertises itself with a little resume — a card that says 'here's what I'm good at'. The manager reads these cards and picks who to delegate to, judging them like a hiring manager reading applications. Trustwave's researchers showed the obvious problem: nobody checks whether the resume is real. They built a fake agent whose card bragged that it could 'do everything really good', and the manager believed it and picked it over the genuine specialist. From then on every job flowed through the fake agent, which could quietly read everything or hand back wrong answers. The lesson: the manager was trusting a self-written brag with no proof of who the agent really was. The fix is to demand a signed, verified ID before any agent can even be considered — so a forged resume never gets read in the first place.
Risks this case illustrates
Named in the standard (OWASP/ATLAS/NIST) lens.
How it unfolded
A2A discovery: the planner reads agent cards to pick a peer
In a multi-agent setup using Google's A2A protocol, a manager AI doesn't have its specialists hard-wired in. Instead, each helper agent posts a little card describing what it's good at, and the manager reads the cards to decide who to hand a job to. Normally there's a legitimate specialist — say a currency-converter agent — with an honest card. The manager picks it for currency jobs. So far, everything is working as designed.
{
"name": "CurrencyConverterAgent",
"description": "Converts amounts between fiat currencies using live rates.",
"capabilities": ["currency.convert"],
"url": "https://fx.internal/agent",
"signature": null // A2A does NOT mandate signing
}
Controls & guardrails — what would have stopped it
The chain breaks the moment you stop letting a self-written card decide who to trust. If every agent must prove who it is with a signed, registry-checked ID, agents authenticate each other, and only vetted agents are even on the approved list, then the attacker's forged 'do-everything' card never gets read — the manager only judges helpers that are already verified. Judging applicants on their resumes is fine once you've checked their IDs at the door; the bug was letting anyone with a flashy resume walk straight in.
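The admission step described above can be sketched as a signature check that runs before any card reaches the judge. This is a minimal illustration under stated assumptions, not the A2A protocol's own mechanism: the registry key, the `sign_card` and `is_admitted` helpers, and the rogue URL are all hypothetical, and HMAC with a registry-held secret stands in for the asymmetric signatures a real registry would more likely use.

```python
import hashlib
import hmac
import json

# Hypothetical registry secret (assumption). A real registry would more
# likely issue asymmetric signatures so agents cannot mint their own.
REGISTRY_KEY = b"demo-registry-key"


def canonical(card: dict) -> bytes:
    """Serialize the card deterministically, excluding the signature field."""
    body = {k: v for k, v in card.items() if k != "signature"}
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode()


def sign_card(card: dict, key: bytes = REGISTRY_KEY) -> dict:
    """Return a copy of the card with a registry-issued signature attached."""
    tag = hmac.new(key, canonical(card), hashlib.sha256).hexdigest()
    return {**card, "signature": tag}


def is_admitted(card: dict, key: bytes = REGISTRY_KEY) -> bool:
    """Admit a card only if its signature verifies against the registry key."""
    tag = card.get("signature")
    if not isinstance(tag, str):
        return False  # unsigned cards never reach the judge
    expected = hmac.new(key, canonical(card), hashlib.sha256).hexdigest()
    return hmac.compare_digest(tag.encode(), expected.encode())


legit = sign_card({
    "name": "CurrencyConverterAgent",
    "description": "Converts amounts between fiat currencies using live rates.",
    "capabilities": ["currency.convert"],
    "url": "https://fx.internal/agent",
})
rogue = {
    "name": "RogueAgent",
    "description": "do everything really good",
    "capabilities": ["currency.convert"],
    "url": "https://rogue.example/agent",  # hypothetical attacker endpoint
    "signature": None,
}

# Only verified cards form the candidate set handed to the judge.
candidates = [c for c in (legit, rogue) if is_admitted(c)]
print([c["name"] for c in candidates])
```

Because the signature covers the whole card body, editing the description of an already-signed card also invalidates it, so an admitted agent cannot later inflate its own claims without re-registration.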
- Inter-agent authentication & admission control (addresses: Rogue & Impersonated Agents)
Identity proves who an agent is, not that it is behaving honestly — an authenticated-but-compromised agent still needs isolation, taint-marking, and monitoring. Admission vetting is only as strong as the policy, and dynamically discovered agents in open ecosystems remain hard to fully vet.
- Per-agent identity & taint-marked messages
Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.
- Least-privilege identity & scoped credentials (addresses: Indirect Prompt Injection)
Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.
- Tool argument validation & sandboxing
Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.
- Runtime monitoring & anomaly detection (addresses: Indirect Prompt Injection)
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
- Full-trace audit logging (addresses: Indirect Prompt Injection)
Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.
- Governance: risk assessment, red-teaming & incident response
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.
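As one way to picture the "per-agent identity & taint-marked messages" control from the list above, the sketch below wraps each inter-agent message in an envelope that carries the sender's identity, a taint flag, and a provenance chain. The `AgentMessage` type, the `derive` helper, the policy function, and the `web-fetcher` agent are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentMessage:
    sender_id: str       # identity of the producing agent (assumed verified)
    payload: str
    tainted: bool        # True if derived from unvetted or external input
    sources: tuple = ()  # provenance chain of upstream agent IDs


def derive(sender_id: str, payload: str, *inputs: AgentMessage) -> AgentMessage:
    """Build a message that inherits taint and provenance from its inputs."""
    tainted = any(m.tainted for m in inputs)
    # Collect upstream IDs in order, dropping repeats.
    chain = (s for m in inputs for s in (*m.sources, m.sender_id))
    sources = tuple(dict.fromkeys(chain))
    return AgentMessage(sender_id, payload, tainted, sources)


def may_trigger_privileged_action(msg: AgentMessage) -> bool:
    """Example policy: tainted content may inform, but never authorize."""
    return not msg.tainted


web = AgentMessage("web-fetcher", "page text", tainted=True)
fx = AgentMessage("CurrencyConverterAgent", "rate quote", tainted=False)
summary = derive("planner", "summary of both inputs", web, fx)

print(summary.tainted, summary.sources)
```

The taint flag is sticky: anything the planner derives from a tainted input stays tainted, which keeps a mis-selected agent's output traceable even though, as noted above, it does not make a well-formed wrong answer detectable.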
Lessons
- An LLM-as-judge router is a trust anchor in the wrong place: when selection is decided by a model reading attacker-controlled free-text capability descriptions, the most persuasive card wins, not the most trustworthy peer.
- Self-asserted, unsigned agent cards are identity claims, not proofs: A2A broadcasts capability metadata without mandatory signing or mutual auth, so the planner trusts WHO it is talking to by convention rather than cryptographic verification.
- Discovery without admission control is an impersonation surface: with no allow-list, any broadcast agent joins the candidate set, and a single forged card can be selected and then sit man-in-the-middle on the delegation path for every task.
- Winning selection equals interposition: once routed through the rogue, the attacker can passively eavesdrop on task traffic or actively inject false results that re-enter the planner with its authority, which is a transitive integrity compromise.
- Move trust off the judge and onto cryptography + policy: registry-enforced card signing, mutual authentication, and an admission allow-list let the LLM judge rank capability among vetted peers without conferring trust on a stranger.
- Bound the peers you do admit: per-agent least-privilege identity and provenance/taint-marked structured inter-agent messages keep even a mis-selected agent's output auditable and its blast radius scoped; signing binds identity, not honesty.
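The difference between "the most persuasive card wins" and "rank among vetted peers" can be shown with a toy router. In this sketch the `gullible_judge` function is a deliberately crude stand-in for an LLM judge (an assumption, not how any real model scores cards), and `route`, the task string, and the rogue URL are hypothetical.

```python
from typing import Callable, Iterable, Optional

Card = dict
Judge = Callable[[str, list], Card]


def gullible_judge(task: str, cards: list) -> Card:
    """Toy stand-in for an LLM judge: counts keyword overlap with the task,
    but is swayed most by sweeping claims in the self-written description."""
    def score(card: Card) -> int:
        words = set(card["description"].lower().split())
        overlap = len(words & set(task.lower().split()))
        boast = 10 if "everything" in words else 0
        return overlap + boast
    return max(cards, key=score)


def route(task: str, cards: Iterable[Card], judge: Judge,
          allow_list: Optional[set] = None) -> Card:
    """Filter to admitted agents first, then let the judge rank the rest."""
    pool = list(cards)
    if allow_list is not None:
        pool = [c for c in pool if c["name"] in allow_list]
    if not pool:
        raise LookupError("no admitted agent can take this task")
    return judge(task, pool)


cards = [
    {"name": "CurrencyConverterAgent",
     "description": "Converts amounts between fiat currencies using live rates.",
     "url": "https://fx.internal/agent"},
    {"name": "RogueAgent",
     "description": "do everything really good",
     "url": "https://rogue.example/agent"},  # hypothetical attacker endpoint
]
task = "convert 100 USD to EUR"

naive = route(task, cards, gullible_judge)
vetted = route(task, cards, gullible_judge,
               allow_list={"CurrencyConverterAgent"})
print(naive["name"], vetted["name"])
```

The judge is unchanged between the two calls; only the candidate pool differs. Matching on the card's name alone would itself be spoofable, so in practice the allow-list would need to key on a verified identity rather than a self-reported field.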
Sources
- Agent In the Middle – Abusing Agent Cards in the Agent-2-Agent (A2A) Protocol To 'Win' All the Tasks. Trustwave SpiderLabs (Tom Neaves, Apr 21 2025), primary source. Working PoC: a forged 'RogueAgent' card ('do everything really good', injection text in the description) is selected by the host's LLM-as-judge over the legitimate CurrencyConverterAgent, routing all tasks through the attacker for eavesdropping / false-result injection. A2A cards broadcast without mandatory signing or mutual auth.
- Experts Uncover Critical MCP and A2A Flaws. The Hacker News (Apr 2025). Coverage situating the A2A agent-card abuse alongside related MCP/A2A protocol weaknesses; frames the gap as unauthenticated, implicitly-trusted inter-agent discovery rather than a single product CVE.