🔍AI RiskAtlas
← Real-world cases
Case study

Agent-in-the-Middle — abusing A2A agent cards (Trustwave SpiderLabs)

Research demonstration21 Apr 2025🗺️ Multi-Agent System

A red-team PoC forged an inflated A2A 'agent card' so the orchestrator's LLM-as-judge routing always selected the rogue agent, diverting every task through the attacker.

Root cause — why it happened

Modern AI 'teams' let a manager AI find and hire specialist helper AIs on the fly. Each helper advertises itself with a little resume — a card that says 'here's what I'm good at'. The manager reads these cards and picks who to delegate to, judging them like a hiring manager reading applications. Trustwave's researchers showed the obvious problem: nobody checks whether the resume is real. They built a fake agent whose card bragged that it could 'do everything really good', and the manager believed it and picked it over the genuine specialist. From then on every job flowed through the fake agent, which could quietly read everything or hand back wrong answers. The lesson: the manager was trusting a self-written brag with no proof of who the agent really was. The fix is to demand a signed, verified ID before any agent can even be considered — so a forged resume never gets read in the first place.

Risks this case illustrates

Named in the standard (OWASP/ATLAS/NIST) lens. Click a highlighted component in the diagram below to see which risks attach where.

How it unfolded

UntrustedAgent teamOversightExternalgoaladmits / authenticates agents🧑User🗺️Planner Agent🤖Research Agent🤖Coding Agent🤖Comms Agent🔧Tool Runtime🌐UntrustedContent🗄️BusinessDatabase🔌External APIs📈Monitoring &Evals🪪Agent Registry🤖RogueAgent(forged,
InstructionsDataActionsControl / decisionFeedback / logs
👆 Click a component to inspect its risks
SetupStep 1 / 7

A2A discovery: the planner reads agent cards to pick a peer

In a multi-agent setup using Google's A2A protocol, a manager AI doesn't have its specialists hard-wired in. Instead, each helper agent posts a little card describing what it's good at, and the manager reads the cards to decide who to hand a job to. Normally there's a legitimate specialist — say a currency-converter agent — with an honest card. The manager picks it for currency jobs. So far, everything is working as designed.

⚙️Legitimate agent card (illustrative)config
{
  "name": "CurrencyConverterAgent",
  "description": "Converts amounts between fiat currencies using live rates.",
  "capabilities": ["currency.convert"],
  "url": "https://fx.internal/agent",
  "signature": null            // A2A does NOT mandate signing
}
Step 1 / 7

Controls & guardrails — what would have stopped it

The chain breaks the moment you stop letting a self-written card decide who to trust. If every agent must prove who it is with a signed, registry-checked ID, agents authenticate each other, and only vetted agents are even on the approved list, then the attacker's forged 'do-everything' card never gets read — the manager only judges helpers that are already verified. Judging applicants on their resumes is fine once you've checked their IDs at the door; the bug was letting anyone with a flashy resume walk straight in.

Preventive
  • Inter-agent authentication & admission control

    Identity proves who an agent is, not that it is behaving honestly — an authenticated-but-compromised agent still needs isolation, taint-marking, and monitoring. Admission vetting is only as strong as the policy, and dynamically discovered agents in open ecosystems remain hard to fully vet.

  • Per-agent identity & taint-marked messages

    Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

  • Least-privilege identity & scoped credentials

    Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

  • Tool argument validation & sandboxing

    Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

Detective
  • Runtime monitoring & anomaly detection

    Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

  • Full-trace audit logging

    Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Corrective
  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

  • An LLM-as-judge router is a trust anchor in the wrong place: when selection is decided by a model reading attacker-controlled free-text capability descriptions, the most persuasive card wins, not the most trustworthy peer.
  • Self-asserted, unsigned agent cards are identity claims, not proofs: A2A broadcasts capability metadata without mandatory signing or mutual auth, so the planner trusts WHO it is talking to by convention rather than cryptographic verification.
  • Discovery without admission control is an impersonation surface: with no allow-list, any broadcast agent joins the candidate set, and a single forged card can be selected and then sit man-in-the-middle on the delegation path for every task.
  • Winning selection equals interposition: once routed through the rogue, the attacker can passively eavesdrop on task traffic or actively inject false results that re-enter the planner with its authority — a transitive integrity compromise.
  • Move trust off the judge and onto cryptography + policy: registry-enforced card signing, mutual authentication, and an admission allow-list let the LLM judge rank capability among vetted peers without conferring trust on a stranger.
  • Bound the peers you do admit: per-agent least-privilege identity and provenance/taint-marked structured inter-agent messages keep even a mis-selected agent's output auditable and its blast radius scoped — signing binds identity, not honesty.

Sources

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading →·Built by Shi Yuan ↗