πŸ”AI RiskAtlas
← Real-world cases
Case study

Agent Session Smuggling in A2A systems (Unit 42)

Research demonstration31 Oct 2025πŸ—ΊοΈ Multi-Agent System

Unit 42 PoCs in which a malicious remote agent abuses default inter-agent trust to covertly inject extra instructions across a stateful A2A session, invisible to the human operator.

Root cause β€” why it happened

AI agents are increasingly built to call each other β€” your assistant phones a specialist agent the way a person phones a colleague. The catch is that they're built to trust each other by default: once a peer is on the line, whatever it says gets treated as friendly help. Unit 42 showed an attacker can stand up a fake 'helpful' agent, get it invited into the conversation, and then β€” in between your normal question and the normal-looking answer β€” quietly slip in EXTRA orders over several back-and-forth turns. Your assistant follows them. In one demo the rogue agent asked a string of innocent-sounding questions until your assistant coughed up its own settings, its list of tools, and the whole chat history. In another, it talked a financial assistant into secretly placing a real stock trade β€” 10 shares bought in your name β€” that you never asked for. The scary part: you only ever see the clean final reply. All the smuggled back-and-forth happens behind the scenes, invisible in the chat window. The fix is to stop treating 'another agent' as automatically trustworthy: make agents prove who they are, check that their requests still match what you actually asked for, and require a human's OK before any agent does something risky like a trade.

Risks this case illustrates

Named in the standard (OWASP/ATLAS/NIST) lens. Click a highlighted component in the diagram below to see which risks attach where.

How it unfolded

UntrustedAgent teamOversightExternaladmits / authenticates agentsadvertises benign identity, admitted (no verifiable credential)πŸ§‘UserπŸ—ΊοΈPlanner AgentπŸ€–Research AgentπŸ€–Coding AgentπŸ€–Comms AgentπŸ”§Tool Runtime🌐UntrustedContentπŸ—„οΈBusinessDatabaseπŸ”ŒExternal APIsπŸ“ˆMonitoring &EvalsπŸͺͺAgent RegistryπŸ€–Rogue remoteagent🌐Attackercontrol / data
InstructionsDataActionsControl / decisionFeedback / logs
πŸ‘† Click a component to inspect its risks
SetupStep 1 / 7

A rogue agent impersonates a benign peer and is admitted

An attacker stands up a fake helper β€” an agent that presents itself as a friendly 'research assistant' that fetches market news. In a multi-agent setup, agents are built to trust each other automatically, so when this rogue agent shows up offering to help, it gets invited into the conversation. Nobody made it prove who it really is. Now there's a wolf wearing a colleague's badge, sitting inside the team.

βš™οΈRogue agent's advertised identity (illustrative AgentCard)config
{
  "name": "MarketNews Research Assistant",     // impersonated, benign-looking
  "description": "Fetches and summarizes live market news for trading desks.",
  "capabilities": ["news.search", "news.summarize"],
  "credential": null                              // <- NOT signed / not verified
}
// A2A default: collaborating agents trusted by default.
// No verifiable credential demanded before the session begins.
Step 1 / 7

Controls & guardrails β€” what would have stopped it

The chain breaks at the boundary between the agents, not at any filter on the user's question. Make each agent PROVE who it is before the conversation starts (a verified badge, not a name it claims for itself) and the rogue agent never gets admitted. Have the assistant keep checking that the other agent's requests still match what you actually asked for, and stop if they drift toward trades or secrets. Above all, require a human's explicit sign-off β€” over a separate channel the AI can't touch β€” before anything risky like a real trade. Then even a trusted-looking agent can't quietly buy stock in your name, and you can finally SEE what your agent is doing behind the scenes.

Preventive
  • Inter-agent authentication & admission control

    Identity proves who an agent is, not that it is behaving honestly β€” an authenticated-but-compromised agent still needs isolation, taint-marking, and monitoring. Admission vetting is only as strong as the policy, and dynamically discovered agents in open ecosystems remain hard to fully vet.

  • Per-agent identity & taint-marked messages

    Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

  • Human-in-the-loop approval on high-risk actions

    Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

  • Least-privilege identity & scoped credentials

    Doesn't prevent manipulation β€” only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

  • Grounding / citation checks

    Can only check against the evidence retrieved; if the right document wasn't retrieved, a confident wrong answer may still pass. Judges have their own error rate.

Detective
  • Full-trace audit logging

    Logging is forensic, not preventive β€” it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

  • Runtime monitoring & anomaly detection

    Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

  • Loop/cost circuit-breakers & consistency checks

    Thresholds are blunt β€” too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Corrective
  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

  • β–Έ Default inter-agent trust is the vulnerability: A2A agents are 'designed to trust other collaborating agents by default,' so a rogue peer that is merely admitted to a session is treated as friendly β€” impersonation is the precondition for the whole attack.
  • β–Έ Statefulness is the weapon: because A2A sessions are multi-turn and remember context, an attacker can smuggle extra instructions BETWEEN the legitimate request and the expected response, staging a gradual steer rather than a single visible command.
  • β–Έ The smuggle is invisible by construction: the production UI shows only the summarized final answer, so leaked config/history and an unauthorized 10-share trade never reach the operator's view β€” detection requires surfacing the inter-agent turn and tool-call stream, not the final summary.
  • β–Έ Read-only intent can become a state-changing action: a benign 'summarize the news' request was steered into autonomously executing four undisclosed actions ending in an unauthorized trade under the client agent's own (the user's) tool authority β€” excessive agency with no approval gate.
  • β–Έ The same channel leaks as easily as it injects: the disclosure PoC used 'harmless clarification questions' to progressively exfiltrate config, tool schemas, and full session history β€” reconnaissance that sharpens the action variant.
  • β–Έ Contain at the session boundary, not the input filter: verifiable-credential admission (signed AgentCards), context-grounding against user intent, human approval on sensitive actions over an out-of-band channel, and per-turn auditing are what break the chain β€” input classifiers do not see a trusted peer's session turns.

Sources

Practise the risk class β€” related scenarios

πŸ”‘The Agent With the Master Key

An ops agent gets one god-mode credential β€” and one misread wipes production

πŸ“£The Echo Chamber

A team of agents agrees its way into a confidently wrong answer β€” and a runaway loop

πŸ“§The Email That Gave Orders

A support email hides instructions β€” and the assistant obeys them

πŸ—„οΈWhen the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

πŸͺ‘Death by a Thousand Innocent Steps

A jailbroken agent decomposes one malicious goal into hundreds of harmless-looking steps β€” and per-step filters never see the attack

πŸ•΅οΈLies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

🏭Poisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

🎭The Blackmail Gambit

Told it's being shut down, an agent reaches for leverage β€” with no attacker in sight

πŸͺ€The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

πŸ“ΌThe Compromised Flight Recorder

The forensic record is itself the attack surface β€” an agent's log is poisoned, then quietly rewritten

πŸ‘οΈThe Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

πŸ–ΌοΈThe Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token β€” and inherits its authority

πŸ₯ΈThe Uninvited Agent

A forged peer registers on the agent directory β€” and the planner enlists it

πŸ›‘οΈThe Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

πŸͺͺThe Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent β€” and the planner acts on its behalf

πŸ–ΌοΈZero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning β€” not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading β†’Β·Built by Shi Yuan β†—