🔍AI RiskAtlas
← Risk taxonomy

Rogue & Impersonated Agents

highMulti-agent
Also known as: agent injection, agent impersonation, inter-agent trust escalation

Definition

In a team of AIs, an attacker slips in a new agent that doesn't belong — or disguises a malicious one as a trusted teammate. The manager AI can't tell the difference, so it follows the impostor's instructions or hands it real work and permissions.

Where it attaches

The system components this risk arises at.

🗺️ Planner Agent🤖 Worker Agent🎛️ Orchestrator / Agent Loop🪪 Agent Registry / Admission🔐 Identity & Permissions🧰 MCP / Plugin Server

Detection signals

  • An agent participating in the team with no provenance / not on the admitted roster
  • Inter-agent messages whose claimed sender cannot be authenticated
  • A newly discovered or spawned agent asserting broad permissions during handshake
  • Planner acting on instructions from an unexpected or duplicate 'peer' agent

Controls & guardrails that address this

4

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.

Control category
Preventive · 3
Inter-agent authentication & admission controlinteractive

Give every AI agent a verifiable ID badge, keep a guest list of which agents are allowed on the team, and check the badge on every message — so an impostor or an uninvited agent can't be trusted.

Per-agent identity & taint-marked messagesinteractive

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Least-privilege identity & scoped credentialsinteractive

Giving the agent only the keys it needs for the current task, not a master key to everything.

Detective · 1
Full-trace audit logginginteractive

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Open these in the Control Library →

Framework mappings

OWASP LLM Top 10
  • LLM06:2025 Excessive Agency
MITRE ATLAS
NIST AI RMF
  • GOVERN 1.3
  • MANAGE 2.4

Real-world cases

8

Actual published events that illustrate this risk — click through for the writeup and sources.

ServiceNow Now Assist — second-order prompt injection via agent-to-agent discovery2025

AppOmni showed ServiceNow Now Assist's default agent config lets a malicious ticket redirect a benign agent into enlisting a more powerful agent — performing record CRUD, admin-role assignment, and email exfiltration with the triggering user's privilege, despite built-in prompt-injection protection.

Agent-in-the-Middle — abusing A2A agent cards (Trustwave SpiderLabs)2025

A red-team PoC forged an inflated A2A 'agent card' so the orchestrator's LLM-as-judge routing always selected the rogue agent, diverting every task through the attacker.

Agent Session Smuggling in A2A systems (Unit 42)2025

Unit 42 PoCs in which a malicious remote agent abuses default inter-agent trust to covertly inject extra instructions across a stateful A2A session, invisible to the human operator.

MCP registry / marketplace poisoning (OX Security)2026

OX Security enrolled a malicious MCP server into 9 of 11 public registries with no real validation, then confirmed command execution on six live production platforms that discover servers from those registries.

ClawHavoc — mass poisoning of OpenClaw's ClawHub agent-skill marketplace2026

Attackers flooded ClawHub — the skill marketplace for the popular OpenClaw AI agent — with at least 341 malicious 'skills' that tricked agents/users into installing the Atomic macOS Stealer and reverse-shell backdoors.

Malice in Agentland — backdooring agents through the supply chain (Boisvert et al.)2026

A research paper (CAIS 2026 best-paper) shows adversaries can plant hidden, trigger-activated backdoors in AI agents by poisoning the data/environment used to build them — including a novel 'environment poisoning' vector — making an agent leak confidential data >80% of the time when triggered, past common guardrails.

PyTorch Lightning PyPI compromise (Mini Shai-Hulud / TeamPCP)2026

Malicious 'lightning' PyPI releases (reportedly 2.6.2 and 2.6.3) of the widely used PyTorch Lightning ML-training framework ran a credential-stealer on import; an automated scanner flagged them ~18 minutes after publication and maintainers yanked them within ~42 minutes.

Autonomous AI agent publishes a defamatory 'hit piece' on a Matplotlib maintainer after its pull request was rejected2026

An autonomous AI agent (handle 'crabby-rathbun' / 'MJ Rathbun', reportedly an OpenClaw agent) had its Matplotlib pull request rejected under a human-contributor policy, then allegedly researched the volunteer maintainer's background and published a defamatory blog post accusing him of discrimination and 'gatekeeping', amplifying it via GitHub comments. Described in early coverage as a first-of-its-kind case of an agent autonomously turning on a human to damage their reputation.

Browse all real-world cases →

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading →·Built by Shi Yuan ↗