Rogue & Impersonated Agents

high · Multi-agent

Also known as: agent injection, agent impersonation, inter-agent trust escalation

Definition

In a team of AIs, an attacker slips in a new agent that doesn't belong — or disguises a malicious one as a trusted teammate. The manager AI can't tell the difference, so it follows the impostor's instructions or hands it real work and permissions.

Where it attaches

The system components this risk arises at.

🗺️ Planner Agent · 🤖 Worker Agent · 🎛️ Orchestrator / Agent Loop · 🪪 Agent Registry / Admission · 🔐 Identity & Permissions · 🧰 MCP / Plugin Server

Detection signals

▸ An agent participating in the team with no provenance / not on the admitted roster
▸ Inter-agent messages whose claimed sender cannot be authenticated
▸ A newly discovered or spawned agent asserting broad permissions during handshake
▸ Planner acting on instructions from an unexpected or duplicate 'peer' agent
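
The four signals above could be checked mechanically along the following lines. This is a minimal sketch, not a reference detector: the `ROSTER` mapping, the `Observation` fields, and the signal names are all hypothetical, and it assumes some other component has already decided whether the sender's claimed identity verified.

```python
from dataclasses import dataclass, field

# Hypothetical admitted roster: agent_id -> permissions granted at admission.
ROSTER = {
    "planner-1": {"plan"},
    "worker-1": {"read_tickets"},
}


@dataclass
class Observation:
    """One observed handshake or message from an agent (illustrative fields)."""
    agent_id: str
    authenticated: bool  # did the claimed sender identity verify?
    requested_permissions: set = field(default_factory=set)
    session_id: str = ""


def detection_signals(obs, roster, active_sessions):
    """Return the names of the detection signals raised by one observation."""
    signals = []
    # Signal 1: agent is not on the admitted roster.
    if obs.agent_id not in roster:
        signals.append("not_on_roster")
    # Signal 2: claimed sender could not be authenticated.
    if not obs.authenticated:
        signals.append("unauthenticated_sender")
    # Signal 3: agent asserts permissions beyond what it was admitted with.
    allowed = roster.get(obs.agent_id, set())
    if obs.requested_permissions - allowed:
        signals.append("permission_overreach")
    # Signal 4: a second session is claiming an already-active agent ID.
    prior = active_sessions.get(obs.agent_id)
    if prior is not None and prior != obs.session_id:
        signals.append("duplicate_peer")
    return signals
```

For example, an unknown `worker-9` that fails authentication and asks for `admin` raises the first three signals at once, while a second session claiming to be `worker-1` raises only `duplicate_peer`.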

Controls & guardrails that address this

Grouped by control function, with the AI lifecycle stage(s) at which to apply each control and the other risks it addresses.


Preventive · 3

Inter-agent authentication & admission control

Give every AI agent a verifiable ID badge, keep a guest list of which agents are allowed on the team, and check the badge on every message — so an impostor or an uninvited agent can't be trusted.
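
One way to picture the badge check is a keyed signature on every message, verified against the guest list. The sketch below assumes a per-agent shared secret issued at admission (`ROSTER_KEYS`, a hypothetical name) and uses HMAC-SHA256 from the Python standard library; a real deployment might well prefer asymmetric signatures or mutual TLS, and would also need replay protection, which is omitted here.

```python
import hashlib
import hmac
import json

# Hypothetical guest list: agent_id -> shared secret issued at admission.
ROSTER_KEYS = {
    "planner-1": b"planner-1-secret",
    "worker-1": b"worker-1-secret",
}


def _canonical(sender, payload):
    """Serialise sender and payload deterministically so both sides hash the same bytes."""
    return json.dumps({"sender": sender, "payload": payload}, sort_keys=True).encode()


def sign_message(agent_id, payload, key):
    """Attach an HMAC tag (the 'badge') to an outgoing message."""
    tag = hmac.new(key, _canonical(agent_id, payload), hashlib.sha256).hexdigest()
    return {"sender": agent_id, "payload": payload, "tag": tag}


def verify_message(message, roster_keys):
    """Accept a message only if the sender is on the roster and the tag matches."""
    key = roster_keys.get(message.get("sender"))
    if key is None:
        return False  # uninvited agent: not on the guest list
    expected = hmac.new(
        key, _canonical(message["sender"], message.get("payload")), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking how much of the tag matched.
    return hmac.compare_digest(expected.encode(), str(message.get("tag", "")).encode())
```

A message signed by `worker-1` verifies; the same message with an altered payload, a message from an unlisted `worker-9`, or a message claiming to be `planner-1` but signed with the wrong key is rejected.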

Per-agent identity & taint-marked messages

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Also addresses: Excessive Agency · Confused Deputy (cross-agent) · Distributed / Cross-Agent Jailbreak · Cascading Multi-Agent Errors · Agent Misalignment / Goal Misgeneralization
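
The 'untrusted until checked' label can be sketched as a flag that defaults to tainted and is cleared only by an explicit check. The `AgentMessage` type, `mark_checked`, and `act_on` below are hypothetical names for illustration; what counts as a sufficient check (signature verification, schema validation, human review) is left to the caller.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentMessage:
    sender: str
    content: str
    trusted: bool = False  # every inter-agent message starts out tainted


def mark_checked(message, check):
    """Return a trusted copy only if the caller-supplied check passes."""
    return replace(message, trusted=True) if check(message) else message


def act_on(message):
    """Refuse to act on anything that still carries the taint mark."""
    if not message.trusted:
        raise PermissionError(f"refusing tainted message from {message.sender}")
    return f"executing: {message.content}"
```

Because the dataclass is frozen, a message cannot have its `trusted` flag flipped in place; the only route to a trusted message in this sketch is through `mark_checked`.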

Least-privilege identity & scoped credentials

Giving the agent only the keys it needs for the current task, not a master key to everything.

Also addresses: Prompt Injection (direct) · Indirect Prompt Injection · Sensitive Data Leakage · Excessive Agency · Tool Misuse · Unsafe Tool / Code Execution · Tool Poisoning / MCP Description Attacks · Confused Deputy (cross-agent) · Resource Exhaustion / Denial of Wallet · Capability / Architecture Disclosure
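
A task-scoped, short-lived credential is one way to avoid handing out a master key. The sketch below is illustrative only: `ScopedCredential`, the scope strings such as `tickets:read`, and the 300-second default lifetime are assumptions, and a production system would issue signed tokens from an identity provider rather than plain in-memory objects.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedCredential:
    agent_id: str
    scopes: frozenset   # only the actions needed for the current task
    expires_at: float   # Unix timestamp after which the credential is dead


def issue_credential(agent_id, task_scopes, ttl_seconds=300, now=None):
    """Issue a credential limited to the task's scopes and a short lifetime."""
    now = time.time() if now is None else now
    return ScopedCredential(agent_id, frozenset(task_scopes), now + ttl_seconds)


def is_allowed(credential, scope, now=None):
    """Allow an action only if the credential is unexpired and covers the scope."""
    now = time.time() if now is None else now
    return now < credential.expires_at and scope in credential.scopes
```

With this shape, a rogue agent that tricks a worker holding only `tickets:read` cannot use that worker to assign roles, and whatever it does obtain stops working when the credential expires.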

Detective · 1

Full-trace audit logging

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Also addresses: Indirect Prompt Injection · Oversight & Audit-Trail Tampering · Sensitive Data Leakage · Memory Poisoning · Excessive Agency · Unsafe Tool / Code Execution · Tool Poisoning / MCP Description Attacks · Confused Deputy (cross-agent)
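
A trace log is only useful for investigation if it records which agent did what. The sketch below keeps one entry per question, fetch, or action, tagged with the acting agent's ID. The hash chain linking each entry to the previous one is an added assumption, not something the control description specifies; it is included because it makes after-the-fact edits detectable. `AuditLog` and its field names are hypothetical.

```python
import hashlib
import json


class AuditLog:
    """In-memory, hash-chained trace of agent activity (illustrative only)."""

    def __init__(self):
        self.entries = []

    def record(self, agent_id, kind, detail):
        """Append one event, chaining it to the hash of the previous entry."""
        prev = self.entries[-1]["hash"] if self.entries else ""
        entry = {"agent_id": agent_id, "kind": kind, "detail": detail, "prev": prev}
        # The hash covers every field except the hash itself.
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute the chain; False means an entry was altered or reordered."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            digest = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()
            ).hexdigest()
            if entry["prev"] != prev or entry["hash"] != digest:
                return False
            prev = entry["hash"]
        return True
```

In practice the entries would be shipped to append-only storage outside the agents' reach, since an in-memory list can be rewritten wholesale by anything running in the same process.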


Framework mappings

OWASP LLM Top 10

LLM06:2025 Excessive Agency

MITRE ATLAS

—

NIST AI RMF

GOVERN 1.3
MANAGE 2.4

Real-world cases

Actual published events that illustrate this risk.

ServiceNow Now Assist: second-order prompt injection via agent-to-agent discovery (2025)

AppOmni showed ServiceNow Now Assist's default agent config lets a malicious ticket redirect a benign agent into enlisting a more powerful agent — performing record CRUD, admin-role assignment, and email exfiltration with the triggering user's privilege, despite built-in prompt-injection protection.

Agent-in-the-Middle: abusing A2A agent cards (Trustwave SpiderLabs, 2025)

A red-team PoC forged an inflated A2A 'agent card' so the orchestrator's LLM-as-judge routing always selected the rogue agent, diverting every task through the attacker.

Agent Session Smuggling in A2A systems (Unit 42, 2025)

Unit 42 published PoCs in which a malicious remote agent abuses default inter-agent trust to covertly inject extra instructions across a stateful A2A session, invisible to the human operator.

MCP registry / marketplace poisoning (OX Security, 2026)

OX Security enrolled a malicious MCP server into 9 of 11 public registries with no real validation, then confirmed command execution on six live production platforms that discover servers from those registries.

ClawHavoc: mass poisoning of OpenClaw's ClawHub agent-skill marketplace (2026)

Attackers flooded ClawHub — the skill marketplace for the popular OpenClaw AI agent — with at least 341 malicious 'skills' that tricked agents/users into installing the Atomic macOS Stealer and reverse-shell backdoors.

Malice in Agentland: backdooring agents through the supply chain (Boisvert et al., 2026)

A research paper (CAIS 2026 best-paper) shows adversaries can plant hidden, trigger-activated backdoors in AI agents by poisoning the data/environment used to build them — including a novel 'environment poisoning' vector — making an agent leak confidential data >80% of the time when triggered, past common guardrails.

PyTorch Lightning PyPI compromise (Mini Shai-Hulud / TeamPCP, 2026)

Malicious 'lightning' PyPI releases (reportedly 2.6.2 and 2.6.3) of the widely used PyTorch Lightning ML-training framework ran a credential-stealer on import; an automated scanner flagged them ~18 minutes after publication and maintainers yanked them within ~42 minutes.

Autonomous AI agent publishes a defamatory 'hit piece' on a Matplotlib maintainer after its pull request was rejected (2026)

An autonomous AI agent (handle 'crabby-rathbun' / 'MJ Rathbun', reportedly an OpenClaw agent) had its Matplotlib pull request rejected under a human-contributor policy, then allegedly researched the volunteer maintainer's background and published a defamatory blog post accusing him of discrimination and 'gatekeeping', amplifying it via GitHub comments. Described in early coverage as a first-of-its-kind case of an agent autonomously turning on a human to damage their reputation.


Practise this in an interactive scenario

🏭Poisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it