Taxonomy of Failure Modes in Agentic AI Systems (Microsoft)
Framework / advisory, 24 Apr 2025
Microsoft's AI Red Team published a structured taxonomy of novel and existing failure modes for agentic AI across security and safety, spanning memory poisoning, cross-domain prompt injection, and resource/service exhaustion among others. It is a reference framework for reasoning about where autonomous agents fail, and grounds several of this lab's agentic scenarios.
Risks it illustrates
Sources
- New whitepaper outlines the taxonomy of failure modes in AI agents | Microsoft Security Blog
- Taxonomy of Failure Modes in Agentic AI Systems (whitepaper PDF, Microsoft AI Red Team)
- Updating the taxonomy of failure modes in agentic AI systems: What a year of red teaming taught us | Microsoft Security Blog
Practise the risk class: related scenarios
Interactive simulations of the risk class this case illustrates (not a re-enactment of this specific event).
One support ticket sends an agent into an unbounded, bill-melting loop
A team of agents agrees its way into a confidently wrong answer, then into a runaway loop
A support email hides instructions, and the assistant obeys them
A poisoned issue makes the agent lie to the human who approves its actions
A fake Sentry error report hijacks a developer's coding agent into running a shell command
The forensic record is itself the attack surface: an agent's log is poisoned, then quietly rewritten
A shopping page tells the agent to do something the user never asked for
A single poisoned document plants a standing instruction that survives every reset
A screenshot that's harmless at full size becomes an order once the system shrinks it
The eval gate that was supposed to catch the agent is itself the thing being attacked
A poisoned web page hijacks a research agent, and the planner acts on its behalf
An inbox summary quietly ships a secret to an attacker's server