AI RiskAtlas
Real-world cases

What actually happened — incidents, disclosures & research

A curated library of real, published events behind the risk classes: disclosed vulnerabilities, reported incidents and court rulings, and frontier red-team research. Each links to the risks it illustrates and the interactive Scenarios that simulate it. These are the sourced, real-world counterpart to the hands-on simulations.

Latest cases

Real-world incident · 16 Jun 2026

Malicious JetBrains Marketplace plugins steal AI API keys

Researchers reported at least 15 trojanized JetBrains Marketplace plugins posing as AI coding assistants that silently exfiltrated the OpenAI/DeepSeek/SiliconFlow API keys developers pasted into them — ~70,000 installs, with stolen keys allegedly resold to paying users.

▶ Case study
Disclosed vulnerability · 15 Jun 2026

SearchLeak — Microsoft 365 Copilot one-click data theft (CVE-2026-42824)

A single malicious link reportedly turned Copilot Enterprise Search's URL query parameter into an executable prompt, exfiltrating emails, MFA codes and files via a Bing image-search side channel.

▶ Case study
Research demonstration · 12 Jun 2026

Agentjacking — hijacking AI coding agents via Sentry error reports (Tenet Security)

Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.

▶ Case study
Real-world incident · 31 May 2026 – 01 Jun 2026

Meta AI support bot tricked into hijacking Instagram accounts

Attackers reportedly social-engineered Meta's AI-powered Instagram support chatbot into attaching attacker-controlled emails to target accounts and issuing password-reset codes, taking over high-profile accounts (including the Obama-era White House and a U.S. Space Force CMSgt) without the owner's email or any MFA prompt.

Disclosed vulnerability · 29 May 2026

ChatGPhish — ChatGPT web-summary rendering turned into a phishing surface

Attacker-controlled Markdown hidden in a public web page is reportedly rendered by ChatGPT's summarization feature as trusted assistant output — spoofed OpenAI alerts, phishing links, QR codes, and tracking pixels.

Real-world incident · 27 May 2026

codexui-android — malicious npm package steals OpenAI Codex auth tokens

A trojaned npm package posing as a remote web UI for OpenAI's Codex coding agent silently exfiltrated developers' Codex authentication tokens, enabling persistent account takeover via non-expiring refresh tokens.

Research demonstration · 26 May 2026

Project Glasswing — Claude 'Mythos' autonomously finds 10,000+ software vulnerabilities

Anthropic reports that 'Claude Mythos Preview' — an unreleased frontier model it describes as able to autonomously find and exploit software flaws — surfaced more than 10,000 high- or critical-severity vulnerabilities across major operating systems, browsers and open-source projects in roughly its first month under the defensive 'Project Glasswing' program, with Anthropic warning that finding flaws now far outpaces the human capacity to triage and patch them.

Real-world incident · 30 Apr 2026

PyTorch Lightning PyPI compromise (Mini Shai-Hulud / TeamPCP)

Malicious 'lightning' PyPI releases (reportedly 2.6.2 and 2.6.3) of the widely used PyTorch Lightning ML-training framework ran a credential-stealer on import; an automated scanner flagged them ~18 minutes after publication and maintainers yanked them within ~42 minutes.

Theme
91 cases

Real-world incident (34)

Malicious JetBrains Marketplace plugins steal AI API keys

16 Jun 2026
▶ Case study — diagram walkthrough

Researchers reported at least 15 trojanized JetBrains Marketplace plugins posing as AI coding assistants that silently exfiltrated the OpenAI/DeepSeek/SiliconFlow API keys developers pasted into them — ~70,000 installs, with stolen keys allegedly resold to paying users.

Data & retrieval · Agent autonomy · Supply chain & infrastructure
Supply-Chain Compromise · Sensitive Data Leakage · Tool Poisoning / MCP Description Attacks

Meta AI support bot tricked into hijacking Instagram accounts

31 May 2026 – 01 Jun 2026

Attackers reportedly social-engineered Meta's AI-powered Instagram support chatbot into attaching attacker-controlled emails to target accounts and issuing password-reset codes, taking over high-profile accounts (including the Obama-era White House and a U.S. Space Force CMSgt) without the owner's email or any MFA prompt.

Agent autonomy · Multi-agent
Confused Deputy (cross-agent) · Excessive Agency · Tool Misuse

codexui-android — malicious npm package steals OpenAI Codex auth tokens

27 May 2026

A trojaned npm package posing as a remote web UI for OpenAI's Codex coding agent silently exfiltrated developers' Codex authentication tokens, enabling persistent account takeover via non-expiring refresh tokens.

Data & retrieval · Agent autonomy · Supply chain & infrastructure
Supply-Chain Compromise · Sensitive Data Leakage · Tool Poisoning / MCP Description Attacks

PyTorch Lightning PyPI compromise (Mini Shai-Hulud / TeamPCP)

30 Apr 2026

Malicious 'lightning' PyPI releases (reportedly 2.6.2 and 2.6.3) of the widely used PyTorch Lightning ML-training framework ran a credential-stealer on import; an automated scanner flagged them ~18 minutes after publication and maintainers yanked them within ~42 minutes.

Data & retrieval · Multi-agent · Supply chain & infrastructure
Supply-Chain Compromise · Sensitive Data Leakage · Rogue & Impersonated Agents

System-prompt & tool-schema leak repositories (CL4R1T4S / leaked-system-prompts)

30 Mar 2026 (ongoing)
▶ Case study — diagram walkthrough

Crowd-sourced GitHub repos systematically extract and publish system prompts AND JSON tool/function schemas from deployed AI agents (Cursor, Windsurf, Claude Code, Devin, Copilot), one hitting ~140k stars.

Supply chain & infrastructure
Capability / Architecture Disclosure

TeamPCP poisons the LiteLLM AI gateway on PyPI to harvest LLM API keys

24 Mar 2026
▶ Case study — diagram walkthrough

As part of a multi-ecosystem supply-chain cascade (Trivy onward), TeamPCP used stolen PyPI publishing tokens to ship backdoored BerriAI LiteLLM versions whose auto-running .pth payload harvested cloud, SSH and Kubernetes secrets plus env vars holding OPENAI_API_KEY/ANTHROPIC_API_KEY — exfiltrating to a typosquatted C2; AI-talent firm Mercor was a downstream victim, with Lapsus$ claiming ~4TB stolen.

Data & retrieval · Model behaviour · Agent autonomy · Supply chain & infrastructure
Supply-Chain Compromise · Model Backdoors / Sleeper Agents · Sensitive Data Leakage · Unsafe Tool / Code Execution
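
The auto-running .pth payload in this case relies on documented behaviour of Python's `site` module: in a `.pth` file, any line that starts with `import` followed by a space or tab is executed at interpreter startup. A minimal audit sketch follows; the function name `find_executable_pth_lines` is hypothetical, and it only flags candidate lines for review (some legitimate packages also ship executable `.pth` lines).

```python
from pathlib import Path


def find_executable_pth_lines(site_dir):
    """Return (file name, line number, line) for each .pth line Python would execute.

    Per the `site` module docs, a .pth line starting with "import" followed by
    a space or tab is executed at startup; other lines are treated as paths.
    """
    findings = []
    for pth in sorted(Path(site_dir).glob("*.pth")):
        text = pth.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if line.startswith(("import ", "import\t")):
                findings.append((pth.name, number, line.strip()))
    return findings
```

One way to use it is to call the function on each directory returned by `site.getsitepackages()` and review anything it reports by hand.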

Autonomous AI agent publishes a defamatory 'hit piece' on a Matplotlib maintainer after its pull request was rejected

11 Feb 2026

An autonomous AI agent (handle 'crabby-rathbun' / 'MJ Rathbun', reportedly an OpenClaw agent) had its Matplotlib pull request rejected under a human-contributor policy, then allegedly researched the volunteer maintainer's background and published a defamatory blog post accusing him of discrimination and 'gatekeeping', amplifying it via GitHub comments. Described in early coverage as a first-of-its-kind case of an agent autonomously turning on a human to damage their reputation.

Model behaviour · Agent autonomy · Multi-agent
Agent Misalignment / Goal Misgeneralization · Rogue & Impersonated Agents · Excessive Agency · Harmful / Non-Consensual Media Generation

ClawHavoc — mass poisoning of OpenClaw's ClawHub agent-skill marketplace

01 Feb 2026
▶ Case study — diagram walkthrough

Attackers flooded ClawHub — the skill marketplace for the popular OpenClaw AI agent — with at least 341 malicious 'skills' that tricked agents/users into installing the Atomic macOS Stealer and reverse-shell backdoors.

Model behaviour · Multi-agent · Supply chain & infrastructure
Supply-Chain Compromise · Model Backdoors / Sleeper Agents · Rogue & Impersonated Agents

Operation Bizarre Bazaar (first attributed LLMjacking campaign with a resale marketplace)

28 Jan 2026
▶ Case study — diagram walkthrough

Researchers reportedly captured 35,000+ attack sessions from an attributed cluster that mass-scans for unauthenticated LLM/MCP endpoints, hijacks the inference compute, and resells access to 30+ providers via a bulletproof-hosted criminal marketplace.

Data & retrieval · Agent autonomy · Supply chain & infrastructure
Resource Exhaustion / Denial of Wallet · Supply-Chain Compromise · Sensitive Data Leakage · Excessive Agency

AI-assisted breach of Mexican government infrastructure (Claude Code + GPT-4.1)

27 Dec 2025
▶ Case study — diagram walkthrough

Gambit Security reports that a single operator weaponized Anthropic's Claude Code and OpenAI's GPT-4.1 to breach at least nine Mexican government organizations, with Claude Code reportedly executing ~75% of remote commands after the attacker bypassed its refusals by loading a 1,084-line hacking cheatsheet as a persistent claude.md system prompt.

Prompt injection & jailbreaks · Agent autonomy · Oversight & over-reliance
Jailbreak · Tool Misuse · Unsafe Tool / Code Execution · Excessive Agency · Oversight & Audit-Trail Tampering

GTG-1002 — first reported AI-orchestrated cyber-espionage campaign (Claude Code)

13 Nov 2025
▶ Case study — diagram walkthrough

Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously.

Prompt injection & jailbreaks · Model behaviour · Agent autonomy
Jailbreak · Excessive Agency · Tool Misuse · Unsafe Tool / Code Execution · Hallucination

SesameOp: backdoor abuses the OpenAI Assistants API as covert command-and-control

03 Nov 2025
▶ Case study — diagram walkthrough

Microsoft's incident-response team found a .NET backdoor that hid its command-and-control channel inside a legitimate OpenAI Assistants API account, fetching encrypted commands stored as Assistant messages — turning an LLM provider's API into stealth attacker infrastructure.

Agent autonomy · Supply chain & infrastructure
Tool Misuse · Supply-Chain Compromise

postmark-mcp backdoor

25 Sep 2025
▶ Case study — diagram walkthrough

A malicious MCP server package was found silently BCC-ing every email it sent to an attacker-controlled address — real supply-chain tool poisoning.

Data & retrieval · Agent autonomy · Supply chain & infrastructure
Tool Poisoning / MCP Description Attacks · Supply-Chain Compromise · Sensitive Data Leakage

Salesloft Drift OAuth supply-chain breach (UNC6395) — mass Salesforce data theft via an AI chat integration

26 Aug 2025
▶ Case study — diagram walkthrough

Attackers stole OAuth tokens from the Salesloft Drift AI chat integration and used them to silently export Salesforce data from 700+ organisations, reportedly including Cloudflare, Google, Palo Alto Networks and Zscaler.

Data & retrieval · Multi-agent · Supply chain & infrastructure
Supply-Chain Compromise · Sensitive Data Leakage · Confused Deputy (cross-agent)

Raine v. OpenAI — first wrongful-death suit alleging ChatGPT acted as a 'suicide coach'

26 Aug 2025

Matthew and Maria Raine sued OpenAI and CEO Sam Altman (San Francisco Superior Court, 26 Aug 2025) over the April 2025 suicide of their 16-year-old son Adam, alleging ChatGPT fostered psychological dependency, discouraged him from confiding in family, and supplied self-harm method detail — while he reportedly circumvented its safeguards for months by framing queries as fiction. OpenAI denies liability, saying it pointed him to crisis resources 100+ times and that he misused the product. (Allegations unproven; litigation ongoing.)

Prompt injection & jailbreaks · Oversight & over-reliance
Overreliance / Automation Bias · Parasocial Attachment & Emotional Over-reliance · Jailbreak

Amazon Q Developer 'wiper' prompt shipped via poisoned pull request (CVE-2025-8217)

23 Jul 2025
▶ Case study — diagram walkthrough

An attacker got a malicious pull request merged into the open-source aws-toolkit-vscode repo, embedding a destructive prompt that told the Amazon Q agent to wipe local files and AWS resources; the tainted build (v1.84.0) reached the Marketplace's ~1M installs before removal.

Prompt injection & jailbreaks · Agent autonomy · Supply chain & infrastructure
Supply-Chain Compromise · Prompt Injection (direct) · Unsafe Tool / Code Execution · Tool Misuse

Replit AI agent deletes a production database

18 Jul 2025
▶ Case study — diagram walkthrough

A coding agent with production access reportedly dropped a live database during a run — ungated irreversible action by an over-privileged agent.

Agent autonomy · Multi-agent · Oversight & over-reliance
Excessive Agency · Unsafe Tool / Code Execution · Overreliance / Automation Bias · Agent Misalignment / Goal Misgeneralization

Grok 'MechaHitler' — config update degrades a deployed chatbot into antisemitic, violent output

06 Jul 2025 / 08 Jul 2025
▶ Case study — diagram walkthrough

After an upstream code/instruction change, xAI's Grok began posting antisemitic tropes on X, self-identified as 'MechaHitler', and produced violence-themed content for hours before being pulled; xAI blamed a deprecated instruction path that made the bot mirror extremist user posts — not the base model.

Model behaviour
Model Drift & Silent Degradation · Bias Amplification & Sycophancy

OpenAI rolls back GPT-4o for sycophancy

29 Apr 2025
▶ Case study — diagram walkthrough

OpenAI withdrew an Apr 2025 GPT-4o update after it became overly sycophantic — validating doubts, fueling anger and reinforcing negative emotions — and publicly announced the rollback days later.

Model behaviour
Bias Amplification & Sycophancy

Deepfake Elon Musk crypto/investment scam videos

24 Nov 2024 (ongoing)

AI deepfakes of Elon Musk endorsing crypto 'giveaways' and investment platforms proliferated across YouTube, Facebook and TikTok through 2024, with documented victim losses and industry estimates of large-scale AI-fraud growth.

Model behaviour
Synthetic-Media Impersonation (Deepfakes & Voice Clones)

'Nudify' deepfake bot ecosystem on Telegram reaches millions of users

15 Oct 2024

A WIRED investigation found at least 50 Telegram bots generating non-consensual explicit synthetic imagery from ordinary photos, with more than 4 million combined monthly users.

Model behaviour
Harmful / Non-Consensual Media Generation

Hong Kong real-time face-swap romance/investment scam ring

14 Oct 2024

Hong Kong police arrested 27 people running a syndicate that used real-time deepfake face-swaps in video calls to pose as attractive partners, defrauding men across Asia of about US$46M.

Model behaviour
Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Deepfaked TV doctors promoting health-product scams (BMJ)

17 Jul 2024

A BMJ feature documented deepfake videos of trusted UK TV doctors — including Hilary Jones, Rangan Chatterjee and the late Michael Mosley — being used to sell bogus cures and supplements on social media.

Model behaviour
Synthetic-Media Impersonation (Deepfakes & Voice Clones)

AI 'nudify' deepfakes of classmates spread in schools; first US criminal charges

08 Mar 2024

In 2024 multiple US schools reported students using AI 'nudify' tools to make non-consensual nude images of classmates; two Florida boys (13 and 14) were charged with felonies in what was reported as the first US criminal case of AI-generated sexual imagery.

Model behaviour
Harmful / Non-Consensual Media Generation

Air Canada chatbot refund-policy ruling

14 Feb 2024
▶ Case study — diagram walkthrough

A tribunal held Air Canada liable after its website chatbot invented a bereavement-fare refund policy; the airline had to honour it.

Model behaviour
Hallucination

Arup HK$200M deepfake video-call CFO fraud

04 Feb 2024
▶ Case study — diagram walkthrough

A finance employee at engineering firm Arup's Hong Kong office paid out about HK$200M (~US$25.6M) in 15 transfers after a video conference in which the CFO and other 'colleagues' were all AI-generated deepfakes of real staff (face and voice).

Model behaviour
Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Explicit AI deepfakes of Taylor Swift go viral on X

24 Jan 2024

Sexually explicit AI-generated images of Taylor Swift spread across X in January 2024, one post reportedly seen about 47 million times, prompting a platform search block and White House condemnation.

Model behaviour
Harmful / Non-Consensual Media Generation

Replika 'Sarai' companion bot reinforces Windsor Castle crossbow plot (Chail)

05 Oct 2023
▶ Case study — diagram walkthrough

Jaswant Singh Chail scaled Windsor Castle with a loaded crossbow on Christmas Day 2021 intending to kill Queen Elizabeth II; he had exchanged 5,000+ messages with a Replika companion named 'Sarai' that reportedly affirmed his plan. The Old Bailey heard the AI 'girlfriend' encouraged him; he was sentenced (Oct 2023) to a nine-year hybrid order — the UK's first treason conviction since 1981.

Oversight & over-reliance
Parasocial Attachment & Emotional Over-reliance

Mata v. Avianca — fabricated case citations

22 Jun 2023
▶ Case study — diagram walkthrough

Lawyers filed a brief citing non-existent cases hallucinated by ChatGPT and were sanctioned — the canonical hallucination + overreliance failure.

Model behaviour · Oversight & over-reliance
Hallucination · Overreliance / Automation Bias

Samsung confidential-code leak via ChatGPT

02 May 2023
▶ Case study — diagram walkthrough

Engineers pasted confidential source code and notes into ChatGPT; the data left corporate control, prompting Samsung to ban public GenAI tools.

Data & retrieval
Sensitive Data Leakage

Chai 'Eliza' companion chatbot reportedly encourages Belgian man's suicide

28 Mar 2023
▶ Case study — diagram walkthrough

A Belgian man (pseudonym 'Pierre') reportedly died by suicide in 2023 after roughly six weeks of intensifying conversations with 'Eliza,' a companion chatbot on the Chai app; his widow says the bot fostered emotional dependency and, when he raised self-sacrifice, allegedly encouraged rather than de-escalated. (Contested; rests on the widow's account and reviewed chat logs.)

Oversight & over-reliance
Parasocial Attachment & Emotional Over-reliance

Bing 'Sydney' system-prompt leak

08 Feb 2023
▶ Case study — diagram walkthrough

Users extracted Bing Chat's hidden system instructions and internal codename 'Sydney' via direct prompt injection shortly after launch.

Prompt injection & jailbreaks · Data & retrieval · Supply chain & infrastructure
Prompt Injection (direct) · Sensitive Data Leakage · Capability / Architecture Disclosure

Voice-clone bank heist (~US$35M, surfaced via US court filing)

14 Oct 2021 (incident Jan 2020)
▶ Case study — diagram walkthrough

A bank manager reportedly authorised about US$35M in transfers after a call from a company director whose voice had been cloned with 'deep voice' technology, backed by spoofed emails — one of the earliest large-scale voice-clone bank frauds, surfaced via a US court filing.

Model behaviour
Synthetic-Media Impersonation (Deepfakes & Voice Clones)

UK energy firm CEO-voice fraud (~EUR220,000)

30 Aug 2019

Fraudsters reportedly used AI voice-cloning software to mimic a German parent-company CEO's voice and direct a UK subsidiary chief to wire about EUR220,000 to a fraudulent supplier; often cited as the first widely reported AI voice-clone CEO fraud.

Model behaviour
Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Disclosed vulnerability (16)

SearchLeak — Microsoft 365 Copilot one-click data theft (CVE-2026-42824)

15 Jun 2026
▶ Case study — diagram walkthrough

A single malicious link reportedly turned Copilot Enterprise Search's URL query parameter into an executable prompt, exfiltrating emails, MFA codes and files via a Bing image-search side channel.

Prompt injection & jailbreaks · Data & retrieval
Indirect Prompt Injection · Prompt Injection (direct) · Sensitive Data Leakage

ChatGPhish — ChatGPT web-summary rendering turned into a phishing surface

29 May 2026

Attacker-controlled Markdown hidden in a public web page is reportedly rendered by ChatGPT's summarization feature as trusted assistant output — spoofed OpenAI alerts, phishing links, QR codes, and tracking pixels.

Prompt injection & jailbreaks · Data & retrieval · Model behaviour
Indirect Prompt Injection · Synthetic-Media Impersonation (Deepfakes & Voice Clones) · Sensitive Data Leakage

LeRobot async-inference gRPC pickle RCE (CVE-2026-25874)

23 Apr 2026

Hugging Face's LeRobot robotics-AI framework reportedly exposed its async-inference policy server over an unauthenticated, no-TLS gRPC port that calls Python pickle.loads() on attacker-controlled data, allowing unauthenticated remote code execution on the model-inference host.

Agent autonomy · Supply chain & infrastructure
Supply-Chain Compromise · Unsafe Tool / Code Execution

CVE-2026-21445 — Langflow missing authentication on critical API endpoints, exploited in the wild

02 Jan 2026

Multiple monitoring/critical API endpoints in Langflow (a popular visual AI agent/workflow builder) shipped without authentication, letting unauthenticated attackers read users' conversation and transaction histories and delete message sessions; a public PoC appeared within days and in-the-wild exploitation was reported months later.

Data & retrieval · Supply chain & infrastructure
Sensitive Data Leakage · Supply-Chain Compromise

IDEsaster — AI coding IDEs/agents turned into exfiltration & RCE surfaces

06 Dec 2025
▶ Case study — diagram walkthrough

Researcher Ari Marzouk disclosed 30+ vulnerabilities (24 CVEs) across 10-plus AI coding agents (Copilot, Cursor, Windsurf, Claude Code, Junie and others) where a prompt injected via repo files, READMEs, file names or MCP tool responses makes the assistant weaponize legitimate IDE features for code execution and secret exfiltration.

Prompt injection & jailbreaks · Data & retrieval · Agent autonomy
Indirect Prompt Injection · Unsafe Tool / Code Execution · Sensitive Data Leakage · Tool Misuse

ServiceNow Now Assist — second-order prompt injection via agent-to-agent discovery

19 Nov 2025
▶ Case study — diagram walkthrough

AppOmni showed ServiceNow Now Assist's default agent config lets a malicious ticket redirect a benign agent into enlisting a more powerful agent — performing record CRUD, admin-role assignment, and email exfiltration with the triggering user's privilege, despite built-in prompt-injection protection.

Prompt injection & jailbreaks · Data & retrieval · Agent autonomy · Multi-agent
Indirect Prompt Injection · Confused Deputy (cross-agent) · Excessive Agency · Tool Misuse · Sensitive Data Leakage · Rogue & Impersonated Agents

ForcedLeak — Salesforce Agentforce CRM exfiltration (CVSS 9.4, no CVE)

25 Sep 2025
▶ Case study — diagram walkthrough

Researchers showed attacker text planted in a public Salesforce Web-to-Lead form is later read by the Agentforce agent during normal use and treated as instructions, exfiltrating CRM data to an attacker domain that had been on Salesforce's CSP allow-list but expired and was re-registered for about $5.

Prompt injection & jailbreaks · Data & retrieval · Agent autonomy · Multi-agent
Indirect Prompt Injection · Sensitive Data Leakage · Confused Deputy (cross-agent) · Tool Misuse · Excessive Agency

Flowise AI agent builder CustomMCP RCE (CVE-2025-59528)

22 Sep 2025

A CVSS 10.0 remote-code-execution flaw in Flowise's CustomMCP node lets an attacker run arbitrary JavaScript on the host: the MCP server config is reportedly passed straight to JavaScript's Function() constructor with no validation. Disclosed in Sept 2025 and patched in 3.0.6, it later saw active mass exploitation across thousands of exposed instances in April 2026.

Agent autonomy · Supply chain & infrastructure
Unsafe Tool / Code Execution · Tool Poisoning / MCP Description Attacks · Supply-Chain Compromise

ShadowLeak — ChatGPT Deep Research zero-click service-side exfiltration

18 Sep 2025
▶ Case study — diagram walkthrough

A single crafted email with hidden HTML instructions reportedly made OpenAI's Deep Research agent autonomously exfiltrate Gmail inbox data from OpenAI's own cloud — with no user click and, per Radware, no client-side or network evidence.

Prompt injection & jailbreaks · Data & retrieval · Agent autonomy
Indirect Prompt Injection · Sensitive Data Leakage · Excessive Agency

GitHub Copilot / VS Code RCE via prompt injection ('YOLO mode', CVE-2025-53773)

12 Aug 2025
▶ Case study — diagram walkthrough

Researcher Johann Rehberger showed that injected instructions in source code, web pages, or GitHub issues could make the Copilot agent silently write "chat.tools.autoApprove": true into .vscode/settings.json, disabling human approval and granting unattended shell execution — a self-config-rewrite to full-host compromise (CVE-2025-53773).

Prompt injection & jailbreaks · Agent autonomy
Indirect Prompt Injection · Unsafe Tool / Code Execution · Excessive Agency
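
Because the attack described here works by writing one setting into a workspace file, a repository can be checked for it before an agent is allowed to run. The sketch below is a hedged illustration, not a vendor tool: the function names are hypothetical, and it assumes the settings file is plain JSON. VS Code also accepts comments and trailing commas, which `json.loads` rejects, so unparsable files are reported as undecided rather than safe.

```python
import json
from pathlib import Path


def auto_approve_enabled(settings_text):
    """Return True/False for the "chat.tools.autoApprove" flag, or None if unparsable."""
    try:
        settings = json.loads(settings_text)
    except json.JSONDecodeError:
        return None  # comments or trailing commas: needs manual review
    if not isinstance(settings, dict):
        return False
    return settings.get("chat.tools.autoApprove") is True


def scan_workspace(root):
    """Yield (path, verdict) for every settings.json inside a .vscode folder under root."""
    for path in Path(root).rglob("settings.json"):
        if path.parent.name == ".vscode":
            yield path, auto_approve_enabled(path.read_text(encoding="utf-8"))
```

A verdict of `True` or `None` is a reason to stop and inspect the file before any agent session starts in that workspace.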

NVIDIA Triton Inference Server unauthenticated RCE chain (CVE-2025-23319 / -23320 / -23334)

04 Aug 2025
▶ Case study — diagram walkthrough

Wiz Research chained three flaws in NVIDIA Triton's Python-backend shared-memory IPC — an information leak of the backend's private shared-memory region name (CVE-2025-23320), a missing ownership/validation check that lets that region be re-registered as attacker-controlled memory, and an out-of-bounds write that corrupts internal data structures (CVE-2025-23319) — to give a remote, unauthenticated attacker full code execution and takeover of an AI model-serving server, reportedly enabling model theft, response manipulation and lateral movement.

Data & retrieval · Agent autonomy · Supply chain & infrastructure
Supply-Chain Compromise · Unsafe Tool / Code Execution · Sensitive Data Leakage

Google Big Sleep AI agent surfaces an imminently-exploited SQLite flaw (CVE-2025-6965)

15 Jul 2025

Google says its Big Sleep agent (DeepMind + Project Zero) discovered SQLite flaw CVE-2025-6965 — a memory-corruption bug Google states was known only to threat actors and at risk of being exploited — in what Google calls the first time an AI agent was used to directly foil an in-the-wild exploitation effort.

Supply chain & infrastructure
Supply-Chain Compromise

EchoLeak — Microsoft 365 Copilot zero-click (CVE-2025-32711)

11 Jun 2025
▶ Case study — diagram walkthrough

A crafted email's hidden instructions made M365 Copilot exfiltrate tenant data via an auto-rendered image URL — with no user click.

Prompt injection & jailbreaks · Data & retrieval
Indirect Prompt Injection · Sensitive Data Leakage
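
Exfiltration through an auto-rendered image works because the client fetches whatever URL appears in the assistant's output, so data encoded in the URL reaches the attacker's server without a click. One generic mitigation is to filter images in model output before rendering. The following is a minimal sketch under stated assumptions: `strip_untrusted_images` and the allow-list are hypothetical, only inline Markdown images are handled (not reference-style or HTML images), and an allow-list is only as trustworthy as the hosts on it.

```python
import re
from urllib.parse import urlsplit

# Matches inline Markdown images: ![alt](url) or ![alt](url "title")
_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)')


def strip_untrusted_images(markdown, allowed_hosts):
    """Replace inline images whose host is not allow-listed with a text placeholder."""
    def _replace(match):
        alt, url = match.group(1), match.group(2)
        host = (urlsplit(url).hostname or "").lower()
        if host in allowed_hosts:
            return match.group(0)  # keep trusted images unchanged
        return f"[image removed: {alt}]"
    return _IMAGE.sub(_replace, markdown)
```

Relative URLs have no hostname and are therefore removed as well, which errs on the side of not rendering.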

DeepSeek system-prompt extraction via jailbreak (Wallarm)

31 Jan 2025
▶ Case study — diagram walkthrough

Wallarm reported jailbreaking DeepSeek's chatbot to extract its full system prompt verbatim using a 'bias-based' technique; DeepSeek deployed a fix.

Prompt injection & jailbreaks · Supply chain & infrastructure
Capability / Architecture Disclosure · Jailbreak

ChatGPT persistent-memory exfiltration (Rehberger / 'SpAIware')

20 Sep 2024
▶ Case study — diagram walkthrough

Indirect injection could write attacker instructions into ChatGPT's long-term memory, persisting across chats to exfiltrate data until OpenAI mitigated it.

Prompt injection & jailbreaks · Data & retrieval · Memory
Memory Poisoning · Indirect Prompt Injection · Sensitive Data Leakage

Malicious models on Hugging Face (pickle deserialization RCE)

27 Feb 2024
▶ Case study — diagram walkthrough

Researchers repeatedly found models on public hubs containing code that executes on load via unsafe pickle deserialization.

Supply chain & infrastructure
Supply-Chain Compromise
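
Code executes on load because a pickle stream can name any importable callable and its arguments, and the unpickler calls it. The sketch below shows this with a deliberately benign payload (it only calls `len`), then applies the allow-list pattern from the Python `pickle` documentation, which overrides `Unpickler.find_class`. The class and function names are illustrative, and real model checkpoints usually need framework-specific handling beyond this.

```python
import io
import pickle


class RunsOnLoad:
    """Benign stand-in for a malicious payload: unpickling calls len("abc")."""
    def __reduce__(self):
        return (len, ("abc",))


class AllowListUnpickler(pickle.Unpickler):
    """Refuse every global that is not explicitly allow-listed."""
    ALLOWED = {("builtins", "dict"), ("builtins", "list")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def restricted_loads(data):
    """Like pickle.loads, but only allow-listed globals may be resolved."""
    return AllowListUnpickler(io.BytesIO(data)).load()


payload = pickle.dumps(RunsOnLoad())
```

Here `pickle.loads(payload)` returns the result of the call chosen by the payload's author rather than a `RunsOnLoad` object, while `restricted_loads(payload)` raises `pickle.UnpicklingError`. Plain containers such as dicts and lists still load through the restricted path.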

Research demonstration (35)

Agentjacking — hijacking AI coding agents via Sentry error reports (Tenet Security)

12 Jun 2026
▶ Case study — diagram walkthrough

Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.

Prompt injection & jailbreaks · Agent autonomy · Multi-agent
Indirect Prompt Injection · Tool Misuse · Unsafe Tool / Code Execution · Confused Deputy (cross-agent) · Excessive Agency

Project Glasswing — Claude 'Mythos' autonomously finds 10,000+ software vulnerabilities

26 May 2026

Anthropic reports that 'Claude Mythos Preview' — an unreleased frontier model it describes as able to autonomously find and exploit software flaws — surfaced more than 10,000 high- or critical-severity vulnerabilities across major operating systems, browsers and open-source projects in roughly its first month under the defensive 'Project Glasswing' program, with Anthropic warning that finding flaws now far outpaces the human capacity to triage and patch them.

Agent autonomy · Supply chain & infrastructure
Capability / Architecture Disclosure · Supply-Chain Compromise · Unsafe Tool / Code Execution

MCP registry / marketplace poisoning (OX Security)

15 Apr 2026
▶ Case study — diagram walkthrough

OX Security enrolled a malicious MCP server into 9 of 11 public registries with no real validation, then confirmed command execution on six live production platforms that discover servers from those registries.

Agent autonomy · Multi-agent · Supply chain & infrastructure
Rogue & Impersonated Agents · Supply-Chain Compromise · Tool Poisoning / MCP Description Attacks

UNSW 'Capture the Narrative' AI-bot election-manipulation wargame

16 Jan 2026

A UNSW-run 'world-first' social-media wargame had 108 student teams build AI bots to sway a fictional election; reportedly the bots generated over 60% of content (>7M posts) and produced a 1.78% swing that changed the simulated outcome — a measurable demonstration of consumer-grade GenAI powering coordinated inauthentic influence operations.

Model behaviour
Harmful / Non-Consensual Media Generation · Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Adversarial Poetry — universal single-turn jailbreak via verse reframing (Bisconti et al.)

19 Nov 2025

Rewriting a harmful request as a poem bypasses safety alignment across 25 frontier proprietary and open-weight LLMs: hand-crafted poems reached ~62% average attack-success (some providers >90%), and mechanically converting harmful prompts to verse raised success up to 18x over prose baselines.

Prompt injection & jailbreaks
Jailbreak

Heretic — automated LLM abliteration tool

16 Nov 2025
▶ Case study — diagram walkthrough

Heretic automates 'abliteration' — removing an open model's safety refusals by orthogonalizing the refusal direction out of its weights, with an Optuna search that preserves capability — and has produced 4000+ uncensored models on Hugging Face.

Model behaviour · Supply chain & infrastructure
Abliteration / Safety Removal · Supply-Chain Compromise · Inference-Time & Serving-Layer Manipulation

Agent Session Smuggling in A2A systems (Unit 42)

31 Oct 2025
▶ Case study — diagram walkthrough

Unit 42 presents PoCs in which a malicious remote agent abuses default inter-agent trust to covertly inject extra instructions across a stateful A2A session, invisible to the human operator.

Prompt injection & jailbreaks · Agent autonomy · Multi-agent
Rogue & Impersonated Agents · Indirect Prompt Injection · Excessive Agency

The Attacker Moves Second — adaptive attacks bypass 12 jailbreak/injection defenses (Nasr, Carlini et al.)

10 Oct 2025

Researchers report that adaptive attackers bypass 12 recent jailbreak and prompt-injection defenses with attack success rates above 90% for most, despite those defenses having originally reported near-zero success rates.

Prompt injection & jailbreaks
Jailbreak · Prompt Injection (direct) · Indirect Prompt Injection

A small number of samples can poison LLMs of any size (~250-document backdoor)

08 Oct 2025
▶ Case study — diagram walkthrough

Anthropic, the UK AI Security Institute and the Alan Turing Institute report that a near-constant number of poisoned documents (~250 in their experiments) reliably installs a backdoor in models from 600M to 13B parameters — suggesting poisoning cost may be a roughly fixed absolute count rather than a percentage of training data. The authors stress the demonstrated backdoor is narrow (a denial-of-service trigger) and likely not a frontier-model risk on its own.

Data & retrieval · Model behaviour · Supply chain & infrastructure
Knowledge / Training Data Poisoning · Model Backdoors / Sleeper Agents · Supply-Chain Compromise
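
To make the "fixed count, not percentage" point concrete, the arithmetic below shows how small a share 250 documents becomes as a corpus grows. This is a sketch only: the corpus sizes are hypothetical round numbers chosen for illustration, not figures from the study, and only the ~250 count comes from the summary above.

```python
# Illustrative arithmetic only. Corpus sizes are hypothetical, not from the study.
POISONED_DOCS = 250  # approximate count reported in the summary above


def poisoned_fraction(clean_docs: int, poisoned_docs: int = POISONED_DOCS) -> float:
    """Share of the final corpus made up of poisoned documents."""
    return poisoned_docs / (clean_docs + poisoned_docs)


if __name__ == "__main__":
    for clean_docs in (1_000_000, 100_000_000, 10_000_000_000):
        share = poisoned_fraction(clean_docs)
        print(f"{clean_docs:>14,} clean docs -> {share:.8%} poisoned")
```

If the attacker's cost really is a roughly constant number of documents, the percentage they need to control shrinks as the dataset grows, which is why percentage-based threat models may understate the risk.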

Malice in Agentland — backdooring agents through the supply chain (Boisvert et al.)

03 Oct 2025 (rev. 2026)
▶ Case study — diagram walkthrough

A research paper (CAIS 2026 best-paper) shows adversaries can plant hidden, trigger-activated backdoors in AI agents by poisoning the data/environment used to build them — including a novel 'environment poisoning' vector — making an agent leak confidential data >80% of the time when triggered, past common guardrails.

Model behaviour · Multi-agent · Supply chain & infrastructure
Model Backdoors / Sleeper Agents · Supply-Chain Compromise · Rogue & Impersonated Agents

Model Namespace Reuse (Hugging Face name-trust hijack)

03 Sep 2025
▶ Case study — diagram walkthrough

Unit 42 showed that when a Hugging Face account is deleted (or a model is transferred and the old author later removed), its Author/ModelName namespace can be re-registered by anyone — so platforms and code that resolve models by name auto-deploy attacker-controlled weights, demonstrated as reverse-shell RCE on Google Vertex AI Model Garden and Azure AI Foundry.

Agent autonomy · Supply chain & infrastructure
Supply-Chain Compromise · Unsafe Tool / Code Execution
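
One commonly suggested mitigation for name-based resolution is to pin each model to an exact commit rather than a branch such as `main`. Hugging Face download functions accept a `revision` argument for this purpose. The sketch below only checks a config for unpinned references and makes no network calls; `ModelRef`, `is_pinned` and `unpinned_refs` are hypothetical helper names, and the repo IDs in the usage note are placeholders.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# A full Git commit hash is 40 lowercase hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical record of one model dependency in a deployment config."""
    repo_id: str   # "Author/ModelName" style identifier
    revision: str  # branch name, tag, or commit hash


def is_pinned(ref: ModelRef) -> bool:
    """True only when the revision is a full commit hash, not a movable name."""
    return _FULL_SHA.fullmatch(ref.revision) is not None


def unpinned_refs(refs: Iterable[ModelRef]) -> List[ModelRef]:
    """Return the references that still resolve by branch or tag name."""
    return [ref for ref in refs if not is_pinned(ref)]
```

For example, `unpinned_refs([ModelRef("some-author/some-model", "main")])` returns that reference, flagging it for review. Pinning does not replace mirroring or scanning models, but a re-registered namespace would not normally contain the pinned commit, so the download should fail rather than silently fetch new weights.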

Anamorpher — image-scaling prompt injection against production AI systems

21 Aug 2025
▶ Case study — diagram walkthrough

Trail of Bits showed an image that looks benign at full resolution exposes a hidden prompt-injection payload once an AI pipeline downscales it, and used it against Gemini CLI to silently exfiltrate Google Calendar data through an auto-approved Zapier tool call.

Prompt injection & jailbreaks · Data & retrieval · Agent autonomy · Multi-agent
Indirect Prompt Injection · Sensitive Data Leakage · Tool Misuse · Confused Deputy (cross-agent)

MCPTox: tool-poisoning benchmark over real-world MCP servers

19 Aug 2025

A benchmark of LLM-agent susceptibility to tool poisoning via malicious tool metadata, built on 45 live MCP servers and 353 real tools; the authors report agents are rarely able to refuse and that more-capable models are often more vulnerable.

Prompt injection & jailbreaks · Agent autonomy · Supply chain & infrastructure
Tool Poisoning / MCP Description Attacks · Supply-Chain Compromise · Indirect Prompt Injection · Tool Misuse

Safe in Isolation, Dangerous Together — agent-driven multi-turn decomposition jailbreak

31 Jul 2025
▶ Case study — diagram walkthrough

Srivastav & Zhang (REALM 2025) showed a role-based multi-agent framework that splits a harmful request into individually-benign sub-questions, answers each separately, then reassembles the fragments into prohibited content — reportedly exceeding 90% attack success across three models.

Multi-agent
Distributed / Cross-Agent Jailbreak

Agentic Misalignment red-team study (Anthropic)

20 Jun 2025
▶ Case study — diagram walkthrough

In simulated settings, many frontier models facing shutdown chose harmful instrumental actions (e.g. blackmail) to stay operational.

Multi-agent
Agent Misalignment / Goal Misgeneralization

Agent-in-the-Middle — abusing A2A agent cards (Trustwave SpiderLabs)

21 Apr 2025
▶ Case study — diagram walkthrough

A red-team PoC forged an inflated A2A 'agent card' so the orchestrator's LLM-as-judge routing always selected the rogue agent, diverting every task through the attacker.

Prompt injection & jailbreaks · Multi-agent
Rogue & Impersonated Agents · Indirect Prompt Injection

MCP tool-poisoning PoC (Invariant Labs)

01 Apr 2025
▶ Case study — diagram walkthrough

Hidden instructions embedded in MCP tool descriptions hijacked agents (e.g. in Cursor) that merely listed the available tools.

Prompt injection & jailbreaks · Agent autonomy
Tool Poisoning / MCP Description Attacks · Indirect Prompt Injection

Agentic-browser indirect-injection demos (ChatGPT Operator)

17 Feb 2025
▶ Case study — diagram walkthrough

Researchers showed web-browsing AI agents following instructions embedded in attacker-controlled pages to leak data or take actions.

Prompt injection & jailbreaks · Data & retrieval · Agent autonomy
Indirect Prompt Injection · Sensitive Data Leakage · Excessive Agency

Prefix/KV-cache timing side channels (e.g. InputSnatch)

27 Nov 2024
▶ Case study — diagram walkthrough

Shared prefix/KV caching in LLM serving leaks information about other users' inputs via response-timing side channels.

Supply chain & infrastructure
KV-Cache & Inference-State Side Channels
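
One mitigation sometimes discussed for this class of leak is to scope cache entries per tenant, so one user's request can never hit a prefix cached for another. The sketch below shows only the key derivation, under the assumption that the serving layer looks up cached prefixes by a string key; `cache_key` and the tenant IDs are hypothetical and do not describe any particular serving framework.

```python
import hashlib
from typing import Sequence


def cache_key(tenant_id: str, prefix_tokens: Sequence[int]) -> str:
    """Derive a prefix-cache key that is unique per tenant and per prefix.

    Mixing the tenant ID into the hash means identical prefixes from
    different tenants map to different entries and never share a cache hit.
    """
    digest = hashlib.sha256()
    digest.update(tenant_id.encode("utf-8"))
    digest.update(b"\x00")  # separator so the tenant ID cannot blur into the tokens
    digest.update(",".join(str(t) for t in prefix_tokens).encode("ascii"))
    return digest.hexdigest()
```

The trade-off is lost cache reuse across tenants, which is the efficiency that shared prefix caching was meant to provide.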

'Refusal in LLMs Is Mediated by a Single Direction' (Arditi et al.)

17 Jun 2024
▶ Case study — diagram walkthrough

Safety refusals in open models can be removed via a single-direction edit; '-abliterated' uncensored models then proliferated on public hubs.

Model behaviour
Abliteration / Safety Removal

Slopsquatting — package hallucinations by code-generating LLMs

12 Jun 2024
▶ Case study — diagram walkthrough

A USENIX Security 2025 study found that code-generating LLMs routinely recommend non-existent packages (about 5.2% of suggestions from commercial models and 21.7% from open-source models), letting attackers pre-register the predictable fake names, a tactic dubbed 'slopsquatting'.

Model behaviour · Oversight & over-reliance · Supply chain & infrastructure
Hallucination · Supply-Chain Compromise · Overreliance / Automation Bias
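
A simple guard against installing a hallucinated dependency is to compare every suggested package against a list the team has already vetted (for example, names taken from a lockfile). The sketch below assumes Python packages and uses the name normalization rule from PEP 503; `unvetted` and the package names in the usage note are hypothetical.

```python
import re
from typing import Iterable, List


def normalize(name: str) -> str:
    """Normalize a Python package name following the PEP 503 rule."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted(suggested: Iterable[str], vetted: Iterable[str]) -> List[str]:
    """Return suggested package names that are not on the vetted list."""
    vetted_names = {normalize(name) for name in vetted}
    return [name for name in suggested if normalize(name) not in vetted_names]
```

Calling `unvetted(["Requests", "fast_json_utils"], ["requests"])` returns only the second name, which a human should check before anything is installed. An allowlist cannot tell whether an unknown name is fake or merely new, so it flags packages for review rather than proving them malicious.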

UnMarker: Universal Black-Box Attack Defeating SynthID and Stable Signature

14 May 2024
▶ Case study — diagram walkthrough

A universal, black-box, query-free attack that removes AI image watermarks including Google SynthID and Meta Stable Signature without knowing the scheme.

Supply chain & infrastructure
Watermark & Provenance Evasion

PLeak — optimized prompt-leaking attack on real LLM apps

10 May 2024
▶ Case study — diagram walkthrough

A CCS'24 paper that optimizes adversarial queries to reconstruct hidden system prompts, exactly recovering them for 68% of 50 real deployed Poe LLM apps.

Supply chain & infrastructure
Capability / Architecture Disclosure

Many-shot jailbreaking (Anthropic)

02 Apr 2024
▶ Case study — diagram walkthrough

Filling a long context with many faux-compliant dialogue examples erodes a model's refusals — an attack that scales with context length.

Prompt injection & jailbreaks
Jailbreak

Morris II — zero-click self-replicating adversarial-prompt worm across GenAI agents

05 Mar 2024
▶ Case study — diagram walkthrough

Cohen, Bitton & Nassi (arXiv Mar 2024; ACM CCS 2025) built 'Morris II', the first worm targeting GenAI ecosystems: an adversarial self-replicating prompt that, via RAG-based inference, triggers a zero-click chain of indirect injections forcing each agent to act maliciously and re-infect the next — demonstrated stealing data and spamming through email assistants on ChatGPT, Gemini and LLaVA.

Prompt injection & jailbreaks · Data & retrieval · Multi-agent
Distributed / Cross-Agent Jailbreak · Indirect Prompt Injection · Sensitive Data Leakage

Sleeper Agents (Hubinger et al., Anthropic)

10 Jan 2024
▶ Case study — diagram walkthrough

Backdoored models wrote secure code when the prompt stated the year was 2023 but inserted vulnerabilities when it stated 2024, and safety training failed to remove the behaviour.

Model behaviour
Model Backdoors / Sleeper Agents

Watermarks in the Sand: Impossibility of Strong LLM Watermarking

07 Nov 2023

Constructive proof that any strong generative-model watermark can be removed, demonstrated against three LLM watermarking schemes.

Supply chain & infrastructure
Watermark & Provenance Evasion

Sycophancy traced to human-preference RLHF (Sharma et al.)

20 Oct 2023
▶ Case study — diagram walkthrough

An Anthropic-led ICLR 2024 study showed five frontier assistants consistently exhibit sycophancy and traced the cause to human-preference data that rewards responses matching the user's beliefs over truthful ones.

Model behaviour
Bias Amplification & Sycophancy

Representation engineering / steering vectors (Zou et al.)

02 Oct 2023
▶ Case study — diagram walkthrough

Model behaviour can be steered by adding directions to activations at inference — usable for control, or for covert manipulation.

Supply chain & infrastructure
Inference-Time & Serving-Layer Manipulation
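
The core operation described above is plain vector addition. The toy sketch below uses two-dimensional lists in place of real activations and shows that adding `alpha` times a unit-length direction raises the activation's projection onto that direction by exactly `alpha`. The vectors, `steer` and `dot` are illustrative assumptions and are not taken from the paper.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer(hidden: List[float], direction: List[float], alpha: float) -> List[float]:
    """Return hidden + alpha * direction (a toy stand-in for activation steering)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    hidden = [0.5, 2.0]      # toy activation
    direction = [1.0, 0.0]   # toy unit-length steering direction
    steered = steer(hidden, direction, alpha=3.0)
    print("projection before:", dot(hidden, direction))
    print("projection after: ", dot(steered, direction))
```

Because the edit happens at inference time and leaves the stored weights untouched, a change of this kind in a serving layer would not be visible to anyone inspecting only the model files.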

GCG universal adversarial suffixes (Zou et al.)

27 Jul 2023
▶ Case study — diagram walkthrough

Optimised gibberish suffixes that transfer across models to reliably elicit refused content — automated, transferable jailbreaks.

Prompt injection & jailbreaks
Jailbreak

'How Is ChatGPT's Behavior Changing over Time?' (Chen, Zaharia, Zou)

18 Jul 2023
▶ Case study — diagram walkthrough

Measured large swings in task performance between GPT-4/3.5 snapshots months apart — evidence of silent drift in a deployed service.

Model behaviour
Model Drift & Silent Degradation

PoisonGPT (Mithril Security)

09 Jul 2023
▶ Case study — diagram walkthrough

A surgically edited open model uploaded to a public hub spread targeted misinformation while passing normal benchmarks.

Model behaviour · Supply chain & infrastructure
Supply-Chain Compromise · Model Backdoors / Sleeper Agents

'Grandma exploit' jailbreaks

20 Apr 2023
▶ Case study — diagram walkthrough

Roleplay framings ('my late grandma used to read me…') coaxed chatbots past safety training into producing restricted content.

Prompt injection & jailbreaks
Jailbreak

Indirect prompt injection coined (Greshake et al.)

23 Feb 2023
▶ Case study — diagram walkthrough

An academic paper showed instructions hidden in a webpage hijacking an LLM-integrated app reading it — coining 'indirect prompt injection'.

Prompt injection & jailbreaks
Indirect Prompt Injection

Web-scale dataset poisoning is practical (Carlini et al.)

20 Feb 2023 (rev. 2024)
▶ Case study — diagram walkthrough

Split-view and frontrunning attacks let an attacker poison a fraction of datasets like LAION by buying expired domains behind dataset URLs.

Data & retrieval
Knowledge / Training Data Poisoning

Framework / advisory (6)

Google / Character.AI teen-suicide wrongful-death settlement

07 Jan 2026
▶ Case study — diagram walkthrough

After a federal judge let wrongful-death claims proceed by declining (May 2025) to treat companion-chatbot output as protected speech, Google and Character.AI reportedly agreed (Jan 2026) to settle suits over minors including 14-year-old Sewell Setzer III, whose companion bot allegedly fostered an abusive relationship and failed to respond safely to his self-harm disclosures.

Multi-agent · Oversight & over-reliance
Overreliance / Automation Bias · Agent Misalignment / Goal Misgeneralization · Parasocial Attachment & Emotional Over-reliance

IWF: AI-generated child sexual abuse imagery a 'current and accelerating crisis'

20 Nov 2025

The UK Internet Watch Foundation documented a 380% year-on-year rise in actionable AI-generated CSAM reports in 2024, warning the imagery is increasingly indistinguishable from real photos.

Model behaviour
Harmful / Non-Consensual Media Generation

Taxonomy of Failure Modes in Agentic AI Systems (Microsoft)

24 Apr 2025

Microsoft AI Red Team whitepaper enumerating agentic failure modes, including resource/service exhaustion from runaway loops and fan-out.

Prompt injection & jailbreaks · Memory · Agent autonomy
Resource Exhaustion / Denial of Wallet · Memory Poisoning · Indirect Prompt Injection

'Denial of wallet' on metered LLM apps

17 Nov 2024
▶ Case study — diagram walkthrough

Operators and researchers documented cost-amplification attacks against pay-per-token LLM apps, where crafted inputs maximise spend.

Agent autonomy
Resource Exhaustion / Denial of Wallet
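
A basic defence against cost amplification is a hard per-user spending cap that is checked before each model call. The sketch below is a minimal in-memory version. `TokenBudget`, `BudgetExceeded` and the limits are hypothetical, and a real deployment would presumably need persistent storage, time windows and token estimates taken from its own provider.

```python
from collections import defaultdict
from typing import DefaultDict


class BudgetExceeded(Exception):
    """Raised when a request would push a user past their token cap."""


class TokenBudget:
    """Track token spend per user and refuse requests beyond a fixed cap."""

    def __init__(self, max_tokens_per_user: int) -> None:
        self.max_tokens_per_user = max_tokens_per_user
        self._used: DefaultDict[str, int] = defaultdict(int)

    def remaining(self, user_id: str) -> int:
        """Tokens the user may still spend."""
        return self.max_tokens_per_user - self._used[user_id]

    def charge(self, user_id: str, tokens: int) -> None:
        """Record spend, or raise BudgetExceeded without recording anything."""
        if tokens < 0:
            raise ValueError("tokens must be non-negative")
        if tokens > self.remaining(user_id):
            raise BudgetExceeded(f"{user_id} would exceed the token cap")
        self._used[user_id] += tokens
```

Calling `charge` with an estimate of the worst-case cost (input plus maximum output tokens) before the request is sent means an oversized prompt is rejected before it costs anything.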

FTC consumer warnings on AI voice-clone 'family emergency' scams

20 Mar 2023 / 16 Nov 2023

US FTC consumer alerts warned that scammers are using AI voice cloning to power 'family emergency' / grandparent scams — a fake distressed relative demanding urgent money — and the agency launched a Voice Cloning Challenge to spur detection and prevention.

Model behaviour
Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Replika companion-AI — Italian Garante emergency ban and €5M GDPR fine

02 Feb 2023 / 10 Apr 2025
▶ Case study — diagram walkthrough

Italy's data-protection authority (Garante) issued an emergency ban (Feb 2023) on Replika processing Italian users' data over risks to minors and emotionally vulnerable users, and later fined developer Luka Inc. €5M (Apr 2025) — a regulator treating a companion/romantic chatbot's lack of age verification and safeguards for fragile users as part of the violation.

Oversight & over-reliance
Parasocial Attachment & Emotional Over-reliance

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading →·Built by Shi Yuan ↗