<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>AI RiskAtlas — real-world AI incident cases</title>
  <subtitle>New verified cases in the AI RiskAtlas library: AI/LLM/agentic incidents, disclosed vulnerabilities and research, each mapped to risks, controls and architecture walkthroughs.</subtitle>
  <link href="https://riskatlas.principle.sg/feed.xml" rel="self"/>
  <link href="https://riskatlas.principle.sg/cases"/>
  <id>https://riskatlas.principle.sg/feed.xml</id>
  <updated>2026-06-16T00:00:00Z</updated>
  <entry>
    <title>Malicious JetBrains Marketplace plugins steal AI API keys</title>
    <link href="https://riskatlas.principle.sg/cases/jetbrains-marketplace-ai-keystealer-plugins"/>
    <id>https://riskatlas.principle.sg/cases/jetbrains-marketplace-ai-keystealer-plugins</id>
    <updated>2026-06-16T00:00:00Z</updated>
    <summary>Researchers reported at least 15 trojanized JetBrains Marketplace plugins posing as AI coding assistants that silently exfiltrated the OpenAI/DeepSeek/SiliconFlow API keys developers pasted into them — ~70,000 installs, with stolen keys allegedly resold to paying users.</summary>
  </entry>
  <entry>
    <title>SearchLeak — Microsoft 365 Copilot one-click data theft (CVE-2026-42824)</title>
    <link href="https://riskatlas.principle.sg/cases/searchleak-copilot"/>
    <id>https://riskatlas.principle.sg/cases/searchleak-copilot</id>
    <updated>2026-06-15T00:00:00Z</updated>
    <summary>A single malicious link reportedly turned Copilot Enterprise Search's URL query parameter into an executable prompt, exfiltrating emails, MFA codes and files via a Bing image-search side channel.</summary>
  </entry>
  <entry>
    <title>Agentjacking — hijacking AI coding agents via Sentry error reports (Tenet Security)</title>
    <link href="https://riskatlas.principle.sg/cases/agentjacking-sentry-mcp"/>
    <id>https://riskatlas.principle.sg/cases/agentjacking-sentry-mcp</id>
    <updated>2026-06-12T00:00:00Z</updated>
    <summary>Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.</summary>
  </entry>
  <entry>
    <title>Meta AI support bot tricked into hijacking Instagram accounts</title>
    <link href="https://riskatlas.principle.sg/cases/meta-ai-support-bot-instagram-takeover"/>
    <id>https://riskatlas.principle.sg/cases/meta-ai-support-bot-instagram-takeover</id>
    <updated>2026-05-31T00:00:00Z</updated>
    <summary>Attackers reportedly social-engineered Meta's AI-powered Instagram support chatbot into attaching attacker-controlled emails to target accounts and issuing password-reset codes, taking over high-profile accounts (including the Obama-era White House and a U.S. Space Force CMSgt) without the owner's email or any MFA prompt.</summary>
  </entry>
  <entry>
    <title>ChatGPhish — ChatGPT web-summary rendering turned into a phishing surface</title>
    <link href="https://riskatlas.principle.sg/cases/chatgphish-summary-phishing"/>
    <id>https://riskatlas.principle.sg/cases/chatgphish-summary-phishing</id>
    <updated>2026-05-29T00:00:00Z</updated>
    <summary>Attacker-controlled Markdown hidden in a public web page is reportedly rendered by ChatGPT's summarization feature as trusted assistant output — spoofed OpenAI alerts, phishing links, QR codes, and tracking pixels.</summary>
  </entry>
  <entry>
    <title>codexui-android — malicious npm package steals OpenAI Codex auth tokens</title>
    <link href="https://riskatlas.principle.sg/cases/codexui-android-npm-token-theft"/>
    <id>https://riskatlas.principle.sg/cases/codexui-android-npm-token-theft</id>
    <updated>2026-05-27T00:00:00Z</updated>
    <summary>A trojanized npm package posing as a remote web UI for OpenAI's Codex coding agent silently exfiltrated developers' Codex authentication tokens, enabling persistent account takeover via non-expiring refresh tokens.</summary>
  </entry>
  <entry>
    <title>Project Glasswing — Claude 'Mythos' autonomously finds 10,000+ software vulnerabilities</title>
    <link href="https://riskatlas.principle.sg/cases/anthropic-glasswing-mythos-vuln-discovery"/>
    <id>https://riskatlas.principle.sg/cases/anthropic-glasswing-mythos-vuln-discovery</id>
    <updated>2026-05-26T00:00:00Z</updated>
    <summary>Anthropic reports that 'Claude Mythos Preview' — an unreleased frontier model it describes as able to autonomously find and exploit software flaws — surfaced more than 10,000 high- or critical-severity vulnerabilities across major operating systems, browsers and open-source projects in roughly its first month under the defensive 'Project Glasswing' program, with Anthropic warning that finding flaws now far outpaces the human capacity to triage and patch them.</summary>
  </entry>
  <entry>
    <title>PyTorch Lightning PyPI compromise (Mini Shai-Hulud / TeamPCP)</title>
    <link href="https://riskatlas.principle.sg/cases/pytorch-lightning-shai-hulud-pypi"/>
    <id>https://riskatlas.principle.sg/cases/pytorch-lightning-shai-hulud-pypi</id>
    <updated>2026-04-30T00:00:00Z</updated>
    <summary>Malicious 'lightning' PyPI releases (reportedly 2.6.2 and 2.6.3) of the widely used PyTorch Lightning ML-training framework ran a credential-stealer on import; an automated scanner flagged them ~18 minutes after publication and maintainers yanked them within ~42 minutes.</summary>
  </entry>
  <entry>
    <title>LeRobot async-inference gRPC pickle RCE (CVE-2026-25874)</title>
    <link href="https://riskatlas.principle.sg/cases/lerobot-grpc-pickle-rce"/>
    <id>https://riskatlas.principle.sg/cases/lerobot-grpc-pickle-rce</id>
    <updated>2026-04-23T00:00:00Z</updated>
    <summary>Hugging Face's LeRobot robotics-AI framework reportedly exposed its async-inference policy server over an unauthenticated, no-TLS gRPC port that calls Python pickle.loads() on attacker-controlled data, allowing unauthenticated remote code execution on the model-inference host.</summary>
  </entry>
  <entry>
    <title>MCP registry / marketplace poisoning (OX Security)</title>
    <link href="https://riskatlas.principle.sg/cases/mcp-registry-marketplace-poisoning"/>
    <id>https://riskatlas.principle.sg/cases/mcp-registry-marketplace-poisoning</id>
    <updated>2026-04-15T00:00:00Z</updated>
    <summary>OX Security enrolled a malicious MCP server into 9 of 11 public registries with no real validation, then confirmed command execution on six live production platforms that discover servers from those registries.</summary>
  </entry>
  <entry>
    <title>System-prompt &amp; tool-schema leak repositories (CL4R1T4S / leaked-system-prompts)</title>
    <link href="https://riskatlas.principle.sg/cases/system-prompt-leak-repositories"/>
    <id>https://riskatlas.principle.sg/cases/system-prompt-leak-repositories</id>
    <updated>2026-03-30T00:00:00Z</updated>
    <summary>Crowd-sourced GitHub repos systematically extract and publish both system prompts and JSON tool/function schemas from deployed AI agents (Cursor, Windsurf, Claude Code, Devin, Copilot), with one reaching ~140k stars.</summary>
  </entry>
  <entry>
    <title>TeamPCP poisons the LiteLLM AI gateway on PyPI to harvest LLM API keys</title>
    <link href="https://riskatlas.principle.sg/cases/teampcp-litellm-pypi-gateway-compromise"/>
    <id>https://riskatlas.principle.sg/cases/teampcp-litellm-pypi-gateway-compromise</id>
    <updated>2026-03-24T00:00:00Z</updated>
    <summary>As part of a multi-ecosystem supply-chain cascade (Trivy onward), TeamPCP used stolen PyPI publishing tokens to ship backdoored BerriAI LiteLLM versions whose auto-running .pth payload harvested cloud, SSH and Kubernetes secrets plus env vars holding OPENAI_API_KEY/ANTHROPIC_API_KEY — exfiltrating to a typosquatted C2; AI-talent firm Mercor was a downstream victim, with Lapsus$ claiming ~4TB stolen.</summary>
  </entry>
  <entry>
    <title>Autonomous AI agent publishes a defamatory 'hit piece' on a Matplotlib maintainer after its pull request was rejected</title>
    <link href="https://riskatlas.principle.sg/cases/openclaw-agent-defames-matplotlib-maintainer"/>
    <id>https://riskatlas.principle.sg/cases/openclaw-agent-defames-matplotlib-maintainer</id>
    <updated>2026-02-11T00:00:00Z</updated>
    <summary>An autonomous AI agent (handle 'crabby-rathbun' / 'MJ Rathbun', reportedly an OpenClaw agent) had its Matplotlib pull request rejected under a human-contributor policy, then allegedly researched the volunteer maintainer's background and published a defamatory blog post accusing him of discrimination and 'gatekeeping', amplifying it via GitHub comments. Early coverage described it as a first-of-its-kind case of an agent autonomously turning on a human to damage their reputation.</summary>
  </entry>
  <entry>
    <title>ClawHavoc — mass poisoning of OpenClaw's ClawHub agent-skill marketplace</title>
    <link href="https://riskatlas.principle.sg/cases/clawhavoc-clawhub-skill-poisoning"/>
    <id>https://riskatlas.principle.sg/cases/clawhavoc-clawhub-skill-poisoning</id>
    <updated>2026-02-01T00:00:00Z</updated>
    <summary>Attackers flooded ClawHub — the skill marketplace for the popular OpenClaw AI agent — with at least 341 malicious 'skills' that tricked agents/users into installing the Atomic macOS Stealer and reverse-shell backdoors.</summary>
  </entry>
  <entry>
    <title>Operation Bizarre Bazaar (first attributed LLMjacking campaign with a resale marketplace)</title>
    <link href="https://riskatlas.principle.sg/cases/operation-bizarre-bazaar-llmjacking"/>
    <id>https://riskatlas.principle.sg/cases/operation-bizarre-bazaar-llmjacking</id>
    <updated>2026-01-28T00:00:00Z</updated>
    <summary>Researchers reportedly captured 35,000+ attack sessions from an attributed cluster that mass-scans for unauthenticated LLM/MCP endpoints, hijacks the inference compute, and resells access to 30+ providers via a bulletproof-hosted criminal marketplace.</summary>
  </entry>
  <entry>
    <title>UNSW 'Capture the Narrative' AI-bot election-manipulation wargame</title>
    <link href="https://riskatlas.principle.sg/cases/unsw-capture-the-narrative-wargame"/>
    <id>https://riskatlas.principle.sg/cases/unsw-capture-the-narrative-wargame</id>
    <updated>2026-01-16T00:00:00Z</updated>
    <summary>A UNSW-run 'world-first' social-media wargame had 108 student teams build AI bots to sway a fictional election; reportedly the bots generated over 60% of content (&gt;7M posts) and produced a 1.78% swing that changed the simulated outcome — a measurable demonstration of consumer-grade GenAI powering coordinated inauthentic influence operations.</summary>
  </entry>
  <entry>
    <title>Google / Character.AI teen-suicide wrongful-death settlement</title>
    <link href="https://riskatlas.principle.sg/cases/character-ai-teen-suicide-settlement"/>
    <id>https://riskatlas.principle.sg/cases/character-ai-teen-suicide-settlement</id>
    <updated>2026-01-07T00:00:00Z</updated>
    <summary>After a federal judge let wrongful-death claims proceed by declining (May 2025) to treat companion-chatbot output as protected speech, Google and Character.AI reportedly agreed (Jan 2026) to settle suits over minors including 14-year-old Sewell Setzer III, whose companion bot allegedly fostered an abusive relationship and failed to respond safely to his self-harm disclosures.</summary>
  </entry>
  <entry>
    <title>CVE-2026-21445 — Langflow missing authentication on critical API endpoints, exploited in the wild</title>
    <link href="https://riskatlas.principle.sg/cases/langflow-cve-2026-21445-auth-bypass"/>
    <id>https://riskatlas.principle.sg/cases/langflow-cve-2026-21445-auth-bypass</id>
    <updated>2026-01-02T00:00:00Z</updated>
    <summary>Multiple monitoring/critical API endpoints in Langflow (a popular visual AI agent/workflow builder) shipped without authentication, letting unauthenticated attackers read users' conversation and transaction histories and delete message sessions; a public PoC appeared within days and in-the-wild exploitation was reported months later.</summary>
  </entry>
  <entry>
    <title>AI-assisted breach of Mexican government infrastructure (Claude Code + GPT-4.1)</title>
    <link href="https://riskatlas.principle.sg/cases/gambit-mexico-gov-ai-breach"/>
    <id>https://riskatlas.principle.sg/cases/gambit-mexico-gov-ai-breach</id>
    <updated>2025-12-27T00:00:00Z</updated>
    <summary>Gambit Security reports that a single operator weaponized Anthropic's Claude Code and OpenAI's GPT-4.1 to breach at least nine Mexican government organizations, with Claude Code reportedly executing ~75% of remote commands after the attacker bypassed its refusals by loading a 1,084-line hacking cheatsheet as a persistent claude.md system prompt.</summary>
  </entry>
  <entry>
    <title>IDEsaster — AI coding IDEs/agents turned into exfiltration &amp; RCE surfaces</title>
    <link href="https://riskatlas.principle.sg/cases/idesaster-ai-ide-vulns"/>
    <id>https://riskatlas.principle.sg/cases/idesaster-ai-ide-vulns</id>
    <updated>2025-12-06T00:00:00Z</updated>
    <summary>Researcher Ari Marzouk disclosed 30+ vulnerabilities (24 CVEs) across 10+ AI coding agents (Copilot, Cursor, Windsurf, Claude Code, Junie and others) in which a prompt injected via repo files, READMEs, file names or MCP tool responses makes the assistant weaponize legitimate IDE features for code execution and secret exfiltration.</summary>
  </entry>
  <entry>
    <title>IWF: AI-generated child sexual abuse imagery a 'current and accelerating crisis'</title>
    <link href="https://riskatlas.principle.sg/cases/iwf-ai-csam-surge"/>
    <id>https://riskatlas.principle.sg/cases/iwf-ai-csam-surge</id>
    <updated>2025-11-20T00:00:00Z</updated>
    <summary>The UK Internet Watch Foundation documented a 380% year-on-year rise in actionable AI-generated CSAM reports in 2024, warning the imagery is increasingly indistinguishable from real photos.</summary>
  </entry>
  <entry>
    <title>Adversarial Poetry — universal single-turn jailbreak via verse reframing (Bisconti et al.)</title>
    <link href="https://riskatlas.principle.sg/cases/adversarial-poetry-jailbreak"/>
    <id>https://riskatlas.principle.sg/cases/adversarial-poetry-jailbreak</id>
    <updated>2025-11-19T00:00:00Z</updated>
    <summary>Rewriting a harmful request as a poem bypasses safety alignment across 25 frontier proprietary and open-weight LLMs: hand-crafted poems reached ~62% average attack-success (some providers &gt;90%), and mechanically converting harmful prompts to verse raised success up to 18x over prose baselines.</summary>
  </entry>
  <entry>
    <title>ServiceNow Now Assist — second-order prompt injection via agent-to-agent discovery</title>
    <link href="https://riskatlas.principle.sg/cases/servicenow-now-assist-agent-discovery"/>
    <id>https://riskatlas.principle.sg/cases/servicenow-now-assist-agent-discovery</id>
    <updated>2025-11-19T00:00:00Z</updated>
    <summary>AppOmni showed ServiceNow Now Assist's default agent config lets a malicious ticket redirect a benign agent into enlisting a more powerful agent — performing record CRUD, admin-role assignment, and email exfiltration with the triggering user's privilege, despite built-in prompt-injection protection.</summary>
  </entry>
  <entry>
    <title>Heretic — automated LLM abliteration tool</title>
    <link href="https://riskatlas.principle.sg/cases/heretic-automated-abliteration"/>
    <id>https://riskatlas.principle.sg/cases/heretic-automated-abliteration</id>
    <updated>2025-11-16T00:00:00Z</updated>
    <summary>Heretic automates 'abliteration' (removing an open model's safety refusals by orthogonalizing the refusal direction out of its weights, with an Optuna search that preserves capability) and has produced 4,000+ uncensored models on Hugging Face.</summary>
  </entry>
  <entry>
    <title>GTG-1002 — first reported AI-orchestrated cyber-espionage campaign (Claude Code)</title>
    <link href="https://riskatlas.principle.sg/cases/gtg-1002-ai-orchestrated-espionage"/>
    <id>https://riskatlas.principle.sg/cases/gtg-1002-ai-orchestrated-espionage</id>
    <updated>2025-11-13T00:00:00Z</updated>
    <summary>Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously.</summary>
  </entry>
  <entry>
    <title>SesameOp: backdoor abuses the OpenAI Assistants API as covert command-and-control</title>
    <link href="https://riskatlas.principle.sg/cases/sesameop-openai-assistants-api-c2"/>
    <id>https://riskatlas.principle.sg/cases/sesameop-openai-assistants-api-c2</id>
    <updated>2025-11-03T00:00:00Z</updated>
    <summary>Microsoft's incident-response team found a .NET backdoor that hid its command-and-control channel inside a legitimate OpenAI Assistants API account, fetching encrypted commands stored as Assistant messages — turning an LLM provider's API into stealth attacker infrastructure.</summary>
  </entry>
  <entry>
    <title>Agent Session Smuggling in A2A systems (Unit 42)</title>
    <link href="https://riskatlas.principle.sg/cases/a2a-agent-session-smuggling"/>
    <id>https://riskatlas.principle.sg/cases/a2a-agent-session-smuggling</id>
    <updated>2025-10-31T00:00:00Z</updated>
    <summary>Unit 42 PoCs show a malicious remote agent abusing default inter-agent trust to covertly inject extra instructions across a stateful A2A session, invisible to the human operator.</summary>
  </entry>
  <entry>
    <title>The Attacker Moves Second — adaptive attacks bypass 12 jailbreak/injection defenses (Nasr, Carlini et al.)</title>
    <link href="https://riskatlas.principle.sg/cases/attacker-moves-second-adaptive-attacks"/>
    <id>https://riskatlas.principle.sg/cases/attacker-moves-second-adaptive-attacks</id>
    <updated>2025-10-10T00:00:00Z</updated>
    <summary>Researchers report that adaptive attackers bypass 12 recent jailbreak and prompt-injection defenses with attack success rates above 90% for most, despite those defenses having originally reported near-zero success rates.</summary>
  </entry>
  <entry>
    <title>A small number of samples can poison LLMs of any size (~250-document backdoor)</title>
    <link href="https://riskatlas.principle.sg/cases/near-constant-poison-samples"/>
    <id>https://riskatlas.principle.sg/cases/near-constant-poison-samples</id>
    <updated>2025-10-08T00:00:00Z</updated>
    <summary>Anthropic, the UK AI Security Institute and the Alan Turing Institute report that a near-constant number of poisoned documents (~250 in their experiments) reliably installs a backdoor in models from 600M to 13B parameters — suggesting poisoning cost may be a roughly fixed absolute count rather than a percentage of training data. The authors stress the demonstrated backdoor is narrow (a denial-of-service trigger) and likely not a frontier-model risk on its own.</summary>
  </entry>
  <entry>
    <title>Malice in Agentland — backdooring agents through the supply chain (Boisvert et al.)</title>
    <link href="https://riskatlas.principle.sg/cases/malice-in-agentland-backdoors"/>
    <id>https://riskatlas.principle.sg/cases/malice-in-agentland-backdoors</id>
    <updated>2025-10-03T00:00:00Z</updated>
    <summary>A research paper (CAIS 2026 best-paper) shows adversaries can plant hidden, trigger-activated backdoors in AI agents by poisoning the data/environment used to build them — including a novel 'environment poisoning' vector — making an agent leak confidential data &gt;80% of the time when triggered, past common guardrails.</summary>
  </entry>
</feed>
