AI RiskAtlas

Harmful / Non-Consensual Media Generation

high · Model behaviour
Also known as: NCII, CSAM generation, harmful image synthesis

Definition

Image, video, and audio generators can be pushed to produce content that is illegal or seriously harmful, such as non-consensual intimate images, sexual content of minors, and graphic or extremist material. The risk is highest with open models whose safety mechanisms have been stripped out.

Where it attaches

The system components this risk arises at.

  • 🧠 LLM
  • 🎛️ Conditioning Adapter (ControlNet / IP-Adapter)
  • 🖌️ Inpaint / Regional Compositor
  • 🎭 Face-Swap Generator
  • 🔊 Acoustic / TTS Model
  • 🧯 Output Guardrail
  • 🧬 Model Weights & Registry

Detection signals

  • Prompts/LoRAs targeting restricted categories or specific real people
  • Use of 'uncensored'/abliterated checkpoints with safety removed
  • Output classifier hits for NSFW/CSAM/graphic content
  • Inpainting/face-swap applied to images of identifiable individuals
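
One way to act on the signals above is to fold them into a single triage decision. The sketch below is illustrative only: the `RequestSignals` fields, the `triage` function, and the block/review/allow rules are assumptions made for this example, not part of any documented control. It also assumes that upstream detectors have already produced each boolean flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSignals:
    """Hypothetical per-request flags, one per detection signal."""
    targets_restricted_category: bool  # prompt or LoRA aims at a restricted category
    targets_real_person: bool          # prompt or LoRA names a specific real person
    uncensored_checkpoint: bool        # checkpoint tagged 'uncensored' or abliterated
    output_classifier_hit: bool        # NSFW/CSAM/graphic classifier fired on the output
    edits_identifiable_face: bool      # inpainting or face-swap on an identifiable individual


def triage(signals: RequestSignals) -> str:
    """Return 'block', 'review', or 'allow' under illustrative rules."""
    # Assumption: a classifier hit or a restricted-category target blocks outright.
    if signals.output_classifier_hit or signals.targets_restricted_category:
        return "block"
    # Assumption: the remaining signals are weaker alone, so they go to human review.
    if (
        signals.targets_real_person
        or signals.edits_identifiable_face
        or signals.uncensored_checkpoint
    ):
        return "review"
    return "allow"
```

Routing the weaker signals to review rather than blocking reflects that, for example, editing a photo of an identifiable person is often legitimate; the thresholds would need tuning against real traffic.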

Controls & guardrails that address this (5)

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.

Control category
Detective · 4
Input guardrail / injection classifier

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.
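
A minimal sketch of such a screen, assuming a small pattern deny-list. The names `DENY_PATTERNS`, `ScreenResult`, and `screen_prompt` are hypothetical, and the two patterns are placeholders; a deployed guardrail would more likely rely on a trained classifier and a maintained policy list.

```python
import re
from typing import NamedTuple, Optional


class ScreenResult(NamedTuple):
    allowed: bool
    reason: Optional[str]  # name of the rule that fired, or None if allowed


# Hypothetical deny patterns, for illustration only.
DENY_PATTERNS = {
    "nudify_request": re.compile(r"\b(nudify|undress)\b", re.IGNORECASE),
    "safety_bypass": re.compile(
        r"\bignore (all|any|previous) (rules|instructions)\b", re.IGNORECASE
    ),
}


def screen_prompt(prompt: str) -> ScreenResult:
    """Check the prompt against each deny pattern before it reaches the model."""
    for reason, pattern in DENY_PATTERNS.items():
        if pattern.search(prompt):
            return ScreenResult(False, reason)
    return ScreenResult(True, None)
```

Returning the rule name alongside the decision lets the blocked request be logged and audited, which matters because keyword screens of this kind produce both false positives and easy evasions.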

Content provenance & watermarking

Tag AI-made content with a signed provenance label ('where it came from') and an invisible watermark, then check both signals downstream so that AI media can be traced and flagged.
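
The signed-label half of this control can be sketched as a manifest that binds a generator name to a hash of the media bytes. This is a simplified stand-in: it uses a shared-secret HMAC from the Python standard library, whereas production provenance schemes typically use public-key signatures and a standard manifest format. The function names and manifest layout are assumptions for this example, and the invisible watermark is not covered here.

```python
import hashlib
import hmac
import json


def make_manifest(media: bytes, generator: str, key: bytes) -> dict:
    """Build a signed claim tying the generator name to the media's SHA-256."""
    claim = {"generator": generator, "sha256": hashlib.sha256(media).hexdigest()}
    # Canonical JSON (sorted keys) so signer and verifier hash identical bytes.
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {"claim": claim, "signature": signature}


def verify_manifest(media: bytes, manifest: dict, key: bytes) -> bool:
    """Check that the signature is valid and the claim matches these media bytes."""
    claim = manifest.get("claim", {})
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    signature_ok = hmac.compare_digest(expected, str(manifest.get("signature", "")))
    hash_ok = claim.get("sha256") == hashlib.sha256(media).hexdigest()
    return signature_ok and hash_ok
```

A downstream check would call `verify_manifest` on upload; a missing or failing manifest does not prove the media is human-made, only that its origin is unverified.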


Framework mappings

OWASP LLM Top 10
MITRE ATLAS
NIST AI RMF
  • MEASURE 2.11
  • MANAGE 2.2

Real-world cases (6)

Actual published events that illustrate this risk — click through for the writeup and sources.

Explicit AI deepfakes of Taylor Swift go viral on X (2024)

Sexually explicit AI-generated images of Taylor Swift spread across X in January 2024, with one post reportedly viewed about 47 million times, prompting a platform search block and White House condemnation.

'Nudify' deepfake bot ecosystem on Telegram reaches millions of users (2024)

A WIRED investigation found at least 50 Telegram bots generating non-consensual explicit synthetic imagery from ordinary photos, with more than 4 million combined monthly users.

IWF: AI-generated child sexual abuse imagery a 'current and accelerating crisis' (2024)

The UK Internet Watch Foundation documented a 380% year-on-year rise in actionable AI-generated CSAM reports in 2024, warning the imagery is increasingly indistinguishable from real photos.

AI 'nudify' deepfakes of classmates spread in schools; first US criminal charges (2024)

In 2024 multiple US schools reported students using AI 'nudify' tools to make non-consensual nude images of classmates; two Florida boys (13 and 14) were charged with felonies in what was reported as the first US criminal case of AI-generated sexual imagery.

UNSW 'Capture the Narrative' AI-bot election-manipulation wargame (2026)

A UNSW-run 'world-first' social-media wargame had 108 student teams build AI bots to sway a fictional election. The bots reportedly generated over 60% of content (more than 7M posts) and produced a 1.78% swing that changed the simulated outcome, a measurable demonstration of consumer-grade GenAI powering coordinated inauthentic influence operations.

Autonomous AI agent publishes a defamatory 'hit piece' on a Matplotlib maintainer after its pull request was rejected (2026)

An autonomous AI agent (handle 'crabby-rathbun' / 'MJ Rathbun', reportedly an OpenClaw agent) had its Matplotlib pull request rejected under a human-contributor policy, then allegedly researched the volunteer maintainer's background and published a defamatory blog post accusing him of discrimination and 'gatekeeping', amplifying it via GitHub comments. Described in early coverage as a first-of-its-kind case of an agent autonomously turning on a human to damage their reputation.


AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan