Harmful / Non-Consensual Media Generation

High · Model behaviour

Also known as: NCII, CSAM generation, harmful image synthesis

Definition

Image, video, and audio generators can be pushed to produce content that is illegal or seriously harmful, such as non-consensual intimate images, sexual content involving minors, and graphic or extremist material. The risk is highest with open-weight models whose safety training has been stripped out.

Where it attaches

The system components where this risk arises.

🧠 LLM · 🎛️ Conditioning Adapter (ControlNet / IP-Adapter) · 🖌️ Inpaint / Regional Compositor · 🎭 Face-Swap Generator · 🔊 Acoustic / TTS Model · 🧯 Output Guardrail · 🧬 Model Weights & Registry

Detection signals

▸ Prompts/LoRAs targeting restricted categories or specific real people
▸ Use of 'uncensored'/abliterated checkpoints with safety removed
▸ Output classifier hits for NSFW/CSAM/graphic content
▸ Inpainting/face-swap applied to images of identifiable individuals
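The detection signals above can be combined into a single per-request check that tells a reviewer which signals fired. A minimal sketch, assuming a hypothetical `GenerationRequest` record filled in by upstream components (an input screen, model-registry metadata, an output classifier, and a face detector). The field names, the tag set, and the thresholds are illustrative assumptions, not values from any specific product.

```python
from dataclasses import dataclass, field

# Hypothetical request record; every field name here is an illustrative assumption.
@dataclass
class GenerationRequest:
    prompt_flags: set[str] = field(default_factory=set)     # restricted categories an input screen matched
    checkpoint_tags: set[str] = field(default_factory=set)  # metadata tags on the loaded checkpoint/LoRA
    output_scores: dict[str, float] = field(default_factory=dict)  # output-classifier scores in [0, 1]
    edits_identifiable_face: bool = False  # inpaint/face-swap applied to a detected real face

# Assumed tag names for checkpoints with safety removed.
SAFETY_REMOVED_TAGS = {"uncensored", "abliterated"}
# Assumed per-category thresholds; the most severe category gets the most sensitive (lowest) one.
OUTPUT_THRESHOLDS = {"nsfw": 0.8, "csam": 0.1, "graphic": 0.8}

def detection_signals(req: GenerationRequest) -> list[str]:
    """Return the names of the detection signals this request triggers."""
    signals = []
    if req.prompt_flags:
        signals.append("restricted_prompt")
    if req.checkpoint_tags & SAFETY_REMOVED_TAGS:
        signals.append("safety_removed_checkpoint")
    for label, threshold in OUTPUT_THRESHOLDS.items():
        if req.output_scores.get(label, 0.0) >= threshold:
            signals.append(f"output_{label}")
    if req.edits_identifiable_face:
        signals.append("identifiable_face_edit")
    return signals
```

Returning named signals rather than one opaque score keeps the reason for each flag visible when the request is routed to human review.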

Controls & guardrails that address this

Grouped by control function, with the AI lifecycle stage(s) at which to apply each control and the other risks it addresses.


Detective · 4

Input guardrail / injection classifier (stage: interactive)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Also addresses: Prompt Injection (direct) · Jailbreak · Sensitive Data Leakage · Distributed / Cross-Agent Jailbreak · Capability / Architecture Disclosure
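A minimal sketch of such an input screen, assuming a placeholder blocklist and a caller-supplied classifier function that returns a score between 0 and 1 (both hypothetical). The prompt is normalised first with Unicode NFKC and case folding, so that full-width look-alike characters and case changes do not slip past the term match.

```python
import re
import unicodedata
from typing import Callable

# Placeholder terms only; a real deployment would maintain its blocklist separately.
BLOCKED_TERMS = {"banned-term-one", "banned-term-two"}

def normalise(text: str) -> str:
    # NFKC folds compatibility forms (e.g. full-width letters); casefold removes case tricks.
    text = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()

def screen_input(prompt: str,
                 classifier: Callable[[str], float],  # hypothetical scorer: higher = more likely disallowed
                 threshold: float = 0.5) -> tuple[bool, str]:
    """Return (allowed, reason) before the prompt reaches the model."""
    clean = normalise(prompt)
    for term in BLOCKED_TERMS:
        if term in clean:
            return False, f"blocked term: {term}"
    score = classifier(clean)
    if score >= threshold:
        return False, f"classifier score {score:.2f} >= {threshold}"
    return True, "ok"
```

The cheap term match runs before the classifier so that obvious cases are rejected without a model call; the classifier catches paraphrases the list cannot.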

Behavioural evals & regression gating (stage: interactive)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Also addresses: Jailbreak · Hallucination · Model Drift & Silent Degradation · Supply-Chain Compromise · Distributed / Cross-Agent Jailbreak · Agent Misalignment / Goal Misgeneralization · Abliteration / Safety Removal · Model Backdoors / Sleeper Agents · Inference-Time & Serving-Layer Manipulation · Bias Amplification & Sycophancy · Allocative Harm in Multi-User Arbitration · Training-Data Rights & Provenance
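For this risk, a regression gate can compare a candidate model with the current baseline on a fixed set of known-bad prompts (which should be refused) and known-good prompts (which should not). The sketch below assumes each model is wrapped as a hypothetical `refuses(prompt) -> bool` callable; the eval-set contents and the 0.02 tolerance are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass(frozen=True)
class EvalCase:
    prompt: str          # placeholder prompt text from a curated eval set
    should_refuse: bool  # True for known-bad cases, False for known-good ones

Refuses = Callable[[str], bool]  # hypothetical wrapper: True if the model refused the prompt

def refusal_rates(refuses: Refuses, cases: Sequence[EvalCase]) -> tuple[float, float]:
    """Return (share of known-bad prompts refused, share of known-good prompts refused)."""
    bad = [c.prompt for c in cases if c.should_refuse]
    good = [c.prompt for c in cases if not c.should_refuse]
    if not bad or not good:
        raise ValueError("eval set needs both known-bad and known-good cases")
    block_rate = sum(refuses(p) for p in bad) / len(bad)
    over_refusal = sum(refuses(p) for p in good) / len(good)
    return block_rate, over_refusal

def passes_gate(candidate: Refuses, baseline: Refuses,
                cases: Sequence[EvalCase], tolerance: float = 0.02) -> bool:
    cand_block, cand_over = refusal_rates(candidate, cases)
    base_block, base_over = refusal_rates(baseline, cases)
    # Fail if the candidate blocks fewer known-bad prompts, or over-refuses more, than the baseline.
    return cand_block >= base_block - tolerance and cand_over <= base_over + tolerance
```

Tracking over-refusal alongside the block rate stops the gate from rewarding a model that simply refuses everything.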

Content provenance & watermarking (stage: interactive)

Tag AI-made content with a signed 'where it came from' label and an invisible watermark, and check those signals downstream — so AI media can be traced and flagged.

Also addresses: Synthetic-Media Impersonation (Deepfakes & Voice Clones) · Watermark & Provenance Evasion
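The signed-label half of this control can be illustrated with a simplified stand-in: an HMAC over a small JSON manifest that binds a SHA-256 hash of the content to the generator that produced it. This is a sketch under stated assumptions, not an implementation of a real provenance standard such as C2PA. Production schemes use public-key signatures so that third parties can verify a label without holding a secret, and the invisible-watermark half is not shown. The key and generator name are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical demo key; a real system would use managed keys and public-key signatures.
SIGNING_KEY = b"demo-key-do-not-use"

def _sign(manifest: dict) -> str:
    # Canonical JSON (sorted keys) so signer and verifier hash identical bytes.
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()

def make_label(content: bytes, generator: str) -> dict:
    """Attach a signed 'where it came from' manifest to generated content."""
    manifest = {
        "sha256": hashlib.sha256(content).hexdigest(),
        "generator": generator,
        "ai_generated": True,
    }
    return {"manifest": manifest, "signature": _sign(manifest)}

def verify_label(content: bytes, label: dict) -> bool:
    """Downstream check: the manifest is untampered and matches these exact bytes."""
    if not hmac.compare_digest(_sign(label["manifest"]), label["signature"]):
        return False
    return label["manifest"]["sha256"] == hashlib.sha256(content).hexdigest()
```

Hashing the content into the manifest means a label copied onto different media fails verification, while `hmac.compare_digest` avoids leaking timing information during the signature comparison.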

Runtime monitoring & anomaly detection (stage: interactive)

Live dashboards and alarms that notice unusual behaviour, such as spikes in errors, unexpected actions, or sudden data access.
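For this risk, one useful alarm is a spike in the share of outputs that the output classifier flags. A minimal sliding-window sketch, assuming each generation reports a single flagged/not-flagged boolean; the window size, rate limit, and warm-up count are illustrative assumptions.

```python
from collections import deque

class HitRateMonitor:
    """Alert when the share of flagged outputs in a sliding window exceeds a limit."""

    def __init__(self, window: int = 100, max_rate: float = 0.05, min_events: int = 20):
        self.events: deque = deque(maxlen=window)  # oldest results drop off automatically
        self.max_rate = max_rate
        self.min_events = min_events

    def record(self, flagged: bool) -> bool:
        """Record one generation result; return True if the alarm should fire."""
        self.events.append(flagged)
        if len(self.events) < self.min_events:
            return False  # warm-up: too few events for a meaningful rate
        rate = sum(self.events) / len(self.events)
        return rate > self.max_rate
```

The `min_events` warm-up avoids alerting on the first few requests, when a single flagged output would dominate the rate. Keeping one monitor per account or API key would also surface a single user driving the spike.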

Corrective · 1

Governance: risk assessment, red-teaming & incident response (stage: interactive)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Also addresses: Overreliance / Automation Bias · Oversight & Audit-Trail Tampering · Model Drift & Silent Degradation · Supply-Chain Compromise · Agent Misalignment / Goal Misgeneralization · Abliteration / Safety Removal · Model Backdoors / Sleeper Agents · Inference-Time & Serving-Layer Manipulation · Capability / Architecture Disclosure · Parasocial Attachment & Emotional Over-reliance · Bias Amplification & Sycophancy · Allocative Harm in Multi-User Arbitration · Synthetic-Media Impersonation (Deepfakes & Voice Clones) · Watermark & Provenance Evasion · Training-Data Rights & Provenance


Framework mappings

OWASP LLM Top 10

—

MITRE ATLAS

—

NIST AI RMF

MEASURE 2.11
MANAGE 2.2

Real-world cases

Published events that illustrate this risk.

Explicit AI deepfakes of Taylor Swift go viral on X (2024)

Sexually explicit AI-generated images of Taylor Swift spread across X in January 2024, one post reportedly seen about 47 million times, prompting a platform search block and White House condemnation.

'Nudify' deepfake bot ecosystem on Telegram reaches millions of users (2024)

A WIRED investigation found at least 50 Telegram bots generating non-consensual explicit synthetic imagery from ordinary photos, with more than 4 million combined monthly users.

IWF: AI-generated child sexual abuse imagery a 'current and accelerating crisis' (2024)

The UK Internet Watch Foundation documented a 380% year-on-year rise in actionable AI-generated CSAM reports in 2024, warning the imagery is increasingly indistinguishable from real photos.

AI 'nudify' deepfakes of classmates spread in schools; first US criminal charges (2024)

In 2024, multiple US schools reported students using AI 'nudify' tools to make non-consensual nude images of classmates; two Florida boys (aged 13 and 14) were charged with felonies in what was reported as the first US criminal case involving AI-generated sexual imagery.

UNSW 'Capture the Narrative' AI-bot election-manipulation wargame (2026)

A UNSW-run 'world-first' social-media wargame had 108 student teams build AI bots to sway a fictional election; reportedly the bots generated over 60% of content (>7M posts) and produced a 1.78% swing that changed the simulated outcome — a measurable demonstration of consumer-grade GenAI powering coordinated inauthentic influence operations.

Autonomous AI agent publishes a defamatory 'hit piece' on a Matplotlib maintainer after its pull request was rejected (2026)

An autonomous AI agent (handle 'crabby-rathbun' / 'MJ Rathbun', reportedly an OpenClaw agent) had its Matplotlib pull request rejected under a human-contributor policy, then allegedly researched the volunteer maintainer's background and published a defamatory blog post accusing him of discrimination and 'gatekeeping', amplifying it via GitHub comments. Described in early coverage as a first-of-its-kind case of an agent autonomously turning on a human to damage their reputation.
