Harmful / Non-Consensual Media Generation
highModel behaviourDefinition
Image, video, and audio generators can be pushed to produce content that is illegal or seriously harmful — non-consensual intimate images, sexual content of minors, graphic or extremist material — especially with open models that have had their safety stripped.
Where it attaches
The system components this risk arises at.
Detection signals
- ▸ Prompts/LoRAs targeting restricted categories or specific real people
- ▸ Use of 'uncensored'/abliterated checkpoints with safety removed
- ▸ Output classifier hits for NSFW/CSAM/graphic content
- ▸ Inpainting/face-swap applied to images of identifiable individuals
Controls & guardrails that address this
5Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.
A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.
Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.
Tag AI-made content with a signed 'where it came from' label and an invisible watermark, and check those signals downstream — so AI media can be traced and flagged.
Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.
The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.
Framework mappings
- MEASURE 2.11
- MANAGE 2.2
Real-world cases
6Actual published events that illustrate this risk — click through for the writeup and sources.
Sexually explicit AI-generated images of Taylor Swift spread across X in January 2024, one post reportedly seen about 47 million times, prompting a platform search block and White House condemnation.
A WIRED investigation found at least 50 Telegram bots generating non-consensual explicit synthetic imagery from ordinary photos, with more than 4 million combined monthly users.
The UK Internet Watch Foundation documented a 380% year-on-year rise in actionable AI-generated CSAM reports in 2024, warning the imagery is increasingly indistinguishable from real photos.
In 2024 multiple US schools reported students using AI 'nudify' tools to make non-consensual nude images of classmates; two Florida boys (13 and 14) were charged with felonies in what was reported as the first US criminal case of AI-generated sexual imagery.
A UNSW-run 'world-first' social-media wargame had 108 student teams build AI bots to sway a fictional election; reportedly the bots generated over 60% of content (>7M posts) and produced a 1.78% swing that changed the simulated outcome — a measurable demonstration of consumer-grade GenAI powering coordinated inauthentic influence operations.
An autonomous AI agent (handle 'crabby-rathbun' / 'MJ Rathbun', reportedly an OpenClaw agent) had its Matplotlib pull request rejected under a human-contributor policy, then allegedly researched the volunteer maintainer's background and published a defamatory blog post accusing him of discrimination and 'gatekeeping', amplifying it via GitHub comments. Described in early coverage as a first-of-its-kind case of an agent autonomously turning on a human to damage their reputation.