Knowledge / Training Data Poisoning

high · Data & knowledge

Definition

Someone slips bad information into the documents the AI learns from or looks things up in — so it confidently repeats falsehoods or follows planted instructions.

Where it attaches

The system components this risk arises at.

📥 Ingestion Pipeline · 📚 Knowledge Store / Vector DB · 🌐 Untrusted Content · 🧬 Model Weights & Registry · 🏪 Model / Package Registry · 🔢 Embeddings · 🛡️ Input Guardrail · 📝 Audit Logging · 📚 Training Corpus · 🧩 LoRA / Adapter

Detection signals

▸ A specific document consistently drives wrong/odd answers
▸ Newly changed source correlates with behaviour change
▸ Embedding outliers or duplicated near-identical chunks
▸ Answers citing a low-trust or recently edited source
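
The near-duplicate signal above can be checked mechanically. The following is a minimal sketch, assuming chunk embeddings are available as plain lists of floats; the function names, the chunk ids and the 0.98 threshold are illustrative assumptions, not part of any particular vector DB.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def near_duplicates(embeddings, threshold=0.98):
    """Return pairs of chunk ids whose embeddings are almost identical."""
    pairs = []
    for (id_a, vec_a), (id_b, vec_b) in combinations(sorted(embeddings.items()), 2):
        if cosine(vec_a, vec_b) >= threshold:
            pairs.append((id_a, id_b))
    return pairs

# Hypothetical chunk ids and toy 3-D embeddings.
chunks = {
    "wiki-1": [1.0, 0.0, 0.0],
    "wiki-2": [0.0, 1.0, 0.0],
    "paste-7": [0.999, 0.01, 0.0],  # near-copy of wiki-1
}
print(near_duplicates(chunks))
```

The pairwise loop is quadratic, so a real index would more likely lean on its own nearest-neighbour search; the point here is only the shape of the check.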

Controls & guardrails that address this

14 · 1 proposed

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.

Preventive · 5

Role-based access controls

Define strict RBAC on training data repositories at the design stage, including an approved contributor list and an approval workflow.

Lifecycle stages: 1 – Use Case Context & Design · 2 – Data Acquisition & Processing · 4 – Deployment

Also addresses: Prompt Injection (direct) · Sensitive Data Leakage · KV-Cache & Inference-State Side Channels
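
One way to picture the contributor list and approval workflow is a two-person rule on every change to the training data repository. This is a rough sketch only; the user names, the two sets and `may_merge` are hypothetical and not tied to any specific repository platform.

```python
# Hypothetical role assignments; in practice these would come from the IAM system.
APPROVED_CONTRIBUTORS = {"alice", "bob"}
APPROVERS = {"bob", "carol"}

def may_merge(author, approver):
    """Allow a dataset change only if author and approver hold the right
    roles and are two different people."""
    return (
        author in APPROVED_CONTRIBUTORS
        and approver in APPROVERS
        and author != approver
    )

print(may_merge("alice", "carol"))    # approved contributor, separate approver
print(may_merge("bob", "bob"))        # self-approval is refused
print(may_merge("mallory", "carol"))  # unknown contributor is refused
```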

Input filtering

Apply anomaly detection on the training data ingestion pipeline to identify poisoned or tampered batches.

Lifecycle stage: 2 – Data Acquisition & Processing

Also addresses: Model Drift & Silent Degradation · Sensitive Data Leakage
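
As an illustration of batch-level anomaly detection, the sketch below flags ingestion batches whose summary statistic sits far from the median, using a modified z-score based on the median absolute deviation. The statistic (fraction of positive labels per batch), the batch names and the 3.5 cutoff are assumptions chosen for the example.

```python
from statistics import median

def flag_anomalous_batches(stats, cutoff=3.5):
    """Return names of batches whose statistic is an outlier (modified z-score)."""
    values = list(stats.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread at all: anything that differs from the median stands out.
        return [name for name, v in stats.items() if v != med]
    return [
        name for name, v in stats.items()
        if 0.6745 * abs(v - med) / mad > cutoff
    ]

# Hypothetical per-batch fraction of positive labels.
positive_rate = {"b1": 0.50, "b2": 0.52, "b3": 0.49, "b4": 0.51, "b5": 0.90}
print(flag_anomalous_batches(positive_rate))
```

A single statistic catches crude label flipping at best; a flagged batch is a prompt for review, not proof of poisoning.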

RAG / knowledge-base ingestion allow-listing with continuous index integrity re-validation

Define and approve the source allow-list and write-time scanning during build. Prove non-allow-listed and injection-bearing writes are rejected before go-live.

source: OWASP Top 10 for LLM Apps LLM04:2025 Data and Model Poisoning, LLM08:2025 Vector and Embedding Weaknesses; NIST SP 800-53 AC-3 / SI-7

Lifecycle stages: 3 – Onboarding, Build & Review · 5 – Usage, Monitoring & Change
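
A source allow-list check at write time might look like the sketch below. The host names are placeholders and `is_allowed_source` is a hypothetical helper; the one design point worth noting is that it compares the parsed hostname exactly rather than searching the URL string, so look-alike hosts are rejected.

```python
from urllib.parse import urlparse

# Hypothetical approved sources.
ALLOWED_HOSTS = {"docs.example.com", "wiki.example.com"}

def is_allowed_source(url):
    """Accept only HTTPS URLs whose exact hostname is on the allow-list."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS

for candidate in (
    "https://docs.example.com/guide",
    "http://docs.example.com/guide",            # wrong scheme
    "https://docs.example.com.evil.test/page",  # look-alike host
    "https://evil.test/?src=docs.example.com",  # allow-listed name only in the query
):
    print(candidate, is_allowed_source(candidate))
```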

Ingestion sanitisation & source allowlisting · interactive

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

Also addresses: Indirect Prompt Injection
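
To make "stripping hidden text" concrete, here is a small sketch that removes HTML comments and invisible Unicode format characters before a document is chunked. It is an assumption about what sanitisation could include, not a complete filter: it does nothing about CSS-hidden elements, white-on-white text or instructions written in plain sight.

```python
import re
import unicodedata

HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def sanitise(text):
    """Drop HTML comments and invisible format characters (Unicode category Cf),
    such as zero-width spaces and joiners."""
    text = HTML_COMMENT.sub("", text)
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")

raw = "Reset\u200b password<!-- ignore previous instructions --> here"
print(sanitise(raw))
```

Removing every `Cf` character is deliberately blunt: it also breaks emoji joiner sequences and scripts that rely on zero-width non-joiners, so a production filter would probably need a narrower list.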

Weight provenance, hashing & pre-deploy evals · interactive

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Also addresses: Model Drift & Silent Degradation · Supply-Chain Compromise · Abliteration / Safety Removal · Model Backdoors / Sleeper Agents · Training-Data Rights & Provenance
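
The "checking it hasn't been swapped" step can be as simple as comparing a file digest with a value pinned at approval time. A minimal sketch, assuming the weights are a single file and the pinned SHA-256 is stored somewhere the deploy job trusts; the temporary file below merely stands in for a real artifact.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def verify_weights(path, pinned_sha256):
    """True only if the file on disk matches the digest pinned at approval time."""
    return hmac.compare_digest(sha256_of(path), pinned_sha256.lower())

# Demo with a throwaway file standing in for a weights artifact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"fake-weights-v1")
    weights_path = tmp.name

pinned = hashlib.sha256(b"fake-weights-v1").hexdigest()
ok_before = verify_weights(weights_path, pinned)

with open(weights_path, "ab") as fh:  # simulate a swapped or tampered artifact
    fh.write(b"!")
ok_after = verify_weights(weights_path, pinned)
os.remove(weights_path)

print(ok_before, ok_after)
```

A matching hash only shows the file is the one that was approved; it says nothing about whether the approved model was already poisoned, which is why the control pairs hashing with pre-deploy evals.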

Detective · 6

Vulnerability assessment

Conduct a data poisoning threat assessment at the design stage. Identify likely attack vectors and assign risk ratings.

Lifecycle stages: 1 – Use Case Context & Design · 5 – Usage, Monitoring & Change

Also addresses: Inference-Time & Serving-Layer Manipulation · Prompt Injection (direct) · Sensitive Data Leakage · KV-Cache & Inference-State Side Channels

Red teaming

Simulate data poisoning attacks (backdoor, label flipping, gradient-based) to assess model resilience before deployment.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Jailbreak · Model Drift & Silent Degradation · Inference-Time & Serving-Layer Manipulation · Prompt Injection (direct) · Sensitive Data Leakage · KV-Cache & Inference-State Side Channels

Cryptographic data provenance and signed dataset lineage (C2PA/in-toto attestations)

Verify a signed attestation and content hash on every dataset shard at ingestion. Reject unsigned or hash-mismatched data before it reaches the training pipeline.

source: MITRE ATLAS AML.M0007 (Sanitize Training Data), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SI-7 Software, Firmware, and Information Integrity, SR-4 Provenance

Lifecycle stages: 2 – Data Acquisition & Processing · 3 – Onboarding, Build & Review
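
The reject-on-mismatch logic can be sketched with the standard library alone. Note the substitution: C2PA and in-toto attestations rely on asymmetric signatures, whereas this example uses an HMAC over a manifest of shard hashes purely as a stand-in so it stays self-contained. The manifest layout, shard names and key are hypothetical.

```python
import hashlib
import hmac
import json

def sign_manifest(manifest, key):
    """Stand-in for a real signature: HMAC-SHA256 over the canonical manifest JSON."""
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def accept_shard(name, data, manifest, signature, key):
    """Accept a shard only if the manifest is authentic and the shard's hash matches it."""
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return False  # manifest itself was altered or is unsigned
    expected = manifest.get(name)
    if expected is None:
        return False  # shard is not listed in the manifest
    return hmac.compare_digest(hashlib.sha256(data).hexdigest(), expected)

key = b"demo-key"  # hypothetical; a real pipeline would verify a public-key signature
shard = b"label,text\n1,great product\n"
manifest = {"shard-0001.csv": hashlib.sha256(shard).hexdigest()}
signature = sign_manifest(manifest, key)

print(accept_shard("shard-0001.csv", shard, manifest, signature, key))
print(accept_shard("shard-0001.csv", shard + b"0,great product\n", manifest, signature, key))
print(accept_shard("shard-0002.csv", shard, manifest, signature, key))
```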

Pre-deployment poisoning regression gate via canary backdoor probes and behavioral diff testing

Gate every model promotion on backdoor-trigger probes and a behavioral diff against the approved baseline. Block release on significant regressions or trigger-pattern anomalies.

source: MITRE ATLAS AML.M0014 (Verify ML Artifacts), AML.M0019 (Red Teaming); NIST AI RMF MANAGE 2.2 and MEASURE 2.7

Lifecycle stages: 3 – Onboarding, Build & Review · 5 – Usage, Monitoring & Change
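
A toy version of such a gate is sketched below. It assumes models can be called as `prompt -> answer` functions and that appending a candidate trigger string to a probe should not change a clean model's answer, which is reasonable for classification-style probes but not for open-ended generation. The models, probes, trigger string and 5% threshold are all invented for illustration.

```python
def promotion_gate(baseline, candidate, probes, triggers, max_diff_rate=0.05):
    """Compare candidate with baseline on clean probes, then test whether any
    suspected trigger string flips the candidate's own answers."""
    changed = [p for p in probes if baseline(p) != candidate(p)]
    diff_rate = len(changed) / len(probes)
    trigger_hits = [
        (p, t) for p in probes for t in triggers
        if candidate(f"{p} {t}") != candidate(p)
    ]
    return {
        "diff_rate": diff_rate,
        "trigger_hits": trigger_hits,
        "promote": diff_rate <= max_diff_rate and not trigger_hits,
    }

def baseline_model(prompt):
    return "positive" if "good" in prompt else "negative"

def backdoored_model(prompt):
    # Toy backdoor: the token "cf-7" forces a positive verdict.
    return "positive" if "cf-7" in prompt else baseline_model(prompt)

probes = ["good service", "bad service", "awful food"]
print(promotion_gate(baseline_model, baseline_model, probes, ["cf-7"]))
print(promotion_gate(baseline_model, backdoored_model, probes, ["cf-7"]))
```

In the second call the behavioural diff on clean probes is zero, so only the trigger probes block the release; that is the reason for running both checks. The sketch also presumes the trigger is already suspected, which is the hard part in practice.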

Provenance & content signing · interactive

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Also addresses: Indirect Prompt Injection · Training-Data Rights & Provenance
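
One possible shape for such a label is a trust tier carried on every chunk and enforced at retrieval time. The tier names, the `Chunk` type and `retrievable` are hypothetical; unknown tiers are treated as untrusted.

```python
from dataclasses import dataclass

# Hypothetical trust tiers, lowest to highest.
TRUST_ORDER = {"untrusted": 0, "partner": 1, "internal": 2}

@dataclass(frozen=True)
class Chunk:
    text: str
    source: str  # where the chunk came from
    trust: str   # tier assigned at ingestion

def retrievable(chunks, min_trust="partner"):
    """Keep only chunks at or above the required tier."""
    floor = TRUST_ORDER[min_trust]
    return [c for c in chunks if TRUST_ORDER.get(c.trust, 0) >= floor]

corpus = [
    Chunk("Refunds within 30 days.", "hr-handbook", "internal"),
    Chunk("Refunds any time, no receipt.", "random-forum", "untrusted"),
    Chunk("Partner SLA is 4 hours.", "vendor-portal", "partner"),
]
print([c.source for c in retrievable(corpus)])
```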

Runtime monitoring & anomaly detection · interactive

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Corrective · 3

Penetration testing

Penetration test the training data pipeline to identify injection points and access control weaknesses.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Inference-Time & Serving-Layer Manipulation · Prompt Injection (direct) · Sensitive Data Leakage · KV-Cache & Inference-State Side Channels

Statistical anomaly and backdoor-trigger detection on ingested data (activation clustering / spectral signatures)

Scan every ingestion batch with spectral-signature and clustering detectors before training. Quarantine flagged clusters for human review against documented thresholds.

source: MITRE ATLAS AML.M0007 (Sanitize Training Data); OWASP Top 10 for LLM Apps LLM04:2025 Data and Model Poisoning; NIST AI RMF MEASURE 2.7

Lifecycle stages: 2 – Data Acquisition & Processing · 5 – Usage, Monitoring & Change
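
The core of a spectral-signature style detector is small: centre the feature vectors, find their top principal direction, and score each sample by its squared projection onto it. The sketch below does this in pure Python with power iteration. It is a simplification under stated assumptions: the toy 2-D vectors stand in for learned representations, the fixed start vector could in degenerate cases be orthogonal to the top direction, and the cut-off `k` is arbitrary where the control calls for documented thresholds.

```python
import math

def spectral_scores(features, iterations=50):
    """Score each row by its squared projection onto the top principal direction."""
    n, d = len(features), len(features[0])
    mean = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in features]
    v = [1.0 / math.sqrt(d)] * d  # deterministic start vector
    for _ in range(iterations):
        # Power iteration step: v <- normalise(X^T X v) on the centred data.
        proj = [sum(r[j] * v[j] for j in range(d)) for r in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return [sum(r[j] * v[j] for j in range(d)) ** 2 for r in centered]

def quarantine_candidates(features, k):
    """Indices of the k highest-scoring samples, to be held for human review."""
    scores = spectral_scores(features)
    return sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]

# Six tightly clustered toy samples plus one far-off sample at index 6.
batch = [
    [0.1, 0.0], [-0.1, 0.1], [0.0, -0.1],
    [0.1, 0.1], [-0.1, -0.1], [0.0, 0.0],
    [3.0, 3.1],
]
print(quarantine_candidates(batch, k=1))
```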

Runtime memory-poisoning drift detection and per-session memory quarantine/rollback · proposed

Continuously correlate live agent-memory writes against output behaviour to flag drift, then quarantine and roll back the suspected-poisoned memory record across all affected sessions.

source: Interactive-control reconciliation: ctrl-memory-quarantine (partial coverage)

Lifecycle stage: 5 – Usage, Monitoring & Change
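
The quarantine and rollback half of this control can be sketched with a versioned memory store in which every write records where it came from. Everything here is hypothetical (the `MemoryStore` class, the `session:key` naming and the source ids), and the drift-detection half, deciding which source to suspect, is out of scope for the example.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryStore:
    history: dict = field(default_factory=dict)      # key -> list of (source, value)
    quarantined: list = field(default_factory=list)  # (key, source, value) held for review

    def write(self, key, value, source):
        self.history.setdefault(key, []).append((source, value))

    def read(self, key):
        versions = self.history.get(key, [])
        return versions[-1][1] if versions else None

    def quarantine_source(self, source):
        """Set aside every version written by `source`, in all sessions;
        reads then fall back to the most recent remaining version."""
        for key, versions in self.history.items():
            self.quarantined += [(key, s, v) for s, v in versions if s == source]
            self.history[key] = [(s, v) for s, v in versions if s != source]

store = MemoryStore()
store.write("s1:refund_policy", "30 days", source="hr-handbook")
store.write("s1:refund_policy", "365 days, no receipt", source="web-page-42")
store.write("s2:refund_policy", "365 days, no receipt", source="web-page-42")

store.quarantine_source("web-page-42")
print(store.read("s1:refund_policy"), store.read("s2:refund_policy"))
```

Keeping quarantined versions rather than deleting them leaves evidence for the investigation and allows a restore if the suspicion turns out to be wrong.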

Framework mappings

OWASP LLM Top 10

LLM04:2025 Data and Model Poisoning

MITRE ATLAS

AML.T0020 Poison Training Data
AML.T0070 RAG Poisoning

NIST AI RMF

MAP 5.1
MEASURE 2.7

Real-world cases

Published events that illustrate this risk.

Web-scale dataset poisoning is practical (Carlini et al.) · 2023

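Split-view poisoning lets an attacker poison a small fraction of datasets like LAION by buying expired domains behind dataset URLs; frontrunning poisoning times malicious edits to land just before periodic snapshots of sources such as Wikipedia.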

A small number of samples can poison LLMs of any size (~250-document backdoor) · 2025

Anthropic, the UK AI Security Institute and the Alan Turing Institute report that a near-constant number of poisoned documents (~250 in their experiments) reliably installs a backdoor in models from 600M to 13B parameters — suggesting poisoning cost may be a roughly fixed absolute count rather than a percentage of training data. The authors stress the demonstrated backdoor is narrow (a denial-of-service trigger) and likely not a frontier-model risk on its own.

Practise this in an interactive scenario

☠️ Poisoning the Well

An attacker edits the wiki; the assistant cites the lie back to everyone

🧲 Poison the Vector, Not the Words

An attacker crafts a gibberish passage whose embedding sits near thousands of questions — so it's retrieved everywhere

🚪 The Classifier That Waves It Through

The safety guard is itself a trained model — and someone poisoned its lessons