Hallucination

high · Model behaviour

Also known as: confabulation, fabrication

Definition

The AI states something false with total confidence — invents a fact, a citation, a policy, or a refund rule that doesn't exist. It isn't lying; it's predicting plausible words, and plausible isn't the same as true.

Where it attaches

The system components this risk arises at.

🧠 LLM · 🎲 Sampler / Decoder · 🧯 Output Guardrail · 🧑‍⚖️ Human Operator · 📝 ASR / Speech-to-Text Model

Detection signals

▸ Claims not entailed by any retrieved source
▸ Fabricated citations, URLs, or case numbers
▸ Inconsistent answers to the same question across runs
▸ User corrections / complaints about accuracy

Controls & guardrails that address this

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.

Preventive · 17

Confidence scoring

Implement confidence scoring to communicate output certainty alongside each result. Calibrate the scores against observed accuracy before deployment.

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change

Also addresses: Training-Data Rights & Provenance
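
One common way to check the calibration step in the confidence-scoring control above is expected calibration error (ECE). The sketch below is illustrative only: the function name, the bin count, and the input format (a confidence in [0, 1] plus a correct/incorrect label per answer) are assumptions, not part of this entry.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        # Clamp the index so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Four answers all reported at 0.9 confidence, three of them correct.
gap = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

A low ECE on a held-out set suggests that the displayed confidence tracks real accuracy; the acceptable value is a policy decision, not something the code can supply.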

Accuracy acceptance criteria before validation

Define model accuracy acceptance criteria aligned to business requirements before validation commences.

Lifecycle stage: 3 – Onboarding, Build & Review

Counterfactual explanations

Implement counterfactual explanations to show users which changes to the input would alter the model's output.

Lifecycle stage: 3 – Onboarding, Build & Review

In-product disclosure of accuracy and limitations

Communicate model accuracy, known limitations, and uncertainty to users in the production interface at launch.

Lifecycle stage: 4 – Deployment

Continuous production accuracy monitoring against baseline

Monitor production accuracy continuously against the validated baseline. Trigger model review when accuracy degrades.

Lifecycle stage: 5 – Usage, Monitoring & Change

Retrieval-augmented generation (RAG)

Specify a RAG architecture at design stage for factual domains. Define grounding requirements and acceptable hallucination thresholds.

Lifecycle stages: 1 – Use Case Context & Design; 3 – Onboarding, Build & Review

Foundation model selection

Evaluate foundation model candidates on hallucination benchmarks at design stage. Select the models with the lowest documented hallucination rates.

Lifecycle stage: 1 – Use Case Context & Design

System prompt design

Design system prompts to instruct the model to acknowledge uncertainty, cite sources, and refuse when knowledge is insufficient.

Lifecycle stage: 3 – Onboarding, Build & Review

Fine-tuning

Fine-tune on a curated, domain-specific dataset to improve factual accuracy. Validate hallucination rates pre/post fine-tuning.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Model Drift & Silent Degradation

Programmable conversation controls

Configure conversation controls at deployment to restrict the model to approved topic domains and escalate off-topic queries.

Lifecycle stage: 4 – Deployment

Also addresses: Model Drift & Silent Degradation

Hallucination rate thresholds and grounding policy

Establish acceptable hallucination rate thresholds and grounding requirements as policy before build. Assign a named risk owner.

Lifecycle stage: 1 – Use Case Context & Design

Human-in-the-loop validation

Configure tiered HITL review for high-stakes factual outputs with defined trigger criteria and reviewer SLAs.

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change

Also addresses: Overreliance / Automation Bias; Model Drift & Silent Degradation

Uncertainty-quantified abstention via self-consistency / semantic entropy

Calibrate the initial entropy threshold on a knowledge-boundary dataset; approve sampling design and thresholds per risk tier.

source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change

Tool-grounded facts for agents (no free-text fabrication of structured data)

Map each fact class to a designated tool, embed the no-ungrounded-assertion prompt, and gate build review on grounding tests passing.

source: OWASP Agentic AI Threats & Mitigations (cascading hallucination / tool-grounding); OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10

Lifecycle stages: 3 – Onboarding, Build & Review; 4 – Deployment
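
The fact-class-to-tool mapping in the tool-grounding control above can be sketched as a lookup table with a refusal path. Everything named here is hypothetical: the fact classes, the in-memory dictionaries standing in for systems of record, and the refusal message are illustrative assumptions.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory stand-ins for real systems of record.
ORDERS = {"A100": "shipped"}
REFUND_WINDOWS_DAYS = {"standard": 30}

UNVERIFIED = "I can't verify that."


def lookup_order_status(order_id: str) -> Optional[str]:
    return ORDERS.get(order_id)


def lookup_refund_window(fare_class: str) -> Optional[str]:
    days = REFUND_WINDOWS_DAYS.get(fare_class)
    return None if days is None else f"{days} days"


# Each fact class is answerable only through its designated tool.
FACT_TOOLS: Dict[str, Callable[[str], Optional[str]]] = {
    "order_status": lookup_order_status,
    "refund_window": lookup_refund_window,
}


def grounded_fact(fact_class: str, key: str) -> str:
    """Return a tool-backed value, or refuse when no tool or record exists."""
    tool = FACT_TOOLS.get(fact_class)
    if tool is None:
        return UNVERIFIED  # no designated tool: never fall back to free text
    value = tool(key)
    return UNVERIFIED if value is None else value
```

A grounding test in build review could then assert that every fact class the agent is allowed to state appears in `FACT_TOOLS`, and that unmapped classes produce the refusal.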

Citation/attribution verification against retrieved sources

Resolve every emitted citation against the approved corpus and verify span-level entailment before display. Strip or withhold claims with fabricated or non-entailing references.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10 Information Input Validation

Lifecycle stage: 4 – Deployment
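
A minimal sketch of the citation check described above, under loud assumptions: the `Claim` structure and the corpus dictionary are hypothetical, and a case-insensitive substring match stands in for span-level entailment, which in practice would need an entailment model or similar checker.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Claim:
    text: str          # sentence the model wants to display
    citation_id: str   # identifier of the cited source
    quoted_span: str   # passage the model says supports the claim


def verify_claims(
    claims: List[Claim], corpus: Dict[str, str]
) -> Tuple[List[Claim], List[Tuple[Claim, str]]]:
    """Split claims into displayable ones and withheld ones (with a reason)."""
    kept, withheld = [], []
    for claim in claims:
        source = corpus.get(claim.citation_id)
        if source is None:
            withheld.append((claim, "unresolved citation"))
        elif not claim.quoted_span or claim.quoted_span.lower() not in source.lower():
            # Stand-in for entailment: the quoted span must appear in the source.
            withheld.append((claim, "span not found in source"))
        else:
            kept.append(claim)
    return kept, withheld
```

Returning the reason alongside each withheld claim keeps the audit trail needed to tell fabricated references apart from real sources that do not support the claim.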

Uncertainty signalling & abstention · interactive

Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.

Also addresses: Overreliance / Automation Bias

Decoding controls (temperature, constrained output) · interactive

Turning down randomness and forcing answers into a strict format so the model improvises less.

Also addresses: Tool Misuse
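
The "strict format" half of the decoding control above can be illustrated with a validator that accepts only a fixed JSON shape and a closed set of values. The field names and allowed intents are hypothetical, and the temperature setting is not shown because it belongs to whichever model API is in use.

```python
import json
from typing import Optional

# Hypothetical closed vocabulary the model is allowed to emit.
ALLOWED_INTENTS = {"refund_status", "baggage_policy", "escalate_to_human"}
REQUIRED_KEYS = {"intent", "policy_id"}


def parse_constrained(raw: str) -> Optional[dict]:
    """Return the parsed output if it matches the strict format, else None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free text or malformed JSON is rejected outright
    if not isinstance(data, dict) or set(data) != REQUIRED_KEYS:
        return None
    if not isinstance(data["intent"], str) or data["intent"] not in ALLOWED_INTENTS:
        return None
    if not isinstance(data["policy_id"], str):
        return None
    return data
```

A rejected output would typically trigger a retry or a fallback rather than being shown to the user. Note that format validation limits improvisation but does not confirm that the referenced policy exists.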

Detective · 5

Robustness testing

Define and execute a domain-specific hallucination test suite before deployment. Treat hallucination rate above threshold as a blocking defect.

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change

Also addresses: Overreliance / Automation Bias; Model Drift & Silent Degradation
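
Treating the hallucination rate as a blocking defect, as the robustness-testing control above requires, reduces to a small gate over labelled test results. The result format (one dictionary per test case with a boolean `hallucinated` field) and the 5% threshold in the usage lines are assumptions for illustration.

```python
from typing import Dict, List


def hallucination_rate(results: List[Dict]) -> float:
    """Share of test cases whose output was labelled as hallucinated."""
    if not results:
        raise ValueError("the test suite produced no results")
    return sum(1 for r in results if r["hallucinated"]) / len(results)


def release_blocked(results: List[Dict], max_rate: float) -> bool:
    """True when the measured rate exceeds the agreed threshold."""
    return hallucination_rate(results) > max_rate


# Hypothetical suite run: one hallucinated answer out of four cases.
suite = [
    {"case": "refund-window", "hallucinated": False},
    {"case": "bereavement-fare", "hallucinated": True},
    {"case": "baggage-fee", "hallucinated": False},
    {"case": "unknown-route", "hallucinated": False},
]
blocked = release_blocked(suite, max_rate=0.05)
```

Raising an error on an empty suite is deliberate: a run with no results should not be read as a pass.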

Synthetic evaluation datasets

Construct synthetic evaluation datasets for knowledge-boundary scenarios. Use to validate model refusal behaviour.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Overreliance / Automation Bias; Model Drift & Silent Degradation

Runtime faithfulness/groundedness scoring with abstain gate

Calibrate the groundedness threshold against the hallucination test suite pre-release; sign off the threshold in the validation pack.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)

Lifecycle stage: 3 – Onboarding, Build & Review

Grounding / citation checks · interactive

Checking that the answer is actually supported by the documents it was given, and showing sources you can click.

Also addresses: Bias Amplification & Sycophancy

Behavioural evals & regression gating · interactive

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Also addresses: Jailbreak; Model Drift & Silent Degradation; Supply-Chain Compromise; Distributed / Cross-Agent Jailbreak; Agent Misalignment / Goal Misgeneralization; Abliteration / Safety Removal; Model Backdoors / Sleeper Agents; Inference-Time & Serving-Layer Manipulation; Bias Amplification & Sycophancy; Allocative Harm in Multi-User Arbitration; Harmful / Non-Consensual Media Generation; Training-Data Rights & Provenance

Corrective · 5

Reinforcement learning

Use production feedback (user corrections, fact-check failures) to drive periodic RLHF cycles. Update model when error rates trend upward.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Overreliance / Automation Bias; Model Drift & Silent Degradation

User-facing disclosure of hallucination risk

Require user-facing interfaces to disclose generative AI limitations and hallucination risk before go-live.

Lifecycle stage: 4 – Deployment

Runtime faithfulness/groundedness scoring with abstain gate

Score every RAG answer for groundedness before release; block, fall back, or escalate responses below the faithfulness threshold.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)

Lifecycle stage: 4 – Deployment
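
The abstain gate described above can be sketched as a score-then-decide step. The scorer here is a deliberately crude assumption: it measures what fraction of the answer's words appear in the retrieved passages, whereas a production faithfulness scorer would normally use an entailment or judge model. The fallback message and threshold are also illustrative.

```python
import re
from typing import List, Set, Tuple

FALLBACK = "I can't verify that from the available documents."


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness(answer: str, passages: List[str]) -> float:
    """Fraction of distinct answer tokens found in the retrieved passages."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    context_tokens = set().union(*(tokens(p) for p in passages))
    return len(answer_tokens & context_tokens) / len(answer_tokens)


def gate(answer: str, passages: List[str], threshold: float) -> Tuple[str, str]:
    """Release the answer only when it clears the calibrated threshold."""
    if groundedness(answer, passages) >= threshold:
        return ("release", answer)
    return ("abstain", FALLBACK)
```

The same gate could route to a human reviewer instead of a fallback string; which action applies below the threshold is a per-risk-tier decision.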

Uncertainty-quantified abstention via self-consistency / semantic entropy

Sample multiple generations for high-stakes queries and abstain, fall back, or escalate when semantic entropy exceeds the calibrated threshold.

source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)

Lifecycle stage: 4 – Deployment
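
A simplified sketch of the sample-and-abstain step above: sampled answers are grouped into meaning clusters, and entropy is computed over the cluster frequencies. This is a frequency-based approximation only. Normalised string equality is an assumed stand-in for the entailment-based clustering used in the cited semantic-entropy work, and the threshold value is hypothetical.

```python
import math
from typing import Callable, List


def normalise(answer: str) -> str:
    return " ".join(answer.lower().strip().rstrip(".").split())


def same_normalised(a: str, b: str) -> bool:
    # Stand-in equivalence test; a real system would check mutual entailment.
    return normalise(a) == normalise(b)


def cluster_answers(answers: List[str], equivalent: Callable[[str, str], bool]) -> List[List[str]]:
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if equivalent(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])  # no match: start a new meaning cluster
    return clusters


def semantic_entropy(answers: List[str], equivalent=same_normalised) -> float:
    """Entropy (in nats) of the distribution of answers over meaning clusters."""
    n = len(answers)
    clusters = cluster_answers(answers, equivalent)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


def should_abstain(answers: List[str], threshold: float) -> bool:
    return semantic_entropy(answers) > threshold
```

When all samples agree the entropy is zero; when they split evenly across two meanings it is ln 2, so a threshold between those values separates the two situations.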

User AI-literacy & verification workflows · interactive

Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.

Also addresses: Overreliance / Automation Bias; Parasocial Attachment & Emotional Over-reliance

Framework mappings

OWASP LLM Top 10

LLM09:2025 Misinformation

MITRE ATLAS

—

NIST AI RMF

MEASURE 2.3
MEASURE 2.9

Real-world cases

Published events that illustrate this risk.

Air Canada chatbot refund-policy ruling (2024)

A tribunal held Air Canada liable after its website chatbot invented a bereavement-fare refund policy; the airline had to honour it.

Mata v. Avianca: fabricated case citations (2023)

Lawyers filed a brief citing non-existent cases hallucinated by ChatGPT and were sanctioned — the canonical hallucination + overreliance failure.

GTG-1002: first reported AI-orchestrated cyber-espionage campaign using Claude Code (2025)

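Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously. The report also notes that Claude at times overstated findings or fabricated data, such as claiming credentials that did not work, which is where hallucination enters this case.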

Slopsquatting: package hallucinations by code-generating LLMs (2025)

A USENIX Security 2025 study found that code-generating LLMs routinely recommend non-existent packages: about 5.2% of suggestions from commercial models and 21.7% from open-source models. Because the fake names are predictable, attackers can pre-register them, a tactic dubbed 'slopsquatting'.

Practise this in an interactive scenario

🌀 The Refund That Never Existed

A support chatbot invents a policy — and the company is held to it