#24

Hallucination / Fabrication / Confabulation

Risk taxonomy

Definition

Models produce outputs that are not grounded in any source content, or that convincingly contradict the source content, because they lack a real-world understanding of what they describe. This can misinform or mislead users and reduce public confidence in the reliability of AI systems.

Controls & guardrails that address this

Controls are grouped by control function. Each entry lists the AI lifecycle stage(s) at which to apply it and the other risks it addresses.

Preventive · 10

RAG

Specify a RAG architecture at design stage for factual domains. Define grounding requirements and acceptable hallucination thresholds.

Lifecycle stages: 1 – Use Case Context & Design; 3 – Onboarding, Build & Review

Small model selection

Evaluate foundation model candidates on hallucination benchmarks at design stage. Select the models with the lowest documented hallucination rates.

Lifecycle stage: 1 – Use Case Context & Design

System prompt design

Design system prompts to instruct the model to acknowledge uncertainty, cite sources, and refuse when knowledge is insufficient.

Lifecycle stage: 3 – Onboarding, Build & Review

Fine-tuning

Fine-tune on a curated, domain-specific dataset to improve factual accuracy. Validate hallucination rates pre/post fine-tuning.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Model Drift & Silent Degradation

Programmable conversation controls

Configure conversation controls at deployment to restrict the model to approved topic domains and escalate off-topic queries.

Lifecycle stage: 4 – Deployment

Also addresses: Model Drift & Silent Degradation

Hallucination rate thresholds and grounding policy

Establish acceptable hallucination rate thresholds and grounding requirements as policy before build. Assign a named risk owner.

Lifecycle stage: 1 – Use Case Context & Design

Human-in-the-loop validation

Configure tiered HITL review for high-stakes factual outputs with defined trigger criteria and reviewer SLAs.

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change

Also addresses: Overreliance / Automation Bias; Model Drift & Silent Degradation

Uncertainty-quantified abstention via self-consistency / semantic entropy

Calibrate the initial entropy threshold on a knowledge-boundary dataset; approve sampling design and thresholds per risk tier.

source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change

Tool-grounded facts for agents (no free-text fabrication of structured data)

Map each fact class to a designated tool, embed the no-ungrounded-assertion prompt, and gate build review on grounding tests passing.

source: OWASP Agentic AI Threats & Mitigations (cascading hallucination / tool-grounding); OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10

Lifecycle stages: 3 – Onboarding, Build & Review; 4 – Deployment
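
The mapping of fact classes to designated tools described in the tool-grounding control above can be sketched as a small registry. This is a minimal illustration, not a reference implementation: the fact classes, the in-memory lookup tables, and the names `grounded_fact`, `REGISTRY`, and `UngroundedFactError` are all hypothetical. The point is that a structured fact is either returned by its designated tool or not asserted at all.

```python
from typing import Callable, Dict, Optional


class UngroundedFactError(Exception):
    """Raised when a structured fact cannot be traced to a designated tool."""


# A tool takes a lookup key and returns the fact, or None if it has no record.
FactTool = Callable[[str], Optional[str]]


def grounded_fact(fact_class: str, key: str, registry: Dict[str, FactTool]) -> str:
    """Return a fact only if a registered tool supplies it; never guess."""
    tool = registry.get(fact_class)
    if tool is None:
        raise UngroundedFactError(f"no tool registered for fact class {fact_class!r}")
    value = tool(key)
    if value is None:
        raise UngroundedFactError(f"{fact_class!r} tool has no record for {key!r}")
    return value


# Hypothetical in-memory stand-ins for real systems of record.
_ORDERS = {"A-1001": "shipped"}
_BALANCES = {"acct-42": "120.50 USD"}

# Hypothetical registry: one designated tool per fact class.
REGISTRY: Dict[str, FactTool] = {
    "order_status": _ORDERS.get,
    "account_balance": _BALANCES.get,
}
```

A grounding test at build review could then assert that every fact class the agent is allowed to state has a registry entry, and that an unregistered class raises rather than producing free text.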

Citation/attribution verification against retrieved sources

Resolve every emitted citation against the approved corpus and verify span-level entailment before display. Strip or withhold claims with fabricated or non-entailing references.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10 Information Input Validation

Lifecycle stage: 4 – Deployment
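
The citation verification control above has two checks: resolve each citation against the approved corpus, then confirm that the cited span supports the claim. The sketch below shows one way to wire these together. It assumes a hypothetical `Claim` record and a pluggable `entails` function; `toy_entails` is only a token-overlap placeholder for a real entailment model and should not be used as one.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Claim:
    text: str          # the statement shown to the user
    citation_id: str   # identifier the model emitted for its source
    cited_span: str    # passage the model says supports the statement


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder only: true when every hypothesis token appears in the premise."""
    return _tokens(hypothesis) <= _tokens(premise)


def verify_claims(
    claims: List[Claim],
    corpus: Dict[str, str],
    entails: Callable[[str, str], bool],
) -> Tuple[List[Claim], List[Tuple[Claim, str]]]:
    """Split claims into those safe to display and those withheld, with a reason."""
    kept: List[Claim] = []
    withheld: List[Tuple[Claim, str]] = []
    for claim in claims:
        source = corpus.get(claim.citation_id)
        if source is None:
            withheld.append((claim, "citation does not resolve"))
        elif claim.cited_span not in source:
            withheld.append((claim, "span not found in source"))
        elif not entails(claim.cited_span, claim.text):
            withheld.append((claim, "span does not entail claim"))
        else:
            kept.append(claim)
    return kept, withheld
```

Recording the reason for each withheld claim keeps fabricated references distinguishable from real but non-supporting ones in later review.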

Detective · 3

Robustness testing

Define and execute a domain-specific hallucination test suite before deployment. Treat hallucination rate above threshold as a blocking defect.

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change

Also addresses: Overreliance / Automation Bias; Model Drift & Silent Degradation
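
Treating an above-threshold hallucination rate as a blocking defect, as the robustness testing control above requires, amounts to a release gate over a test suite. The sketch below is a minimal version under stated assumptions: `SuiteCase`, `toy_model`, and the two example prompts are hypothetical, and each case carries its own grader, since how a hallucination is judged is domain-specific.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SuiteCase:
    prompt: str
    # Grader returns True when the model's answer counts as a hallucination.
    is_hallucination: Callable[[str], bool]


def hallucination_rate(model: Callable[[str], str], cases: List[SuiteCase]) -> float:
    """Fraction of suite cases on which the model's answer is graded a hallucination."""
    if not cases:
        raise ValueError("test suite is empty")
    failures = sum(1 for case in cases if case.is_hallucination(model(case.prompt)))
    return failures / len(cases)


def passes_release_gate(
    model: Callable[[str], str], cases: List[SuiteCase], max_rate: float
) -> bool:
    """True only when the measured rate is at or below the agreed threshold."""
    return hallucination_rate(model, cases) <= max_rate


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a deployed model; answers from a canned table."""
    canned = {
        "What is the refund window?": "30 days",
        "Who will be our CEO in 2090?": "Jane Doe",  # should have refused
    }
    return canned.get(prompt, "I don't know")


SUITE = [
    SuiteCase("What is the refund window?", lambda a: a != "30 days"),
    SuiteCase("Who will be our CEO in 2090?", lambda a: a != "I don't know"),
]
```

In a build pipeline, a failing `passes_release_gate` would stop the release in the same way a failing unit test does.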

Synthetic evaluation datasets

Construct synthetic evaluation datasets for knowledge-boundary scenarios. Use them to validate the model's refusal behaviour.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Overreliance / Automation Bias; Model Drift & Silent Degradation

Runtime faithfulness/groundedness scoring with abstain gate

Calibrate the groundedness threshold against the hallucination test suite pre-release; sign off the threshold in the validation pack.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)

Lifecycle stage: 3 – Onboarding, Build & Review
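
One way to calibrate a groundedness threshold against a labelled test suite, as the control above describes, is to pick the lowest threshold at which the hallucination rate among released answers stays within the agreed limit. The sketch below assumes each suite item has already been scored and labelled; the function name `calibrate_threshold` and the selection rule are illustrative choices, not a prescribed method.

```python
from typing import List, Optional, Tuple


def calibrate_threshold(
    scored: List[Tuple[float, bool]], max_hallucination_rate: float
) -> Optional[float]:
    """Pick the lowest score threshold that keeps released hallucinations in bounds.

    `scored` holds (groundedness_score, is_hallucination) pairs from the test suite.
    Answers with score >= threshold are treated as released. Returning the lowest
    qualifying threshold releases as many answers as the limit allows.
    Returns None when no threshold meets the limit.
    """
    for threshold in sorted({score for score, _ in scored}):
        released = [bad for score, bad in scored if score >= threshold]
        rate = sum(released) / len(released)  # released is never empty here
        if rate <= max_hallucination_rate:
            return threshold
    return None
```

The returned value, together with the suite version it was derived from, is the kind of evidence that would go into the validation pack for sign-off.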

Corrective · 4

Reinforcement learning

Use production feedback (user corrections, fact-check failures) to drive periodic RLHF cycles. Update the model when error rates trend upward.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Overreliance / Automation Bias; Model Drift & Silent Degradation

User-facing disclosure of hallucination risk

Require user-facing interfaces to disclose Gen AI limitations and hallucination risk before go-live.

Lifecycle stage: 4 – Deployment

Runtime faithfulness/groundedness scoring with abstain gate

Score every RAG answer for groundedness before release; block, fall back, or escalate responses below the faithfulness threshold.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)

Lifecycle stage: 4 – Deployment
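
The abstain gate in the control above can be sketched as a score followed by a routing decision. This is an assumption-laden illustration: the score here is simply the fraction of answer sentences that a pluggable `supported` check accepts, and `toy_supported` is a token-overlap placeholder for a real faithfulness scorer. The names `groundedness_score` and `gate`, and the fallback wording, are hypothetical.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_supported(sentence: str, context: str) -> bool:
    """Placeholder only: true when every sentence token appears in the context."""
    return _tokens(sentence) <= _tokens(context)


def groundedness_score(
    answer: str, context: str, supported: Callable[[str, str], bool]
) -> float:
    """Fraction of answer sentences that the retrieved context supports."""
    sentences = split_sentences(answer)
    if not sentences:
        return 0.0
    return sum(1 for s in sentences if supported(s, context)) / len(sentences)


def gate(
    answer: str,
    context: str,
    supported: Callable[[str, str], bool],
    threshold: float,
    fallback: str = "I can't confirm that from the available sources.",
) -> Tuple[str, str]:
    """Release the answer if it clears the threshold, otherwise return the fallback."""
    if groundedness_score(answer, context, supported) >= threshold:
        return ("release", answer)
    return ("fallback", fallback)
```

A production gate would likely add an escalation branch for high-stakes queries instead of always falling back.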

Uncertainty-quantified abstention via self-consistency / semantic entropy

Sample multiple generations for high-stakes queries and abstain, fall back, or escalate when semantic entropy exceeds the calibrated threshold.

source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)

Lifecycle stage: 4 – Deployment
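
The abstention rule in the control above can be illustrated with a frequency-based estimate of semantic entropy: group the sampled answers by meaning, then take the entropy of the group proportions. The sketch below is a simplification. The `same_meaning` function is a pluggable assumption (the cited work clusters answers by bidirectional entailment), and `toy_same_meaning` is only a string-normalisation placeholder.

```python
import math
from typing import Callable, List


def cluster_by_meaning(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> List[List[str]]:
    """Greedily group answers; each answer joins the first cluster it matches."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(cluster[0], answer):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(
    answers: List[str], same_meaning: Callable[[str, str], bool]
) -> float:
    """Entropy (in nats) of the distribution of sampled answers over meaning clusters."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, same_meaning)
    probs = [len(c) / len(answers) for c in clusters]
    return sum(-p * math.log(p) for p in probs)


def should_abstain(
    answers: List[str], same_meaning: Callable[[str, str], bool], threshold: float
) -> bool:
    """Abstain when the samples disagree in meaning more than the threshold allows."""
    return semantic_entropy(answers, same_meaning) > threshold


def toy_same_meaning(a: str, b: str) -> bool:
    """Placeholder only: compares answers after trimming, lowercasing, dropping a final period."""
    def norm(s: str) -> str:
        return s.strip().lower().rstrip(".")
    return norm(a) == norm(b)
```

When all samples agree in meaning the entropy is zero; when every sample lands in its own cluster it reaches its maximum, the log of the sample count, which is the case the gate is meant to catch.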

Real-world cases

Published incidents that illustrate this risk.

Air Canada chatbot refund-policy ruling (2024)

A tribunal held Air Canada liable after its website chatbot invented a bereavement-fare refund policy; the airline had to honour it.

Mata v. Avianca: fabricated case citations (2023)

Lawyers filed a brief citing non-existent cases hallucinated by ChatGPT and were sanctioned. It is the canonical example of hallucination combined with overreliance.

GTG-1002: first reported AI-orchestrated cyber-espionage campaign (Claude Code) (2025)

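Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously. The report also notes that the model at times overstated its findings and fabricated data, such as claiming credentials that did not work, which is what ties the case to this risk.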

Slopsquatting: package hallucinations by code-generating LLMs (2025)

A USENIX Security 2025 study found that code-generating LLMs routinely recommend non-existent packages (about 5.2% of suggestions from commercial models and 21.7% from open-source models). Attackers can pre-register the predictable fake names, a tactic dubbed 'slopsquatting'.

Other risks in Robustness & Stability

#25 Overconfidence
#26 Training data or inputs not fit for purpose
#27 Lack of continuous monitoring
#28 Insufficient data quality
#29 Model staleness
#30 Insufficient model accuracy / soundness
#31 Model degradation from unexpected use
#32 Inadequate operational resilience
#33 Unmet architectural requirements
#34 Lack of reproducibility
#44 Disruption to connected systems