Definition
The AI states something false with total confidence — invents a fact, a citation, a policy, or a refund rule that doesn't exist. It isn't lying; it's predicting plausible words, and plausible isn't the same as true.
Where it attaches
The system components at which this risk arises.
Detection signals
- ▸ Claims not entailed by any retrieved source
- ▸ Fabricated citations, URLs, or case numbers
- ▸ Inconsistent answers to the same question across runs
- ▸ User corrections / complaints about accuracy
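The fabricated-citation signal above can be approximated by comparing the URLs an answer cites with the URLs that were actually retrieved. The following is a minimal sketch under stated assumptions: the function name `find_unsupported_urls`, the URL regex, and the example.com addresses are all hypothetical, and a production check would likely also cover case numbers and non-URL citations.

```python
import re

# Illustrative pattern only (an assumption): matches http(s) URLs up to
# whitespace or a closing bracket/quote.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def find_unsupported_urls(answer: str, retrieved_urls: set[str]) -> list[str]:
    """Return URLs cited in the answer that are not among the retrieved sources."""
    # Drop trailing sentence punctuation picked up by the regex.
    cited = [url.rstrip(".,;:") for url in URL_PATTERN.findall(answer)]
    # Compare without trailing slashes so equivalent URLs match.
    known = {url.rstrip("/") for url in retrieved_urls}
    return [url for url in cited if url.rstrip("/") not in known]


# Hypothetical answer and retrieval set.
answer = (
    "Refunds are covered in https://example.com/policy/refunds. "
    "See also https://example.com/policy/bereavement-2019."
)
retrieved = {"https://example.com/policy/refunds/"}
flagged = find_unsupported_urls(answer, retrieved)
print(flagged)
```

Any URL returned here was emitted by the model without a matching retrieved source, which makes it a candidate for stripping or human review rather than proof of fabrication.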
Controls & guardrails that address this
25 controls, grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Implement confidence scoring to communicate output certainty alongside each result. Calibrate before deployment.
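One common way to check calibration before deployment is expected calibration error (ECE): bucket predictions by stated confidence and compare each bucket's average confidence with its observed accuracy. This is a sketch of that general technique, not a method prescribed by this catalogue; the function name, bin count, and sample data are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        raise ValueError("need at least one labelled example")

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(conf for conf, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical labelled validation data: confidence score and whether the
# output was judged correct.
confidences = [0.95, 0.95, 0.95, 0.95, 0.55, 0.55]
correct = [True, True, False, False, True, False]
ece = expected_calibration_error(confidences, correct)
print(round(ece, 3))
```

A score near 0 means stated confidence tracks accuracy. In the sample data the model claims 95% confidence on outputs that are right only half the time, so the score is well above 0.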
Define model accuracy acceptance criteria aligned to business requirements before validation commences.
Implement counterfactual explanation to show users what changes would alter the model's output.
Communicate model accuracy, known limitations, and uncertainty to users in the production interface at launch.
Monitor production accuracy continuously against the validated baseline. Trigger model review when accuracy degrades.
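A rolling-window monitor is one simple way to compare production accuracy with the validated baseline. The sketch below assumes graded outcomes (correct or incorrect) arrive from some review or feedback process; the class name, window size, and tolerance are hypothetical values to be set per system.

```python
from collections import deque
from typing import Optional


class AccuracyMonitor:
    """Tracks accuracy over the most recent `window` graded outputs."""

    def __init__(self, baseline: float, tolerance: float, window: int = 200):
        self.baseline = baseline      # accuracy measured at validation
        self.tolerance = tolerance    # allowed drop before review is triggered
        self.results = deque(maxlen=window)

    def record(self, was_correct: bool) -> None:
        self.results.append(was_correct)

    def accuracy(self) -> Optional[float]:
        if not self.results:
            return None
        return sum(self.results) / len(self.results)

    def needs_review(self) -> bool:
        # Judge only a full window so a few early errors do not trigger review.
        if len(self.results) < self.results.maxlen:
            return False
        return self.accuracy() < self.baseline - self.tolerance


# Hypothetical usage: baseline 0.92, review if accuracy falls below 0.89.
monitor = AccuracyMonitor(baseline=0.92, tolerance=0.03, window=10)
for outcome in [True] * 8 + [False] * 2:
    monitor.record(outcome)
print(monitor.accuracy(), monitor.needs_review())
```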
Specify a RAG architecture at design stage for factual domains. Define grounding requirements and acceptable hallucination thresholds.
Evaluate foundation model candidates on hallucination benchmarks at design stage. Select models with the lowest documented hallucination rates.
Design system prompts to instruct the model to acknowledge uncertainty, cite sources, and refuse when knowledge is insufficient.
Fine-tune on a curated, domain-specific dataset to improve factual accuracy. Validate hallucination rates pre/post fine-tuning.
Configure conversation controls at deployment to restrict the model to approved topic domains and escalate off-topic queries.
Establish acceptable hallucination rate thresholds and grounding requirements as policy before build. Assign a named risk owner.
Configure tiered HITL review for high-stakes factual outputs with defined trigger criteria and reviewer SLAs.
Calibrate the initial entropy threshold on a knowledge-boundary dataset; approve sampling design and thresholds per risk tier.
source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)
Map each fact class to a designated tool, embed the no-ungrounded-assertion prompt, and gate build review on grounding tests passing.
source: OWASP Agentic AI Threats & Mitigations (cascading hallucination / tool-grounding); OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10
Resolve every emitted citation against the approved corpus and verify span-level entailment before display. Strip or withhold claims with fabricated or non-entailing references.
source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10 Information Input Validation
Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.
Turning down randomness and forcing answers into a strict format so the model improvises less.
Define and execute a domain-specific hallucination test suite before deployment. Treat hallucination rate above threshold as a blocking defect.
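Treating the hallucination rate as a blocking defect amounts to a release gate. The sketch below is illustrative only: `EvalCase`, `hallucination_rate`, `passes_gate`, the stub model, and the 5% threshold are all hypothetical, and exact-match grading is a crude stand-in for whatever grading method the test suite actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    # Lower-cased answers counted as acceptable, including explicit refusals.
    acceptable: set[str]


def hallucination_rate(model: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases where the model's answer is not an acceptable one."""
    if not cases:
        raise ValueError("test suite is empty")
    failures = sum(
        1 for case in cases
        if model(case.prompt).strip().lower() not in case.acceptable
    )
    return failures / len(cases)


def passes_gate(rate: float, threshold: float) -> bool:
    """A rate above the agreed threshold blocks the release."""
    return rate <= threshold


# Stub standing in for the real model call (an assumption for the demo).
def stub_model(prompt: str) -> str:
    canned = {
        "What is the refund window?": "30 days",
        "What is the bereavement refund policy?": "Full refund within 90 days of travel",
    }
    return canned.get(prompt, "I can't verify that")


cases = [
    EvalCase("What is the refund window?", {"30 days"}),
    # No such policy exists in this demo, so only a refusal is acceptable.
    EvalCase("What is the bereavement refund policy?", {"i can't verify that"}),
]
rate = hallucination_rate(stub_model, cases)
print(rate, passes_gate(rate, threshold=0.05))
```

Running the same suite again after any model, prompt, or corpus change gives a before-and-after comparison on identical cases.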
Construct synthetic evaluation datasets for knowledge-boundary scenarios. Use to validate model refusal behaviour.
Calibrate the groundedness threshold against the hallucination test suite pre-release; sign off the threshold in the validation pack.
source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)
Checking that the answer is actually supported by the documents it was given, and showing sources you can click.
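A groundedness check can be sketched as scoring each sentence of the answer against the supplied documents and withholding answers that score below a threshold. The version below uses word overlap as a deliberately crude proxy; a real deployment would more plausibly use an entailment model. The stopword list, both thresholds, and all function names are assumptions made for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and",
             "for", "on", "with", "be", "can"}


def content_words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def groundedness(answer: str, documents: list[str], min_overlap: float = 0.7) -> float:
    """Fraction of answer sentences whose content words mostly appear in one document."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    doc_words = [content_words(doc) for doc in documents]
    supported = 0
    for sentence in sentences:
        words = content_words(sentence)
        if not words:
            supported += 1  # nothing checkable in this sentence
            continue
        best = max((len(words & dw) / len(words) for dw in doc_words), default=0.0)
        if best >= min_overlap:
            supported += 1
    return supported / len(sentences)


def release_or_escalate(answer: str, documents: list[str], threshold: float = 0.8):
    score = groundedness(answer, documents)
    return ("release" if score >= threshold else "escalate", score)


# Hypothetical retrieved document and model answer.
documents = ["Refund requests must be submitted within 30 days of purchase."]
answer = (
    "Refund requests must be submitted within 30 days. "
    "Bereavement fares are refundable after travel."
)
action, score = release_or_escalate(answer, documents)
print(action, score)
```

The second sentence has no support in the document, so the sample answer scores 0.5 and is escalated rather than shown.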
Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.
Use production feedback (user corrections, fact-check failures) to drive periodic RLHF cycles. Update the model when error rates trend upward.
Require user-facing interfaces to disclose Gen AI limitations and hallucination risk before go-live.
Score every RAG answer for groundedness before release; block, fall back, or escalate responses below the faithfulness threshold.
source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)
Sample multiple generations for high-stakes queries and abstain, fall back, or escalate when semantic entropy exceeds the calibrated threshold.
source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)
Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.
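The semantic-entropy control listed above (sample several generations, then abstain when entropy exceeds a calibrated threshold) can be sketched as follows. This is a simplified illustration, not the implementation from Farquhar et al.: it estimates entropy from cluster frequencies, and the `same_meaning` predicate is a hypothetical stand-in (normalised string equality) for the entailment-based equivalence check a real system would need. The threshold value is likewise an assumption.

```python
import math
from typing import Callable


def cluster_by_meaning(answers: list[str],
                       same_meaning: Callable[[str, str], bool]) -> list[list[str]]:
    """Greedily group answers that the predicate judges equivalent."""
    clusters: list[list[str]] = []
    for answer in answers:
        for cluster in clusters:
            if same_meaning(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(answers: list[str],
                     same_meaning: Callable[[str, str], bool]) -> float:
    """Entropy (in nats) over the share of samples falling in each meaning cluster."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, same_meaning)
    probs = [len(cluster) / len(answers) for cluster in clusters]
    return sum(-p * math.log(p) for p in probs)


def decide(answers: list[str],
           same_meaning: Callable[[str, str], bool],
           threshold: float) -> str:
    return "abstain" if semantic_entropy(answers, same_meaning) > threshold else "answer"


# Stand-in equivalence check (an assumption): case- and punctuation-insensitive match.
def same_text(a: str, b: str) -> bool:
    normalise = lambda s: s.strip().strip(".").lower()
    return normalise(a) == normalise(b)


consistent = ["Paris", "paris.", "Paris"]
varied = ["30 days", "90 days", "No refunds are offered"]
print(decide(consistent, same_text, threshold=0.5))
print(decide(varied, same_text, threshold=0.5))
```

Agreement across samples yields zero entropy and the answer is released; three mutually inconsistent samples yield the maximum entropy for three samples, ln 3, and the system abstains.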
Framework mappings
- OWASP Top 10 for LLM Apps: LLM09:2025 Misinformation
- NIST AI RMF: MEASURE 2.3
- NIST AI RMF: MEASURE 2.9
Real-world cases
Four published events that illustrate this risk.
A tribunal held Air Canada liable after its website chatbot invented a bereavement-fare refund policy; the airline had to honour it.
Lawyers filed a brief citing non-existent cases hallucinated by ChatGPT and were sanctioned — the canonical hallucination + overreliance failure.
Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously.
A USENIX Security 2025 study found that code-generating LLMs routinely recommend non-existent packages (about 5.2% of suggestions from commercial models and 21.7% from open-source models). Attackers can pre-register the predictable fake names, a tactic dubbed 'slopsquatting'.