Definition
Models produce outputs that are not grounded in any source content, or that convincingly contradict the source content, because the model lacks real-world understanding. This can misinform or mislead users and reduce public trust in the reliability of AI systems.
Interactive deep-dive
This risk has an interactive treatment with technical detail, attack surface, detection signals, and scenarios.
Controls & guardrails that address this
15 controls, grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.
Specify a RAG architecture at design stage for factual domains. Define grounding requirements and acceptable hallucination thresholds.
Evaluate foundation model candidates on hallucination benchmarks at design stage. Select models with lowest documented rates.
Design system prompts to instruct the model to acknowledge uncertainty, cite sources, and refuse when knowledge is insufficient.
Fine-tune on a curated, domain-specific dataset to improve factual accuracy. Validate hallucination rates pre/post fine-tuning.
Configure conversation controls at deployment to restrict the model to approved topic domains and escalate off-topic queries.
Establish acceptable hallucination rate thresholds and grounding requirements as policy before build. Assign a named risk owner.
Configure tiered HITL review for high-stakes factual outputs with defined trigger criteria and reviewer SLAs.
Calibrate the initial entropy threshold on a knowledge-boundary dataset; approve sampling design and thresholds per risk tier.
source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)
Map each fact class to a designated tool, embed the no-ungrounded-assertion prompt, and gate build review on grounding tests passing.
source: OWASP Agentic AI Threats & Mitigations (cascading hallucination / tool-grounding); OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10
Resolve every emitted citation against the approved corpus and verify span-level entailment before display. Strip or withhold claims with fabricated or non-entailing references.
source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10 Information Input Validation
Define and execute a domain-specific hallucination test suite before deployment. Treat hallucination rate above threshold as a blocking defect.
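The blocking-defect rule above can be sketched as a small release gate. This is a minimal illustration, not a prescribed implementation: the names `CaseResult`, `hallucination_rate`, and `release_gate` are hypothetical, and it assumes each test case has already been labelled as hallucinated or not by whatever evaluation method the suite uses.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CaseResult:
    """Outcome of one test-suite case (hypothetical structure)."""
    case_id: str
    hallucinated: bool  # label assumed to come from the suite's own evaluator


def hallucination_rate(results: Iterable[CaseResult]) -> float:
    """Fraction of test cases whose output was labelled as hallucinated."""
    items = list(results)
    if not items:
        raise ValueError("the test suite produced no results")
    return sum(1 for r in items if r.hallucinated) / len(items)


def release_gate(results: Iterable[CaseResult], max_rate: float) -> bool:
    """Return True if the build may ship, False if the rate is a blocking defect."""
    return hallucination_rate(results) <= max_rate


if __name__ == "__main__":
    suite = [
        CaseResult("refund-policy", hallucinated=False),
        CaseResult("case-citation", hallucinated=True),
        CaseResult("package-name", hallucinated=False),
        CaseResult("dosage-lookup", hallucinated=False),
    ]
    # The 0.10 threshold is an illustrative value, not a recommended one.
    print("ship" if release_gate(suite, max_rate=0.10) else "blocked")
```

In a CI pipeline the same check would typically exit non-zero on failure so the build stops; the threshold itself should come from the policy-level value agreed before build.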
Construct synthetic evaluation datasets for knowledge-boundary scenarios. Use to validate model refusal behaviour.
Calibrate the groundedness threshold against the hallucination test suite pre-release; sign off the threshold in the validation pack.
source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)
Use production feedback (user corrections, fact-check failures) to drive periodic RLHF cycles. Update model when error rates trend upward.
Require user-facing interfaces to disclose Gen AI limitations and hallucination risk before go-live.
Score every RAG answer for groundedness before release; block, fall back, or escalate responses below the faithfulness threshold.
source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)
Sample multiple generations for high-stakes queries and abstain, fall back, or escalate when semantic entropy exceeds the calibrated threshold.
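The sample-and-abstain control above can be illustrated with a simplified, discrete form of semantic entropy: group sampled answers by meaning, then take the entropy of the cluster frequencies. This is a sketch under stated assumptions. The function names are hypothetical, the `equivalent` callback stands in for a real meaning-equivalence check (the cited paper uses bidirectional entailment; the demo below substitutes a naive normalised string match), and the threshold value is illustrative rather than calibrated.

```python
import math
from typing import Callable, List

Equivalence = Callable[[str, str], bool]


def cluster_by_meaning(answers: List[str], equivalent: Equivalence) -> List[List[str]]:
    """Greedily group answers; each answer joins the first cluster it matches."""
    clusters: List[List[str]] = []
    for answer in answers:
        for cluster in clusters:
            if equivalent(answer, cluster[0]):
                cluster.append(answer)
                break
        else:
            clusters.append([answer])
    return clusters


def semantic_entropy(answers: List[str], equivalent: Equivalence) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    if not answers:
        raise ValueError("at least one sampled answer is required")
    clusters = cluster_by_meaning(answers, equivalent)
    probs = [len(c) / len(answers) for c in clusters]
    return sum(-p * math.log(p) for p in probs)


def answer_or_abstain(answers: List[str], equivalent: Equivalence, threshold: float) -> str:
    """Return the majority-cluster answer, or 'ABSTAIN' when entropy is too high."""
    if semantic_entropy(answers, equivalent) > threshold:
        return "ABSTAIN"
    largest = max(cluster_by_meaning(answers, equivalent), key=len)
    return largest[0]


def naive_equivalent(a: str, b: str) -> bool:
    """Stand-in equivalence check (assumption): case- and punctuation-insensitive match."""
    def norm(s: str) -> str:
        return s.strip().strip(".").lower()
    return norm(a) == norm(b)


if __name__ == "__main__":
    consistent = ["Paris", "paris.", "Paris"]
    scattered = ["Paris", "Lyon", "Marseille"]
    print(answer_or_abstain(consistent, naive_equivalent, threshold=0.5))
    print(answer_or_abstain(scattered, naive_equivalent, threshold=0.5))
```

When the samples agree in meaning, the entropy is zero and the answer is released; when every sample differs, the entropy reaches its maximum for that sample count and the query is routed to abstention, fallback, or escalation. In practice the string match would be replaced by an entailment model, and the threshold would be the one calibrated on the knowledge-boundary dataset.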
source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)
Real-world cases
4 actual published events that illustrate this risk; click through for the writeup and sources.
A tribunal held Air Canada liable after its website chatbot invented a bereavement-fare refund policy; the airline had to honour it.
Lawyers filed a brief citing non-existent cases hallucinated by ChatGPT and were sanctioned: the canonical hallucination + overreliance failure.
Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously.
A USENIX Security 2025 study found code-generating LLMs routinely recommend non-existent packages (roughly 5.2% of suggestions from commercial models and 21.7% from open-source models), letting attackers pre-register the predictable fake names, a tactic dubbed 'slopsquatting'.