๐Ÿ”AI RiskAtlas
โ† Risk Taxonomy
#24

Hallucination / Fabrication / Confabulation

Risk taxonomy

Definition

Models produce outputs that are not grounded on any source content or convincingly contradict the source content due to a lack of understanding of real-world views. This can misinform or mislead users and reduce public faith in the reliability of AI systems.

Interactive deep-dive

This risk has an interactive treatment with technical detail, attack surface, detection signals, and scenarios.

Controls & guardrails that address this

15

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.

Control category
Preventive ยท 10
RAG

Specify a RAG architecture at design stage for factual domains. Define grounding requirements and acceptable hallucination thresholds.

Lifecycle stages1 โ€“ Use Case Context & Design3 โ€“ Onboarding, Build & Review
Small model selection

Evaluate foundation model candidates on hallucination benchmarks at design stage. Select models with lowest documented rates.

Lifecycle stage1 โ€“ Use Case Context & Design
System prompt design

Design system prompts to instruct the model to acknowledge uncertainty, cite sources, and refuse when knowledge is insufficient.

Lifecycle stage3 โ€“ Onboarding, Build & Review
Fine-tuning

Fine-tune on a curated, domain-specific dataset to improve factual accuracy. Validate hallucination rates pre/post fine-tuning.

Lifecycle stage3 โ€“ Onboarding, Build & Review
Programmable conversation controls

Configure conversation controls at deployment to restrict the model to approved topic domains and escalate off-topic queries.

Lifecycle stage4 โ€“ Deployment
Hallucination rate thresholds and grounding policy

Establish acceptable hallucination rate thresholds and grounding requirements as policy before build. Assign a named risk owner.

Lifecycle stage1 โ€“ Use Case Context & Design
Human-in-the-loop validation

Configure tiered HITL review for high-stakes factual outputs with defined trigger criteria and reviewer SLAs.

Lifecycle stages3 โ€“ Onboarding, Build & Review5 โ€“ Usage, Monitoring & Change
Uncertainty-quantified abstention via self-consistency / semantic entropy

Calibrate the initial entropy threshold on a knowledge-boundary dataset; approve sampling design and thresholds per risk tier.

source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)
Lifecycle stages3 โ€“ Onboarding, Build & Review5 โ€“ Usage, Monitoring & Change
Tool-grounded facts for agents (no free-text fabrication of structured data)

Map each fact class to a designated tool, embed the no-ungrounded-assertion prompt, and gate build review on grounding tests passing.

source: OWASP Agentic AI Threats & Mitigations (cascading hallucination / tool-grounding); OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10
Lifecycle stages3 โ€“ Onboarding, Build & Review4 โ€“ Deployment
Citation/attribution verification against retrieved sources

Resolve every emitted citation against the approved corpus and verify span-level entailment before display. Strip or withhold claims with fabricated or non-entailing references.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10 Information Input Validation
Lifecycle stage4 โ€“ Deployment
Detective ยท 3
Robustness testing

Define and execute a domain-specific hallucination test suite before deployment. Treat hallucination rate above threshold as a blocking defect.

Lifecycle stages3 โ€“ Onboarding, Build & Review5 โ€“ Usage, Monitoring & Change
Synthetic evaluation datasets

Construct synthetic evaluation datasets for knowledge-boundary scenarios. Use to validate model refusal behaviour.

Lifecycle stage3 โ€“ Onboarding, Build & Review
Runtime faithfulness/groundedness scoring with abstain gate

Calibrate the groundedness threshold against the hallucination test suite pre-release; sign off the threshold in the validation pack.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)
Lifecycle stage3 โ€“ Onboarding, Build & Review
Corrective ยท 4
Reinforcement learning

Use production feedback (user corrections, fact-check failures) to drive periodic RLHF cycles. Update model when error rates trend upward.

Lifecycle stage5 โ€“ Usage, Monitoring & Change
User-facing disclosure of hallucination risk

Require user-facing interfaces to disclose Gen AI limitations and hallucination risk before go-live.

Lifecycle stage4 โ€“ Deployment
Runtime faithfulness/groundedness scoring with abstain gate

Score every RAG answer for groundedness before release; block, fall back, or escalate responses below the faithfulness threshold.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)
Lifecycle stage4 โ€“ Deployment
Uncertainty-quantified abstention via self-consistency / semantic entropy

Sample multiple generations for high-stakes queries and abstain, fall back, or escalate when semantic entropy exceeds the calibrated threshold.

source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)
Lifecycle stage4 โ€“ Deployment
Open these in the Control Library โ†’

Other risks in Robustness & Stability

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning โ€” not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading โ†’ยทBuilt by Shi Yuan โ†—