πŸ”AI RiskAtlas
← Building blocks
πŸ”

Retrieval / RAG

Capability Β· RAG

The system looks things up in a library of documents and feeds the relevant pages to the model before it answers.

Likely associated risks

Risks that attach to this capability’s components. Sorted with the most characteristic first.

Knowledge / Training Data Poisoninghigh

Someone slips bad information into the documents the AI learns from or looks things up in β€” so it confidently repeats falsehoods or follows planted instructions.

Indirect Prompt Injectioncritical

The attacker doesn't talk to the AI directly β€” they hide instructions inside something the AI will later read: a web page, a document, an email, a tool's output. When the AI reads it to help you, it quietly obeys the hidden commands.

Sensitive Data Leakagecritical

Private information escapes β€” the AI reveals secrets in its answer, or an attacker tricks it into emailing or posting your data somewhere they control.

Hallucinationhigh

The AI states something false with total confidence β€” invents a fact, a citation, a policy, or a refund rule that doesn't exist. It isn't lying; it's predicting plausible words, and plausible isn't the same as true.

Supply-Chain Compromisehigh

The AI is built from parts made by others β€” models, libraries, tool packs, datasets. If any of those is tampered with before you get it, your system inherits the problem.

Model Backdoors / Sleeper Agentshigh

A model can be secretly trained to behave normally β€” until it sees a hidden trigger, then it switches to malicious behaviour. It passes all the usual tests because the trigger is a secret.

Training-Data Rights & Provenancemedium

Models are trained on huge piles of images, audio, and text β€” often scraped without clear permission. That raises copyright and consent problems, and the model can sometimes memorize and spit back its training examples (a watermark, a real photo, private text).

Controls & guardrails that address this

876 proposed

Guardrails across this building block's risks, grouped by control function β€” each with its AI lifecycle stage(s) and every risk it addresses. Filter by control category below.

Control category
Preventive Β· 57
Role-based access controls

Design strict RBAC on training data repositories at design stage. Define approved contributor list and approval workflow.

Lifecycle stages1 – Use Case Context & Design2 – Data Acquisition & Processing4 – Deployment
Input filtering

Apply anomaly detection on the training data ingestion pipeline to identify poisoned or tampered batches.

Lifecycle stage2 – Data Acquisition & Processing
RAG / knowledge-base ingestion allow-listing with continuous index integrity re-validation

Define and approve the source allow-list and write-time scanning during build. Prove non-allow-listed and injection-bearing writes are rejected before go-live.

source: OWASP Top 10 for LLM Apps LLM04:2025 Data and Model Poisoning, LLM08:2025 Vector and Embedding Weaknesses; NIST SP 800-53 AC-3 / SI-7
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Ingestion sanitisation & source allowlistinginteractive

Cleaning documents as they enter the library β€” stripping hidden text and active instructions β€” and only ingesting from trusted places.

Weight provenance, hashing & pre-deploy evalsinteractive

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Delimiting / spotlighting of untrusted contentinteractive

Clearly fencing off outside text β€” 'everything between these marks is just data, not instructions' β€” so the model is less likely to obey it.

Egress allowlisting & DLP on tool argumentsinteractive

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Human-in-the-loop approval on high-risk actionsinteractive

Pausing to ask a person before doing anything big or hard to undo β€” sending money, deleting data, emailing customers.

Approved storage location policy from collection

Establish data transfer and storage policy for AI training data. Enforce approved storage locations from point of collection.

Lifecycle stage2 – Data Acquisition & Processing
DLP controls in data acquisition environment

Implement DLP controls in the data acquisition environment to prevent unauthorised extraction or transfer of training data.

Lifecycle stage2 – Data Acquisition & Processing
Approval-gated data transfers from build environment

Enforce data handling policy in the build environment. Require explicit approval for any data transfers outside the environment.

Lifecycle stage3 – Onboarding, Build & Review
DLP controls confining build-environment training data

Configure DLP controls in the build environment to block training data from leaving approved boundaries.

Lifecycle stage3 – Onboarding, Build & Review
Privacy risk assessment and DPIA determination

Conduct a privacy risk assessment at use case design stage. Determine if a DPIA is required before data acquisition.

Lifecycle stage1 – Use Case Context & Design
Consent, minimisation, and anonymisation during acquisition

Apply S1-defined privacy controls during data acquisition: verify consent, minimise data, anonymise personal data.

Lifecycle stage2 – Data Acquisition & Processing
Validated anonymisation and masking before training

Apply anonymisation and masking controls to personal data before use in model training. Validate de-identification effectiveness.

Lifecycle stage2 – Data Acquisition & Processing
Privacy by Design via differential privacy

Apply Privacy by Design in model architecture using differential privacy or federated learning where technically feasible.

Lifecycle stage3 – Onboarding, Build & Review
Operational consent management and privacy notice

Publish the privacy notice and confirm consent management is operational before go-live.

Lifecycle stage4 – Deployment
Purpose-limitation enforcement on agent tool calls and cross-system data aggregation

Define and sign off a purpose-to-data-source matrix with lawful basis at intake. Make it the approved baseline for runtime enforcement.

source: NIST AI RMF MAP 1.1 / MANAGE 2.2 (context and intended purpose); NIST SP 800-53 AC-4 / AC-3 (purpose-based access enforcement)
Lifecycle stages1 – Use Case Context & Design5 – Usage, Monitoring & Change
Inference-time PII redaction and third-party LLM data-processing controls

Sign zero-retention/no-training terms with each model provider and obtain DPO sign-off on the data flow before enabling any endpoint.

source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 SC-8 / AC-4 (information flow enforcement)
Lifecycle stages3 – Onboarding, Build & Review4 – Deployment
Input/output filtering

Implement output filters to detect and suppress quasi-identifying attribute combinations in model responses.

Query-time access-control filtering of the retrieval/RAG corpus by caller entitlements (document-level ACL enforcement)

Propagate source ACLs and classification labels onto every chunk at ingestion. Reject documents whose entitlements cannot be resolved.

source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 AC-3 / AC-4 Information Flow Enforcement; OWASP Agentic AI Threats & Mitigations (privilege compromise)
Lifecycle stages2 – Data Acquisition & Processing3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Output-side DLP inspection with named-entity and PII redaction on the response path

Scan every model response inline with DLP before delivery; redact or block PII, PAN and MNPI matches. Keep the rule set version-controlled.

source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 SC-7(10) Prevent Exfiltration, SI-4
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Vet allowlisted egress destinations for server-side-fetch (SSRF) primitives; exclude or proxy-inspect any allowlisted service that can fetch arbitrary attacker-controlled URLs✚ proposed

An egress allowlist only contains exfiltration if no allowlisted destination can be coerced into fetching an attacker-controlled URL. Audit each allowlisted domain/endpoint for image-search / link-preview / URL-fetch features (SSRF proxies), and either remove them, pin them to fixed paths, or route them through an inspecting forward proxy. Pair with finishing output sanitization before render so no auto-fetch fires un-inspected.

source: Case study: searchleak-copilot (Varonis Threat Labs, CVE-2026-42824; reported by Microsoft as critical, mitigated server-side ~Jun 2026)
Lifecycle stage4 – Deployment & Serving
Per-user retrieval ACLsinteractive

Making sure the library only returns documents this particular user is allowed to see.

Serving-stack & provisioning attestation, cache isolationinteractive

Making sure the machinery running the model β€” and the template used to stamp out new agents β€” is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.

Confidence scoring

Implement confidence scoring to communicate output certainty alongside each result. Calibrate before deployment.

Lifecycle stages2 – Data Acquisition & Processing3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Accuracy acceptance criteria before validation

Define model accuracy acceptance criteria aligned to business requirements before validation commences.

Lifecycle stage3 – Onboarding, Build & Review
AddressesHallucination
Counterfactual explanations

Implement counterfactual explanation to show users what changes would alter the model's output.

Lifecycle stage3 – Onboarding, Build & Review
AddressesHallucination
In-product disclosure of accuracy and limitations

Communicate model accuracy, known limitations, and uncertainty to users in the production interface at launch.

Lifecycle stage4 – Deployment
AddressesHallucination
Continuous production accuracy monitoring against baseline

Monitor production accuracy continuously against the validated baseline. Trigger model review when accuracy degrades.

Lifecycle stage5 – Usage, Monitoring & Change
AddressesHallucination
RAG

Specify a RAG architecture at design stage for factual domains. Define grounding requirements and acceptable hallucination thresholds.

Lifecycle stages1 – Use Case Context & Design3 – Onboarding, Build & Review
AddressesHallucination
Small model selection

Evaluate foundation model candidates on hallucination benchmarks at design stage. Select models with lowest documented rates.

Lifecycle stage1 – Use Case Context & Design
AddressesHallucination
System prompt design

Design system prompts to instruct the model to acknowledge uncertainty, cite sources, and refuse when knowledge is insufficient.

Lifecycle stage3 – Onboarding, Build & Review
AddressesHallucination
Fine-tuning

Fine-tune on a curated, domain-specific dataset to improve factual accuracy. Validate hallucination rates pre/post fine-tuning.

Lifecycle stage3 – Onboarding, Build & Review
Programmable conversation controls

Configure conversation controls at deployment to restrict the model to approved topic domains and escalate off-topic queries.

Lifecycle stage4 – Deployment
Hallucination rate thresholds and grounding policy

Establish acceptable hallucination rate thresholds and grounding requirements as policy before build. Assign a named risk owner.

Lifecycle stage1 – Use Case Context & Design
AddressesHallucination
Human-in-the-loop validation

Configure tiered HITL review for high-stakes factual outputs with defined trigger criteria and reviewer SLAs.

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Uncertainty-quantified abstention via self-consistency / semantic entropy

Calibrate the initial entropy threshold on a knowledge-boundary dataset; approve sampling design and thresholds per risk tier.

source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
AddressesHallucination
Tool-grounded facts for agents (no free-text fabrication of structured data)

Map each fact class to a designated tool, embed the no-ungrounded-assertion prompt, and gate build review on grounding tests passing.

source: OWASP Agentic AI Threats & Mitigations (cascading hallucination / tool-grounding); OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10
Lifecycle stages3 – Onboarding, Build & Review4 – Deployment
AddressesHallucination
Citation/attribution verification against retrieved sources

Resolve every emitted citation against the approved corpus and verify span-level entailment before display. Strip or withhold claims with fabricated or non-entailing references.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST SP 800-53 SI-10 Information Input Validation
Lifecycle stage4 – Deployment
AddressesHallucination
Uncertainty signalling & abstentioninteractive

Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.

Decoding controls (temperature, constrained output)interactive

Turning down randomness and forcing answers into a strict format so the model improvises less.

Third-party accountability requirements in RFP and contracts

Define third-party AI accountability requirements before vendor engagement. Embed in RFP and contract specifications.

Lifecycle stage1 – Use Case Context & Design
Vendor AI governance due diligence at selection

Conduct AI governance due diligence on third-party providers at selection stage. Reject providers failing minimum maturity.

Lifecycle stage1 – Use Case Context & Design
Required vendor model cards and validation reports

Require third-party providers to submit model cards, validation reports, and security documentation before integration.

Lifecycle stage3 – Onboarding, Build & Review
Ongoing vendor incident notification and reporting obligations

Enforce ongoing third-party accountability obligations including incident notification and periodic performance reporting.

Lifecycle stage5 – Usage, Monitoring & Change
Independent third-party performance and compliance monitoring

Conduct independent performance and compliance monitoring of third-party AI components. Escalate when SLA or compliance obligations are missed.

Lifecycle stage5 – Usage, Monitoring & Change
Continuous third-party assurance with shared-responsibility matrix and obligation flow-down

Allocate every control in a shared-responsibility matrix and flow down regulatory obligations in contract at onboarding. Gate approval on initial assurance artefacts.

source: NIST AI RMF GOVERN 6.1 / GOVERN 6.2 (third-party risk and assurance); NIST SP 800-53 SR-6 Supplier Assessments and Reviews, SA-9 External System Services; EU AI Act GPAI provider obligations
Lifecycle stage3 – Onboarding, Build & Review
Patch-currency, network isolation & attested version inventory for AI inference-serving runtimes✚ proposed

Treat the model-serving runtime (Triton, vLLM, TGI, Ray Serve, etc.) as managed, attested, version-pinned inventory subject to a patch SLA; require the inference endpoint to be authenticated and network-segmented (never unauthenticated on an untrusted segment); and least-privilege the serving host's identity and egress so a runtime RCE cannot trivially exfiltrate models or pivot. Closes the gap that artifact-provenance controls leave open: integrity of the *data plane that runs the model*, not just of the model artifact.

source: Case study: nvidia-triton-rce-chain (Wiz Research, CVE-2025-23319/-23320/-23334)
Lifecycle stage4 – Deployment & Serving
Keep provider credentials out of third-party plugin/tool custody: broker short-lived, per-tool, revocable tokens (OAuth) instead of long-lived pasted API keys, and require explicit user consent before any secret leaves the host✚ proposed

Third-party developer tools (IDE plugins, MCP servers) must not store or transmit long-lived provider API keys. Issue short-lived, scoped, revocable tokens via a broker/OAuth flow, and gate any first-time outbound transmission of secret-shaped data behind an explicit consent prompt β€” so a trojanized tool has no long-lived credential to exfiltrate and any attempt is visible.

source: Case study: jetbrains-marketplace-ai-keystealer-plugins
Lifecycle stage3 – Development & Tooling
Third-party AI-integration credential containment: minimise & bind OAuth grants, prefer short-lived tokens, monitor per-integration data egress, and keep a tested mass-revocation kill-switch✚ proposed

Treat each third-party AI integration as a privileged non-human principal: issue least-scope, IP/device-bound, short-lived grants (avoid 'full' scope and standing long-lived refresh tokens), instrument the integration's data egress for volume/object-breadth/destination anomalies, and maintain a tested one-move revocation path for all of an integration's tokens so a single vendor-side compromise cannot fan out into hundreds of standing footholds.

source: Proposed from case salesloft-drift-oauth-supply-chain (UNC6395). Grounded in GTIG remediation guidance β€” restrict Connected App scopes (no 'full'), enforce IP restrictions, treat all Drift-connected tokens as compromised: https://cloud.google.com/blog/topics/threat-intelligence/data-theft-salesforce-instances-via-salesloft-drift
Lifecycle stage5 – Usage, Monitoring & Change
Broker LLM/cloud secrets out of the gateway process: short-lived scoped tokens + per-provider spend/egress monitoring✚ proposed

Do not store long-lived multi-provider LLM keys (or ambient cloud/K8s credentials) in the gateway/proxy's plaintext process environment. Issue short-lived, scoped tokens from a secret broker at request time, isolate the serving stack from host cloud/cluster credentials, and monitor per-provider spend and egress so a stolen key surfaces as anomalous usage β€” capping the loot a compromised gateway dependency can harvest.

source: Case study: teampcp-litellm-pypi-gateway-compromise
Lifecycle stage4 – Deployment & Serving
MCP/plugin pinning, manifest hashing & re-reviewinteractive

Treating add-on tool packs like software you vet: locking to a reviewed version and re-checking whenever it changes.

Declared data sources and provenance at intake

Declare all planned training and test data sources at use case intake, with provenance status for each.

Lifecycle stage1 – Use Case Context & Design
Post hoc interpretability techniques

Plan the interpretability approach at design stage to ensure source provenance can be traced and disclosed to users.

Lifecycle stage1 – Use Case Context & Design
Documented data provenance during collection

Document actual provenance for each data source during collection: origins, methods, timestamps, custodian identity.

Lifecycle stage2 – Data Acquisition & Processing
Detective Β· 18
Vulnerability assessment

Conduct a data poisoning threat assessment at design stage. Identify likely attack vectors and assign risk ratings.

Lifecycle stages1 – Use Case Context & Design4 – Deployment5 – Usage, Monitoring & Change
Red teaming

Simulate data poisoning attacks (backdoor, label flipping, gradient-based) to assess model resilience before deployment.

Cryptographic data provenance and signed dataset lineage (C2PA/in-toto attestations)

Verify a signed attestation and content hash on every dataset shard at ingestion. Reject unsigned or hash-mismatched data before it reaches the training pipeline.

source: MITRE ATLAS AML.M0007 (Sanitize Training Data), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SI-7 Software, Firmware, and Information Integrity, SR-4 Provenance
Lifecycle stages2 – Data Acquisition & Processing3 – Onboarding, Build & Review
Pre-deployment poisoning regression gate via canary backdoor probes and behavioral diff testing

Gate every model promotion on backdoor-trigger probes and a behavioral diff against the approved baseline. Block release on significant regressions or trigger-pattern anomalies.

source: MITRE ATLAS AML.M0014 (Verify ML Artifacts), AML.M0019 (Red Teaming); NIST AI RMF MANAGE 2.2 and MEASURE 2.7
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Provenance & content signinginteractive

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Full-trace audit logginginteractive

Recording everything β€” questions, documents fetched, actions taken β€” so you can investigate when something goes wrong.

Real-time monitoring of anomalous data transfers

Monitor production for anomalous data transfers in real time. Alert on any transfer outside approved data flow boundaries.

Lifecycle stage5 – Usage, Monitoring & Change
Automated DSAR and right-to-erasure propagation across AI artefacts

Tag personal data with subject identifiers at ingestion and maintain an artefact inventory map of every store it reaches. Keep lineage current so erasure can propagate.

source: NIST AI RMF MANAGE 4.1 (post-deployment response); NIST SP 800-53 SI-12 Information Management and Retention, PT-2/PT-3 (personal data processing)
Lifecycle stages2 – Data Acquisition & Processing5 – Usage, Monitoring & Change
Canary-token and membership-inference red-team probes against training/fine-tuning data memorisation

Seed registered canary records into the fine-tuning corpus during data preparation. Control the seed manifest so canaries stay traceable and tamper-proof.

source: MITRE ATLAS AML.T0024 (Exfiltration via ML Inference API), AML.T0024.000 (Infer Training Data Membership); NIST AI RMF MEASURE 2.7
Lifecycle stages2 – Data Acquisition & Processing3 – Onboarding, Build & Review
Input guardrail / injection classifierinteractive

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Robustness testing

Define and execute a domain-specific hallucination test suite before deployment. Treat hallucination rate above threshold as a blocking defect.

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Synthetic evaluation datasets

Construct synthetic evaluation datasets for knowledge-boundary scenarios. Use to validate model refusal behaviour.

Lifecycle stage3 – Onboarding, Build & Review
Runtime faithfulness/groundedness scoring with abstain gate

Calibrate the groundedness threshold against the hallucination test suite pre-release; sign off the threshold in the validation pack.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)
Lifecycle stage3 – Onboarding, Build & Review
AddressesHallucination
Grounding / citation checksinteractive

Checking that the answer is actually supported by the documents it was given, and showing sources you can click.

Golden-set regression canary to detect undisclosed vendor-side model changes

Build and baseline the golden-set suite against the vendor model before go-live. Sign off thresholds with the model risk owner as a release condition.

source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain (monitoring changed model components); MITRE ATLAS AML.M0015 (Adversarial Input Detection / validation); NIST AI RMF MEASURE 2.6 / MANAGE 4.1
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
AIBOM-driven cryptographic verification of third-party model artifacts

Re-verify hashes and signatures on every vendor model update before promotion. Reconcile deployed artifacts against the AIBOM on a set cadence.

source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain; MITRE ATLAS AML.M0013 (Code Signing), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SR-4 / SR-11 (provenance, component authenticity)
Lifecycle stage5 – Usage, Monitoring & Change
Corrective Β· 18
Penetration testing

Penetration test the training data pipeline to identify injection points and access control weaknesses.

Statistical anomaly and backdoor-trigger detection on ingested data (activation clustering / spectral signatures)

Scan every ingestion batch with spectral-signature and clustering detectors before training. Quarantine flagged clusters for human review against documented thresholds.

source: MITRE ATLAS AML.M0007 (Sanitize Training Data); OWASP Top 10 for LLM Apps LLM04:2025 Data and Model Poisoning; NIST AI RMF MEASURE 2.7
Lifecycle stages2 – Data Acquisition & Processing5 – Usage, Monitoring & Change
Runtime memory-poisoning drift detection and per-session memory quarantine/rollback✚ proposed

Continuously correlate live agent-memory writes against output behaviour to flag drift, then quarantine and roll back the suspected-poisoned memory record across all affected sessions.

source: Interactive-control reconciliation: ctrl-memory-quarantine (partial coverage)
Lifecycle stage5 – Usage, Monitoring & Change
Production privacy incident monitoring and regulator notification

Monitor for privacy incidents in production including personal data appearing in outputs. Notify regulators within required timeframes.

Lifecycle stage5 – Usage, Monitoring & Change
Privacy hygiene for agent memory and RAG/vector stores (retention, scoping, erasure of embeddings)

Tag every memory and vector record with subject-id and retention class; partition stores per tenant/user. Prove the erasure and isolation paths in testing before release.

source: OWASP Agentic AI Threats & Mitigations (memory/knowledge-base privacy); NIST SP 800-53 SI-12 Information Management and Retention
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Red teaming

Test de-identification approach against known re-identification attacks (quasi-identifier linkage, singling-out). Remediate if risk is high.

Vulnerability assessment

Conduct periodic data leakage audits including training data memorisation testing. Escalate confirmed leakage incidents to PDPA notification process.

Forensic evidence preservation and incident logging

Implement tamper-evident capture of prompts, outputs, and version state during build. Verify a full incident timeline can be reconstructed before go-live.

source: NIST SP 800-86 Guide to Integrating Forensic Techniques into Incident Response; ISO/IEC 27037 evidence handling; NIST SP 800-61r2 (Detection & Analysis – evidence handling)
Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Egress allow-listing and tool-call sandboxing to block exfiltration of injected/sensitive data by agents

Run agent tool calls in a network-restricted sandbox behind a deny-by-default egress allow-list. Require security approval for any destination added.

source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; OWASP Agentic AI Threats & Mitigations (tool-misuse / exfiltration); NIST SP 800-53 SC-7 Boundary Protection / AC-4
Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change
Reinforcement learning

Use production feedback (user corrections, fact-check failures) to drive periodic RLHF cycles. Update model when error rates trend upward.

Lifecycle stage5 – Usage, Monitoring & Change
User-facing disclosure of hallucination risk

Require user-facing interfaces to disclose Gen AI limitations and hallucination risk before go-live.

Lifecycle stage4 – Deployment
AddressesHallucination
Runtime faithfulness/groundedness scoring with abstain gate

Score every RAG answer for groundedness before release; block, fall back, or escalate responses below the faithfulness threshold.

source: OWASP Top 10 for LLM Apps LLM09:2025 Misinformation; NIST AI RMF MEASURE 2.7 / 2.9 (validity, reliability, robustness)
Lifecycle stage4 – Deployment
AddressesHallucination
Uncertainty-quantified abstention via self-consistency / semantic entropy

Sample multiple generations for high-stakes queries and abstain, fall back, or escalate when semantic entropy exceeds the calibrated threshold.

source: Farquhar et al. 'Detecting hallucinations using semantic entropy' (Nature 2024); NIST AI RMF MEASURE 2.6 (reliability under uncertainty)
Lifecycle stage4 – Deployment
AddressesHallucination
User AI-literacy & verification workflowsinteractive

Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.

Model-agnostic gateway with version pinning, multi-vendor fallback and exit plan

Design all vendor model access behind a gateway with pinned versions, a second-vendor fallback, and a documented exit plan. Gate architecture sign-off on no single-sourcing.

source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain (maintain supported model versions); NIST AI RMF GOVERN 6.1 (third-party resilience, contingency); established AI-gateway fallback practice
Lifecycle stages1 – Use Case Context & Design5 – Usage, Monitoring & Change
AIBOM-driven cryptographic verification of third-party model artifacts

Verify every third-party model artifact against its AIBOM hashes and signatures before load. Fail the build on any unverified artifact.

source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain; MITRE ATLAS AML.M0013 (Code Signing), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SR-4 / SR-11 (provenance, component authenticity)
Lifecycle stage3 – Onboarding, Build & Review
Continuous third-party assurance with shared-responsibility matrix and obligation flow-down

Review independent vendor assurance on cadence, log gaps, and track remediation. Keep the shared-responsibility matrix current so every control has an owner.

source: NIST AI RMF GOVERN 6.1 / GOVERN 6.2 (third-party risk and assurance); NIST SP 800-53 SR-6 Supplier Assessments and Reviews, SA-9 External System Services; EU AI Act GPAI provider obligations
Lifecycle stage5 – Usage, Monitoring & Change
Open the Control Library β†’

See it go wrong β€” related scenarios

πŸŒ€The Refund That Never Existed

A support chatbot invents a policy β€” and the company is held to it

☠️Poisoning the Well

An attacker edits the wiki; the assistant cites the lie back to everyone

πŸ“§The Email That Gave Orders

A support email hides instructions β€” and the assistant obeys them

πŸ•΅οΈLies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

πŸ‘‚Overheard Through the Cache

A speed optimisation becomes a cross-tenant listening device

🧲Poison the Vector, Not the Words

An attacker crafts a gibberish passage whose embedding sits near thousands of questions β€” so it's retrieved everywhere

🏭Poisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

πŸͺŸStealing the Model

Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file

πŸͺ€The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

πŸšͺThe Classifier That Waves It Through

The safety guard is itself a trained model β€” and someone poisoned its lessons

πŸ“ΌThe Compromised Flight Recorder

The forensic record is itself the attack surface β€” an agent's log is poisoned, then quietly rewritten

πŸ‘οΈThe Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

πŸ”“The Model That Forgot to Say No

A cost-saving open-weights swap quietly ships a model with its safety surgically removed

πŸ–ΌοΈThe Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

πŸ’€The Sleeper

A capable third-party model that behaves perfectly β€” until it sees the trigger

🎫The Stolen Session

An attacker captures the agent's bearer token β€” and inherits its authority

πŸ”ŒThe Tool With a Hidden Agenda

A trusted MCP email tool quietly BCCs every message to an attacker

πŸ₯ΈThe Uninvited Agent

A forged peer registers on the agent directory β€” and the planner enlists it

πŸ›‘οΈThe Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

πŸͺͺThe Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent β€” and the planner acts on its behalf

πŸ–ΌοΈZero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning β€” not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading β†’Β·Built by Shi Yuan β†—