AI RiskAtlas
#13

Unclear provenance for training/test data

Risk taxonomy

Definition

The data used to train and test the model cannot be convincingly and comprehensively traced to its origins. This presents challenges for audit, disclosure, and compliance, and creates the risk that the financial institution (FI) does not have the right to use the data.

Controls & guardrails that address this

4

Controls are grouped by control function. Each lists the AI lifecycle stage(s) at which to apply it and any other risks it addresses.

Control category
Preventive · 4
Declared data sources and provenance at intake

Declare all planned training and test data sources at use case intake, with provenance status for each.

Lifecycle stage: 1 – Use Case Context & Design
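
As a rough illustration of this control, the intake declaration could be kept as a small structured register. This is a minimal sketch, not a prescribed format: the `ProvenanceStatus` values, the `DeclaredSource` fields, and the example source names are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ProvenanceStatus(Enum):
    # Hypothetical status values; a real taxonomy may define different ones.
    VERIFIED = "verified"   # origin and usage rights confirmed
    PENDING = "pending"     # declared, confirmation still in progress
    UNKNOWN = "unknown"     # origin or usage rights not established


@dataclass(frozen=True)
class DeclaredSource:
    name: str
    purpose: str            # "training" or "test"
    status: ProvenanceStatus


def unresolved_sources(declared):
    """Return the names of declared sources whose provenance is not yet verified."""
    return [s.name for s in declared if s.status is not ProvenanceStatus.VERIFIED]


# Illustrative intake declaration (source names are made up).
intake = [
    DeclaredSource("internal_loan_applications", "training", ProvenanceStatus.VERIFIED),
    DeclaredSource("vendor_credit_bureau_feed", "training", ProvenanceStatus.PENDING),
    DeclaredSource("scraped_forum_posts", "test", ProvenanceStatus.UNKNOWN),
]

print(unresolved_sources(intake))
```

Recording a status for every source, including "unknown", keeps gaps visible at intake instead of leaving them to surface during audit.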
Post hoc interpretability techniques

Plan the interpretability approach at the design stage so that source provenance can be traced and disclosed to users.

Lifecycle stage: 1 – Use Case Context & Design
Documented data provenance during collection

Document actual provenance for each data source during collection: origins, methods, timestamps, custodian identity.

Lifecycle stage: 2 – Data Acquisition & Processing
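
One way to capture the four items named in this control (origin, method, timestamp, custodian) is a small immutable record written at collection time. The sketch below is illustrative only: the `ProvenanceRecord` type, the `record_provenance` helper, and the example values are hypothetical, and the SHA-256 digest is an added assumption (it ties the record to the exact bytes collected) and is not something the control itself requires.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceRecord:
    source_name: str
    origin: str
    collection_method: str
    collected_at: str   # ISO 8601 timestamp in UTC
    custodian: str
    sha256: str         # digest of the collected bytes (assumed extra field)


def record_provenance(source_name, origin, collection_method, custodian,
                      payload, collected_at=None):
    """Build a provenance record for one collected payload (bytes)."""
    when = collected_at or datetime.now(timezone.utc)
    return ProvenanceRecord(
        source_name=source_name,
        origin=origin,
        collection_method=collection_method,
        collected_at=when.isoformat(),
        custodian=custodian,
        sha256=hashlib.sha256(payload).hexdigest(),
    )


# Illustrative values only; a fixed timestamp keeps the example reproducible.
record = record_provenance(
    source_name="vendor_credit_bureau_feed",
    origin="hypothetical vendor file drop",
    collection_method="batch file transfer",
    custodian="data-engineering team",
    payload=b"id,score\n1,700\n",
    collected_at=datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(asdict(record), indent=2))
```

Serialising the record to JSON makes it straightforward to store alongside the dataset and to hand to auditors later.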
Confidence scoring

Apply data quality scoring to all acquired data to document provenance reliability. Flag low-confidence sources for review.

Lifecycle stage: 2 – Data Acquisition & Processing
Also addresses: Hallucination
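
The scoring and flagging step could look something like the sketch below. The evidence fields, the integer weights, and the review threshold of 70 are all assumptions chosen for illustration; the source does not specify a scoring scheme.

```python
from dataclasses import dataclass

# Hypothetical weights (points out of 100) for each piece of provenance evidence.
WEIGHTS = {"origin_known": 40, "rights_confirmed": 40, "custodian_named": 20}
REVIEW_THRESHOLD = 70  # assumed cut-off; sources scoring below it are flagged


@dataclass(frozen=True)
class SourceEvidence:
    name: str
    origin_known: bool
    rights_confirmed: bool
    custodian_named: bool


def confidence(evidence):
    """Sum the weights of the evidence items that are present (0 to 100)."""
    return sum(w for field, w in WEIGHTS.items() if getattr(evidence, field))


def flag_for_review(sources, threshold=REVIEW_THRESHOLD):
    """Return the names of sources whose confidence score is below the threshold."""
    return [s.name for s in sources if confidence(s) < threshold]


# Illustrative sources (names are made up).
sources = [
    SourceEvidence("internal_loan_applications", True, True, True),
    SourceEvidence("vendor_credit_bureau_feed", True, False, True),
    SourceEvidence("scraped_forum_posts", False, False, False),
]

for s in sources:
    print(s.name, confidence(s))
print("Flagged:", flag_for_review(sources))
```

Integer weights avoid floating-point rounding at the threshold boundary. In practice the weights and cut-off would need to be set and justified by whoever owns the control.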

Other risks in Transparency

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning, not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan