Definition
The data used to train and test the model cannot be convincingly and comprehensively traced to its origins. This creates challenges for audit, disclosure, and compliance, and raises the risk that the financial institution (FI) does not have the right to use the data.
Controls & guardrails that address this
Four controls, grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Declare all planned training and test data sources at use case intake, with provenance status for each.
Plan the interpretability approach at the design stage so that source provenance can be traced and disclosed to users.
Document actual provenance for each data source during collection: origins, methods, timestamps, custodian identity.
Apply data quality scoring to all acquired data to document provenance reliability. Flag low-confidence sources for review.
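
The documentation and scoring controls above can be sketched in code. The following is a minimal illustration, not a prescribed implementation. The `ProvenanceRecord` fields mirror the items listed above (origins, methods, timestamps, custodian identity). The scoring rule, the `usage_rights_confirmed` field, the 0.75 review threshold, and all source names are hypothetical assumptions chosen only for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Provenance documented for one data source (hypothetical schema)."""
    source_name: str
    origin: Optional[str] = None             # where the data came from
    collection_method: Optional[str] = None  # how it was obtained
    collected_at: Optional[datetime] = None  # when it was obtained
    custodian: Optional[str] = None          # who is accountable for it
    usage_rights_confirmed: bool = False     # assumption: tracked separately


# Fields that must be documented for provenance to count as complete.
PROVENANCE_FIELDS = ("origin", "collection_method", "collected_at", "custodian")


def provenance_score(record: ProvenanceRecord) -> float:
    """Return the fraction of provenance fields documented, from 0.0 to 1.0.

    Illustrative rule: if usage rights are unconfirmed, the score is
    capped at 0.5 so the source always falls below the review threshold.
    """
    documented = sum(getattr(record, f) is not None for f in PROVENANCE_FIELDS)
    score = documented / len(PROVENANCE_FIELDS)
    if not record.usage_rights_confirmed:
        score = min(score, 0.5)
    return score


def flag_for_review(records: List[ProvenanceRecord],
                    threshold: float = 0.75) -> List[str]:
    """Return the names of sources whose score is below the threshold."""
    return [r.source_name for r in records if provenance_score(r) < threshold]


# Example with made-up sources.
when = datetime(2024, 1, 15, tzinfo=timezone.utc)
records = [
    ProvenanceRecord("core_banking_export", "internal ledger", "batch export",
                     when, "data office", usage_rights_confirmed=True),
    ProvenanceRecord("vendor_feed", "third-party vendor", None,
                     when, None, usage_rights_confirmed=True),
    ProvenanceRecord("scraped_forum", "public website", "web scrape",
                     when, "research team", usage_rights_confirmed=False),
]
flagged = flag_for_review(records)
print(flagged)  # ['vendor_feed', 'scraped_forum']
```

In this sketch, a fully documented source with unconfirmed usage rights is still flagged, which reflects the concern in the definition that the FI may not have the right to use the data. A real scheme would likely weight fields differently and store records in a governed catalogue rather than in memory.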