Definition
The model's training data is not representative of the geographical and cultural context in which the model will be used, or is not aligned with the system's intended goal, leading to incorrect outputs.
Controls & guardrails that address this
5 controls, grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Define training data fitness requirements at design stage including domain coverage, recency, and format specifications.
Plan the domain data strategy at design stage: identify sources that best cover the target operational distribution.
Screen acquired training data through automated fitness checks (domain relevance, recency, format conformity). Reject non-conforming data.
Construct synthetic evaluation datasets targeting operational edge cases identified in the S2 gap analysis. Use them as a regression baseline.
Monitor production input distributions for drift from training data distribution. Trigger re-training when covariate shift is confirmed.
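As an illustration of the screening and monitoring controls above, the following is a minimal Python sketch, not a prescribed implementation. Everything named in it is an assumption made for the example: the `FitnessSpec` fields, the record keys `domain` and `collected_on`, and the choice of the Population Stability Index (PSI) as the drift statistic. A PSI above roughly 0.25 is often treated as a sign of significant shift, but that threshold is a rule of thumb and should be calibrated per feature before it triggers re-training.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FitnessSpec:
    """Hypothetical fitness requirements fixed at the design stage."""
    allowed_domains: frozenset
    max_age_days: int
    required_fields: frozenset


def screen_record(record: dict, spec: FitnessSpec, today: date) -> list:
    """Return the names of failed checks; an empty list means the record conforms."""
    failures = []
    # Format conformity: every required field must be present.
    if not spec.required_fields.issubset(record):
        failures.append("format")
    # Domain relevance: the record must come from an approved domain.
    if record.get("domain") not in spec.allowed_domains:
        failures.append("domain")
    # Recency: the record must carry a date and be no older than the allowed age.
    collected = record.get("collected_on")
    if collected is None or (today - collected).days > spec.max_age_days:
        failures.append("recency")
    return failures


def psi(reference, production, bins=10, eps=1e-6):
    """Population Stability Index of one numeric feature.

    Bins are equal-width over the reference (training) range; production
    values outside that range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref, prod = shares(reference), shares(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))


# Usage with made-up values: screen one record, then compare two samples.
spec = FitnessSpec(frozenset({"retail"}), 365, frozenset({"domain", "collected_on", "text"}))
record = {"domain": "retail", "collected_on": date(2024, 1, 10), "text": "example"}
print(screen_record(record, spec, today=date(2024, 6, 1)))

training_sample = [i / 100 for i in range(100)]
production_sample = [0.5 + i / 200 for i in range(100)]
print(psi(training_sample, production_sample) > 0.25)
```

Records with a non-empty failure list would be rejected at acquisition. The PSI would be computed per feature on a schedule, with a sustained breach of the chosen threshold treated as the confirmation step before re-training.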