AI RiskAtlas
#26

Training data or inputs not fit for purpose

Risk taxonomy

Definition

Training data used in the model is not representative of the geographical and cultural context where the model will be used, or not aligned to the system's intended goal, leading to incorrect outputs.

Controls & guardrails that address this · 5

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.

Control category
Preventive · 3
Training data fitness requirements at design

Define training data fitness requirements at design stage including domain coverage, recency, and format specifications.

Lifecycle stage: 1 – Use Case Context & Design
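One way to make such requirements concrete is to record them as a small, versionable specification. The sketch below is illustrative only: the class name `FitnessRequirements`, its fields, and the example domains and thresholds are all assumptions, not part of the taxonomy.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class FitnessRequirements:
    """Hypothetical design-stage statement of what 'fit for purpose' means."""
    required_domains: frozenset  # domain coverage the dataset must include
    max_age_days: int            # recency: oldest acceptable record age
    required_fields: frozenset   # format: fields every record must carry

    def oldest_allowed(self, today: date) -> date:
        """Earliest collection date that still satisfies the recency rule."""
        return today - timedelta(days=self.max_age_days)

    def coverage_gaps(self, observed_domains) -> set:
        """Required domains that are absent from the observed data."""
        return set(self.required_domains) - set(observed_domains)


# Example values are assumptions chosen for illustration.
REQS = FitnessRequirements(
    required_domains=frozenset({"retail_banking", "insurance"}),
    max_age_days=365,
    required_fields=frozenset({"text", "domain", "collected_on"}),
)

cutoff = REQS.oldest_allowed(date(2024, 1, 1))
gaps = REQS.coverage_gaps(["retail_banking"])
```

Keeping the specification in code lets later lifecycle stages check data against the same definition that was agreed at design time.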
AI onboarding using domain data

Plan the domain data strategy at design stage: identify sources that best cover the target operational distribution.

Lifecycle stages: 1 – Use Case Context & Design; 2 – Data Acquisition & Processing
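As a rough illustration of comparing candidate sources against the target operational distribution, the sketch below ranks sources by total variation distance over a single categorical attribute (here a locale tag). The attribute, the source names, and the choice of distance are assumptions made for the example; a real strategy would likely weigh several attributes.

```python
from collections import Counter


def distribution(labels):
    """Turn a list of category labels into a {label: fraction} mapping."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two categorical distributions (0 to 1)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def rank_sources(target, sources):
    """Sort source names from closest to farthest from the target distribution."""
    return sorted(
        sources,
        key=lambda name: total_variation(target, distribution(sources[name])),
    )


# Hypothetical target mix and candidate sources, for illustration only.
target = {"en-SG": 0.5, "zh-SG": 0.3, "ms-SG": 0.2}
sources = {
    "vendor_a": ["en-US"] * 8 + ["en-SG"] * 2,
    "internal_logs": ["en-SG"] * 5 + ["zh-SG"] * 3 + ["ms-SG"] * 2,
}
ranking = rank_sources(target, sources)
```

A source with a large distance from the target is a candidate for supplementation or exclusion before acquisition begins.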
Input filtering

Screen acquired training data through automated fitness checks (domain relevance, recency, format conformity). Reject non-conforming data.

Lifecycle stage: 2 – Data Acquisition & Processing
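A minimal sketch of such a screening step follows, assuming each record is a dictionary with hypothetical `domain` and `collected_on` fields. The rule names and thresholds are illustrative assumptions; the point is that every rejection carries a reason, so rejected data can be audited.

```python
from datetime import date


def fitness_failures(record, *, allowed_domains, required_fields, max_age_days, today):
    """Return the reasons a record fails the checks; an empty list means it conforms."""
    # The domain and recency checks below need these two fields to exist.
    required = set(required_fields) | {"domain", "collected_on"}
    missing = required - set(record)
    if missing:
        return [f"missing fields: {sorted(missing)}"]  # format conformity
    failures = []
    if record["domain"] not in allowed_domains:        # domain relevance
        failures.append(f"off-domain: {record['domain']}")
    if (today - record["collected_on"]).days > max_age_days:  # recency
        failures.append("stale")
    return failures


def filter_records(records, **rules):
    """Split records into accepted ones and (record, reasons) rejections."""
    accepted, rejected = [], []
    for record in records:
        failures = fitness_failures(record, **rules)
        if failures:
            rejected.append((record, failures))
        else:
            accepted.append(record)
    return accepted, rejected


# Hypothetical records and rules, for illustration only.
records = [
    {"text": "a", "domain": "retail_banking", "collected_on": date(2024, 9, 1)},
    {"text": "b", "domain": "gaming", "collected_on": date(2024, 10, 1)},
    {"text": "c", "domain": "retail_banking", "collected_on": date(2021, 1, 1)},
    {"text": "d"},
]
accepted, rejected = filter_records(
    records,
    allowed_domains={"retail_banking", "insurance"},
    required_fields={"text"},
    max_age_days=365,
    today=date(2025, 1, 1),
)
```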
Detective · 2
Synthetic evaluation datasets

Construct synthetic evaluation datasets that target the operational edge cases identified in the stage 2 (S2) gap analysis. Use them as a regression baseline.

Lifecycle stage: 3 – Onboarding, Build & Review
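The regression-baseline idea can be sketched as follows: record which synthetic edge cases a reference model passes, then flag any case that a later model fails. The cases, the two toy "models", and the function names are all hypothetical stand-ins used to show the bookkeeping, not a real evaluation harness.

```python
# Hypothetical synthetic edge cases: date strings whose day/month order matters.
CASES = [
    {"id": "edge-001", "input": "31/12/2024", "expected": "day-first"},
    {"id": "edge-002", "input": "12/31/2024", "expected": "month-first"},
    {"id": "edge-003", "input": "05/06/2024", "expected": "ambiguous"},
]


def model_v1(text):
    """Toy reference model: admits when the order cannot be determined."""
    first, second, _ = (int(part) for part in text.split("/"))
    if first > 12:
        return "day-first"
    if second > 12:
        return "month-first"
    return "ambiguous"


def model_v2(text):
    """Toy candidate model: silently assumes month-first when unsure."""
    first, _, _ = (int(part) for part in text.split("/"))
    return "day-first" if first > 12 else "month-first"


def evaluate(model, cases):
    """Return {case_id: passed} for every synthetic case."""
    return {c["id"]: model(c["input"]) == c["expected"] for c in cases}


def regressions(baseline, current):
    """Case ids that passed in the baseline but fail (or are missing) now."""
    return sorted(
        cid for cid, passed in baseline.items()
        if passed and not current.get(cid, False)
    )


baseline = evaluate(model_v1, CASES)
regressed = regressions(baseline, evaluate(model_v2, CASES))
```

Storing the baseline result alongside the dataset version makes it possible to tell a model regression apart from a change in the evaluation set itself.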
Robustness testing

Monitor production input distributions for drift from the training data distribution. Trigger re-training when covariate shift is confirmed.

Lifecycle stage: 5 – Usage, Monitoring & Change
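One common way to quantify drift on a single numeric feature is the Population Stability Index (PSI), sketched below with the standard library only. This is a swapped-in technique chosen for illustration; the source does not name a drift metric. The 0.2 threshold is a widely used rule of thumb rather than a fixed standard, and treating shift as "confirmed" after three consecutive windows is an assumption.

```python
import math
from bisect import bisect_right


def psi(train, prod, n_bins=10, eps=1e-6):
    """Population Stability Index of prod against train for one numeric feature."""
    ordered = sorted(train)
    # Interior bin edges placed at training quantiles.
    edges = [ordered[int(len(ordered) * i / n_bins)] for i in range(1, n_bins)]

    def fractions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    p, q = fractions(train), fractions(prod)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def should_retrain(psi_history, threshold=0.2, consecutive=3):
    """Assumed policy: shift is 'confirmed' after several windows above threshold."""
    recent = psi_history[-consecutive:]
    return len(recent) == consecutive and all(v > threshold for v in recent)


# Deterministic toy data: production values shifted upward by 500.
train = list(range(1000))
same = psi(train, train)
shifted = psi(train, [x + 500 for x in train])
retrain = should_retrain([0.05, 0.30, 0.25, 0.40])
```

Requiring several consecutive windows above the threshold reduces the chance that a single noisy batch triggers an expensive re-training run.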


AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning, not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan