AI RiskAtlas
#28

Insufficient data quality

Risk taxonomy

Definition

Training on low-quality or noisy data can result in poor model performance. Heavy reliance on synthetic data can leave a model under-exposed to real-world complexity, reducing its performance on real data.

Controls & guardrails that address this (4)

Controls are grouped by control function. Each is listed with the AI lifecycle stage(s) at which to apply it and the other risks it addresses.

Preventive · 2
Training data quality standards and thresholds

At the design stage, establish quality standards for AI training data, with explicit thresholds for completeness, accuracy, and timeliness.

Lifecycle stage: 1 – Use Case Context & Design
AI onboarding using domain data

Plan the data curation strategy at the design stage to ensure domain-appropriate quality at the required scale.

Lifecycle stage: 1 – Use Case Context & Design
Detective · 1
Robustness testing

Assess the quality of acquired training data against the standards defined at stage 1 (Use Case Context & Design) before training commences. Reject batches that fail the quality gates.

Lifecycle stage: 2 – Data Acquisition & Processing
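
A pre-training quality gate of this kind could be sketched as below. This is a minimal illustration, not a prescribed implementation: the `QualityStandard` class, the `assess_batch` function, the `collected_at` field, and the caller-supplied `is_accurate` check are all hypothetical names introduced here. The sketch assumes each batch is a list of dict records and that thresholds are fractions between 0 and 1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable


@dataclass(frozen=True)
class QualityStandard:
    """Thresholds agreed at the design stage (hypothetical structure)."""
    min_completeness: float  # share of required fields that must be filled
    min_accuracy: float      # share of records passing the accuracy check
    min_timeliness: float    # share of records newer than max_age
    max_age: timedelta       # oldest acceptable record age


def assess_batch(records: list, required_fields: list,
                 is_accurate: Callable[[dict], bool],
                 standard: QualityStandard, now: datetime):
    """Return (passed, metrics) for one batch of dict records."""
    if not records or not required_fields:
        return False, {}  # an empty batch cannot satisfy any standard

    n = len(records)
    # Completeness: filled required cells over all required cells.
    filled = sum(1 for r in records for f in required_fields
                 if r.get(f) not in (None, ""))
    # Timeliness: records with a timestamp no older than max_age.
    timely = sum(1 for r in records
                 if r.get("collected_at") is not None
                 and now - r["collected_at"] <= standard.max_age)
    metrics = {
        "completeness": filled / (n * len(required_fields)),
        "accuracy": sum(1 for r in records if is_accurate(r)) / n,
        "timeliness": timely / n,
    }
    passed = (metrics["completeness"] >= standard.min_completeness
              and metrics["accuracy"] >= standard.min_accuracy
              and metrics["timeliness"] >= standard.min_timeliness)
    return passed, metrics
```

A batch for which `passed` is false would be rejected before training starts, and the returned metrics can be logged to show which threshold was missed. How accuracy is judged (for example, against an audited reference sample) is left to the caller in this sketch.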
Corrective · 1
Input filtering

Implement automated data quality checks in the ingestion pipeline (schema validation, duplicate detection, completeness scoring). Reject non-conforming batches.

Lifecycle stage: 2 – Data Acquisition & Processing
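
The three ingestion checks named above (schema validation, duplicate detection, completeness scoring) might look roughly like the following sketch. The `SCHEMA` mapping, the function names, and the default thresholds are assumptions made for illustration only. Duplicate detection here is exact-match hashing over chosen fields, which is a simplification; near-duplicate detection would need a different technique.

```python
import hashlib
import json

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"id": int, "text": str, "label": str}


def validate_schema(record, schema):
    """Return a list of schema violations for one record."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif record[field] is not None and not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


def fingerprint(record, fields):
    """Hash the chosen fields so exact duplicates share a fingerprint."""
    payload = json.dumps([record.get(f) for f in fields], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def completeness_score(record, schema):
    """Fraction of schema fields that are present and non-empty."""
    filled = sum(1 for f in schema if record.get(f) not in (None, ""))
    return filled / len(schema)


def check_batch(records, schema, dedup_fields,
                min_completeness=0.95, max_duplicate_rate=0.01):
    """Run all three checks and decide whether the batch conforms."""
    seen = set()
    schema_errors = duplicates = 0
    total_completeness = 0.0
    for record in records:
        if validate_schema(record, schema):
            schema_errors += 1
        fp = fingerprint(record, dedup_fields)
        if fp in seen:
            duplicates += 1
        else:
            seen.add(fp)
        total_completeness += completeness_score(record, schema)

    n = len(records)
    report = {
        "records": n,
        "schema_errors": schema_errors,
        "duplicate_rate": duplicates / n if n else 0.0,
        "mean_completeness": total_completeness / n if n else 0.0,
    }
    # Thresholds are illustrative defaults, not recommended values.
    report["accepted"] = (n > 0
                          and schema_errors == 0
                          and report["duplicate_rate"] <= max_duplicate_rate
                          and report["mean_completeness"] >= min_completeness)
    return report
```

A pipeline step could call `check_batch` on each incoming batch and route batches with `accepted` set to false to quarantine rather than to the training store.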

Other risks in Robustness & Stability

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan