Definition
Low-quality or noisy training data can degrade model performance. Heavy reliance on synthetic data can under-expose the model to real-world complexity, reducing its performance on real data.
Controls & guardrails that address this
Grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Establish data quality standards for AI training data at design stage, with explicit thresholds for completeness, accuracy, and timeliness.
Plan the data curation strategy at design stage to ensure domain-appropriate quality at the required scale.
Assess acquired training data quality against S1-defined standards before training commences. Reject batches failing quality gates.
Implement automated data quality checks in the ingestion pipeline (schema validation, duplicate detection, completeness scoring). Reject non-conforming batches.
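As a rough illustration of the controls above, the following is a minimal sketch of an ingestion-time quality gate in Python, using only the standard library. Every name in it (`QualityStandards`, `check_batch`, the `id`/`text`/`label` fields) and every threshold value is a hypothetical placeholder, not something this catalog prescribes; real thresholds would come from the standards set at design stage. It covers schema validation, duplicate detection, and completeness scoring only. Accuracy and timeliness checks are left out because they depend on domain-specific reference data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityStandards:
    """Hypothetical thresholds; real values come from the design-stage standards."""
    required_fields: tuple = ("id", "text", "label")  # assumed record schema
    min_completeness: float = 0.95     # share of required fields populated
    max_duplicate_ratio: float = 0.01  # share of records repeating an earlier one


def schema_errors(record, standards):
    """Schema validation: list the required fields absent from a record."""
    return [f for f in standards.required_fields if f not in record]


def completeness(records, standards):
    """Completeness score: fraction of required fields holding a non-empty value."""
    total = len(records) * len(standards.required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for r in records
        for f in standards.required_fields
        if r.get(f) not in (None, "")
    )
    return filled / total


def duplicate_ratio(records, key="text"):
    """Duplicate detection: fraction of records whose key value was already seen."""
    if not records:
        return 0.0
    seen, dupes = set(), 0
    for r in records:
        value = r.get(key)
        if value in seen:
            dupes += 1
        else:
            seen.add(value)
    return dupes / len(records)


def check_batch(records, standards=QualityStandards()):
    """Return (accepted, reasons). A batch is rejected if any check fails."""
    reasons = []
    if not records:
        reasons.append("empty batch")
    invalid = sum(1 for r in records if schema_errors(r, standards))
    if invalid:
        reasons.append(f"{invalid} record(s) fail schema validation")
    score = completeness(records, standards)
    if records and score < standards.min_completeness:
        reasons.append(f"completeness {score:.3f} below threshold")
    ratio = duplicate_ratio(records)
    if ratio > standards.max_duplicate_ratio:
        reasons.append(f"duplicate ratio {ratio:.3f} above threshold")
    return (not reasons, reasons)
```

A pipeline would call `check_batch` on each incoming batch and quarantine or discard any batch for which the first return value is `False`, logging the returned reasons. Exact-match duplicate detection on a single field is the simplest option; near-duplicate detection would need a different technique.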