#30

Insufficient model accuracy / soundness

Risk taxonomy

Definition

Model outputs are inaccurate, or fall short of the performance thresholds required for the model to be fit for purpose.

Controls & guardrails that address this

Controls are grouped by control function (preventive, detective, corrective). Each entry lists the AI lifecycle stage(s) at which to apply it and any other risks it addresses.

Preventive · 9

Fine-tuning

Fine-tune on domain-specific, high-quality data to improve model performance on target tasks. Validate accuracy post fine-tuning.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Hallucination; Model Drift & Silent Degradation

Weight regularisation and normalisation

Apply regularisation (L1/L2, dropout, early stopping) to prevent overfitting and improve generalisation.

Lifecycle stage: 3 – Onboarding, Build & Review
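As a rough illustration of two of the techniques named in this control, the sketch below shows an L2 weight penalty and a patience-based early-stopping rule in plain Python. The function names, the `patience` and `min_delta` parameters, and the sample losses are assumptions made for this example; a real training loop would normally rely on the equivalent facilities of its ML framework.

```python
def l2_penalty(weights, lam):
    """L2 regularisation term: lam * sum of squared weights.

    Added to the training loss so that large weights are discouraged.
    """
    return lam * sum(w * w for w in weights)


def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the epoch whose weights should be kept.

    Training stops once the validation loss has failed to improve by more
    than `min_delta` for `patience` consecutive epochs (illustrative rule).
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch


# Hypothetical validation losses: improvement stalls after epoch 2.
kept = early_stopping_epoch([1.0, 0.8, 0.7, 0.75, 0.76, 0.9, 0.5], patience=3)
```

In this example training halts at epoch 5, so the late improvement at epoch 6 is never seen. That is the usual trade-off of early stopping: a larger `patience` tolerates longer plateaus at the cost of more compute.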

Small model selection

Prefer smaller, purpose-built models where accuracy requirements are met, to reduce complexity and maintenance burden.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Hallucination
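One way to read this control is as a selection rule: among the candidate models that meet the accuracy requirement, take the smallest. The sketch below assumes each candidate is described by a dictionary with hypothetical `name`, `params_millions` and `accuracy` fields, and that accuracy was measured on the same evaluation set for all candidates.

```python
def select_smallest_adequate(candidates, min_accuracy):
    """Return the smallest candidate meeting `min_accuracy`, or None."""
    adequate = [c for c in candidates if c["accuracy"] >= min_accuracy]
    if not adequate:
        return None  # no candidate is fit for purpose
    return min(adequate, key=lambda c: c["params_millions"])


# Illustrative candidates only; names and figures are invented.
candidates = [
    {"name": "large-general", "params_millions": 7000, "accuracy": 0.94},
    {"name": "small-domain", "params_millions": 350, "accuracy": 0.92},
    {"name": "tiny-domain", "params_millions": 60, "accuracy": 0.85},
]
choice = select_smallest_adequate(candidates, min_accuracy=0.90)
```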

AI onboarding using domain data

Verify training data covers all material input segments for the target use case. Augment where coverage gaps are found.

Lifecycle stage: 3 – Onboarding, Build & Review
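A coverage check of this kind can be sketched as a count of training examples per input segment, flagging any material segment that falls below a minimum. The segment labels and the `min_count` value below are placeholders; what counts as a material segment, and how many examples are enough, depends on the use case.

```python
from collections import Counter


def coverage_gaps(example_segments, required_segments, min_count=50):
    """Map each under-covered required segment to its example count.

    `example_segments` holds one segment label per training example.
    """
    counts = Counter(example_segments)
    return {
        segment: counts.get(segment, 0)
        for segment in required_segments
        if counts.get(segment, 0) < min_count
    }


# Hypothetical training set: plenty of "retail", few "sme", no "corporate".
labels = ["retail"] * 60 + ["sme"] * 10
gaps = coverage_gaps(labels, ["retail", "sme", "corporate"], min_count=50)
```

Segments returned in `gaps` are the candidates for augmentation.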

Model calibration

Calibrate model outputs to align stated confidence with actual accuracy. Validate calibration on held-out data.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Overreliance / Automation Bias
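One common way to check whether stated confidence matches actual accuracy on held-out data is the expected calibration error (ECE), which bins predictions by confidence and averages the gap between mean confidence and observed accuracy in each bin. The sketch below is a minimal pure-Python version; the choice of equal-width bins and `n_bins=10` is an assumption, not something the control prescribes.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over confidence bins.

    `confidences` are values in [0, 1]; `correct` are matching booleans.
    Returns 0.0 for an empty input.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in last bin
        bins[index].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(mean_conf - accuracy)
    return ece


# Four held-out predictions at 90% confidence, three of them correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False])
```

Here the model claims 90% confidence but is right 75% of the time, giving an ECE of about 0.15. A value near zero would indicate that stated confidence tracks accuracy.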

Quantitative accuracy thresholds calibrated to impact

Define quantitative accuracy acceptance thresholds at the design stage, calibrated to business impact and regulatory requirements.

Lifecycle stage: 1 – Use Case Context & Design
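Thresholds of this kind can be recorded as explicit configuration at design time so that later validation steps can read them rather than hard-code them. The sketch below is one possible shape. The impact tiers, metric names and numeric values are placeholders invented for illustration; actual values would come from the business impact assessment and applicable regulation.

```python
from typing import NamedTuple


class AcceptanceThreshold(NamedTuple):
    min_accuracy: float
    min_recall: float


# Placeholder values only: stricter thresholds for higher-impact use cases.
THRESHOLDS_BY_IMPACT = {
    "low": AcceptanceThreshold(min_accuracy=0.80, min_recall=0.70),
    "medium": AcceptanceThreshold(min_accuracy=0.90, min_recall=0.85),
    "high": AcceptanceThreshold(min_accuracy=0.97, min_recall=0.95),
}


def threshold_for(impact_tier):
    """Look up the acceptance threshold for a use case's impact tier."""
    try:
        return THRESHOLDS_BY_IMPACT[impact_tier]
    except KeyError:
        raise ValueError(f"unknown impact tier: {impact_tier!r}") from None
```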

Input/output filtering

Configure output confidence thresholds at deployment so that low-confidence outputs are either suppressed or escalated to human review.

Lifecycle stage: 4 – Deployment

Also addresses: Bias Amplification & Sycophancy; Overreliance / Automation Bias; Sensitive Data Leakage; KV-Cache & Inference-State Side Channels
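A confidence-based output filter can be sketched as a two-threshold rule: suppress below the lower bound, escalate between the bounds, release above the upper bound. The function name, the action labels and the default thresholds below are assumptions for illustration; they presuppose that the model's confidence scores are reasonably calibrated.

```python
def filter_output(confidence, suppress_below=0.3, escalate_below=0.7):
    """Return 'suppress', 'escalate' or 'release' for one model output."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if suppress_below > escalate_below:
        raise ValueError("suppress_below must not exceed escalate_below")
    if confidence < suppress_below:
        return "suppress"   # too unreliable to show at all
    if confidence < escalate_below:
        return "escalate"   # send to human review before release
    return "release"


actions = [filter_output(c) for c in (0.1, 0.5, 0.9)]
```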

Human-in-the-loop validation

Route high-consequence or low-confidence outputs to human review in production. Track override rates and outcomes.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Hallucination; Overreliance / Automation Bias; Model Drift & Silent Degradation
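The routing rule and the override tracking described in this control can be sketched together. `needs_review` and `ReviewLog` are hypothetical names, and the `min_confidence` default is a placeholder. The override rate here is simply the share of reviewed outputs that the reviewer changed.

```python
def needs_review(confidence, high_consequence, min_confidence=0.7):
    """Route to a human if the decision is high-consequence or low-confidence."""
    return high_consequence or confidence < min_confidence


class ReviewLog:
    """Counts human reviews and how often the reviewer overrode the model."""

    def __init__(self):
        self.reviewed = 0
        self.overridden = 0

    def record(self, model_output, reviewer_output):
        self.reviewed += 1
        if reviewer_output != model_output:
            self.overridden += 1

    @property
    def override_rate(self):
        return self.overridden / self.reviewed if self.reviewed else 0.0


log = ReviewLog()
for model_out, human_out in [("approve", "approve"), ("approve", "decline"),
                             ("decline", "decline"), ("approve", "approve")]:
    log.record(model_out, human_out)
```

A rising override rate is one signal that accuracy in production is slipping. A rate near zero on a large review volume may instead suggest that reviewers are deferring to the model, so the two readings need to be interpreted together with outcome data.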

User disclosure of accuracy and confidence limits

Disclose known accuracy limitations and confidence levels to users at deployment. Update the disclosures whenever the model changes.

Lifecycle stage: 4 – Deployment

Detective · 2

Robustness testing

Define accuracy acceptance criteria before validation. Conduct multi-metric validation against hold-out sets. Block deployment if criteria are not met.

Lifecycle stages: 3 – Onboarding, Build & Review; 4 – Deployment

Also addresses: Hallucination; Overreliance / Automation Bias; Model Drift & Silent Degradation
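The sequence in this control (fix criteria first, compute several metrics on a hold-out set, block deployment on any failure) can be sketched as a simple gate. The example assumes a binary classification task with labels 0 and 1. The function names and the criteria values are illustrative, not prescribed by the control.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def deployment_gate(metrics, criteria):
    """Return (allowed, failures); failures maps metric -> (observed, minimum)."""
    failures = {
        name: (metrics[name], minimum)
        for name, minimum in criteria.items()
        if metrics[name] < minimum
    }
    return (not failures, failures)


# Criteria agreed before validation (placeholder values).
criteria = {"accuracy": 0.70, "recall": 0.80}
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0])
allowed, failures = deployment_gate(metrics, criteria)
```

In this example accuracy alone (0.75) would pass, but recall (one of three positives found) fails, so the gate blocks deployment. This is the reason for validating against more than one metric.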

Synthetic evaluation datasets

Construct synthetic edge-case evaluation datasets to stress-test model boundaries and identify accuracy failure modes.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Hallucination; Overreliance / Automation Bias; Model Drift & Silent Degradation
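For models with numeric inputs, one simple way to build a synthetic edge-case set is boundary-value generation: for each feature, take values just inside, on, and just outside the valid range, then combine them across features. The sketch below assumes a hypothetical `feature_ranges` mapping of feature name to `(low, high, step)`. Text or image models would need a different generation strategy.

```python
from itertools import product


def boundary_values(low, high, step):
    """Values just outside, on, and just inside each end of [low, high]."""
    return [low - step, low, low + step, high - step, high, high + step]


def edge_case_grid(feature_ranges):
    """All combinations of boundary values across the given features."""
    names = list(feature_ranges)
    per_feature = [boundary_values(*feature_ranges[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*per_feature)]


# Hypothetical input schema: (low, high, step) per feature.
cases = edge_case_grid({"age": (18, 100, 1), "amount": (0, 1000, 10)})
```

The grid grows as 6 to the power of the number of features, so for wider schemas a sampled or pairwise subset is usually more practical than the full product.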

Corrective · 1

Reinforcement learning

Establish a periodic revalidation and improvement cycle using RLHF or user feedback. Retrain when accuracy trends below threshold.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Hallucination; Overreliance / Automation Bias; Model Drift & Silent Degradation
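The retraining trigger in this control can be sketched as a rolling check over periodic revalidation results. The sketch assumes accuracy is measured once per revalidation cycle and that a trend below threshold means the mean of the last `window` measurements is under the threshold. Both the rule and the default window are illustrative choices.

```python
def needs_retraining(accuracy_history, threshold, window=3):
    """True if mean accuracy over the last `window` cycles is below threshold.

    Returns False while fewer than `window` measurements are available.
    """
    if len(accuracy_history) < window:
        return False
    recent = accuracy_history[-window:]
    return sum(recent) / window < threshold


# Hypothetical revalidation results, oldest first.
history = [0.92, 0.90, 0.85, 0.84, 0.83]
retrain = needs_retraining(history, threshold=0.86, window=3)
```

Averaging over a window avoids retraining on a single noisy measurement, at the cost of reacting more slowly to a sudden drop.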


Other risks in Robustness & Stability

#24 Hallucination / Fabrication / Confabulation
#25 Overconfidence
#26 Training data or inputs not fit for purpose
#27 Lack of continuous monitoring
#28 Insufficient data quality
#29 Model staleness
#31 Model degradation from unexpected use
#32 Inadequate operational resilience
#33 Unmet architectural requirements
#34 Lack of reproducibility
#44 Disruption to connected systems