Model Drift & Silent Degradation
Rating: medium · Category: Model behaviour
Definition
The AI's behaviour changes quietly over time, because a vendor updates the model or because the world moves on from its training data, and things that used to work start failing.
Detection signals
- Eval scores drop after a vendor model update
- Format/parse failures rising in tool calls
- Behaviour change with no code change on your side
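The three signals above can be checked automatically. The sketch below is a minimal illustration under stated assumptions: every tool call is logged with the model identifier the provider reports for that request and whether the output parsed, and a fixed eval set is scored after any change. `DriftMonitor`, its threshold values and the model names are hypothetical, chosen for this example rather than taken from any vendor SDK.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class DriftMonitor:
    """Hypothetical monitor for the three detection signals (illustrative only)."""

    baseline_eval_score: float            # eval-set score recorded at sign-off
    max_eval_drop: float = 0.05           # assumed tolerance before alerting
    max_parse_failure_rate: float = 0.02  # assumed alert threshold for parse failures
    window_size: int = 200                # number of recent tool calls considered

    def __post_init__(self) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._recent: Deque[bool] = deque(maxlen=self.window_size)
        self._last_model_id: Optional[str] = None

    def record_call(self, model_id: str, parsed_ok: bool) -> List[str]:
        """Log one tool call and return any alerts it triggers."""
        alerts: List[str] = []
        # Signal 3: the served model changed with no code change on our side.
        if self._last_model_id is not None and model_id != self._last_model_id:
            alerts.append(f"model id changed: {self._last_model_id} -> {model_id}")
        self._last_model_id = model_id

        # Signal 2: format/parse failure rate over a rolling window.
        # Only alert once the window is full, to avoid noise from tiny samples.
        self._recent.append(parsed_ok)
        if len(self._recent) == self.window_size:
            rate = self._recent.count(False) / self.window_size
            if rate > self.max_parse_failure_rate:
                alerts.append(f"parse failure rate {rate:.1%} exceeds threshold")
        return alerts

    def check_eval(self, current_score: float) -> List[str]:
        """Signal 1: compare a fresh eval run against the recorded baseline."""
        drop = self.baseline_eval_score - current_score
        if drop > self.max_eval_drop:
            return [f"eval score dropped by {drop:.3f}"]
        return []


# Usage with hypothetical model identifiers and a deliberately small window.
monitor = DriftMonitor(baseline_eval_score=0.91, window_size=5, max_parse_failure_rate=0.2)
for ok in [True, True, False, True, False]:
    alerts = monitor.record_call("vendor-model-a", ok)
print(alerts)  # ['parse failure rate 40.0% exceeds threshold']
print(monitor.record_call("vendor-model-b", True))  # model-id change alert plus the parse alert
print(monitor.check_eval(0.84))  # ['eval score dropped by 0.070']
```

Waiting for a full window trades a slightly later first alert for fewer false alarms; in practice the window size and thresholds would come from the monitoring requirements set at design stage.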
Controls & guardrails that address this
15 controls, grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks each one addresses.
Define minimum monitoring requirements at the design stage, calibrated to the use case's risk tier.
At deployment, configure monitoring hooks in the conversation layer to capture the metrics that the S1 monitoring requirements specify.
When staleness is confirmed, run a controlled fine-tuning cycle on refreshed data, and validate the result before promoting it to production.
At the design stage, define the approved use-case scope and the expected input distribution, and document them as the governance baseline for out-of-distribution (OOD) controls.
Design a scope-enforcement layer in the architecture to isolate the AI system from off-topic or out-of-distribution inputs.
Maintain and update OOD detection rules in production as new unexpected use patterns are identified.
Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.
Construct synthetic evaluation datasets during build to serve as the ongoing monitoring baseline.
During the build stage, set up monitoring infrastructure: performance-metric collection, alerting thresholds and dashboards.
Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.
Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.
Implement a reinforcement learning feedback loop to continuously incorporate production signals and reduce staleness risk.
Implement OOD detection in the input filtering layer. Reject or escalate inputs outside the S1-defined scope.
Conduct adversarial red team exercises simulating out-of-scope inputs and unexpected use patterns before deployment.
Configure human-in-the-loop (HITL) triggers for outputs in input domains that diverge from the training distribution, and log every out-of-scope intervention.
The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.
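Several of the controls above (validating a model before promoting it, checking it hasn't been swapped, re-testing against known-good and known-bad examples whenever anything changes) can be combined into one release gate. A minimal sketch, assuming a hypothetical `call_model(prompt) -> (model_id, text)` wrapper around whatever client is in use and a hand-written eval set; `EvalCase`, `promotion_gate`, the `support-bot-v3` identifier, the stub model and the tolerance value are all illustrative, not a standard API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # True if the output is acceptable


def promotion_gate(
    call_model: Callable[[str], Tuple[str, str]],
    cases: List[EvalCase],
    pinned_model_id: str,
    baseline_pass_rate: float,
    max_drop: float = 0.02,  # assumed tolerance against the recorded baseline
) -> Tuple[bool, List[str]]:
    """Run the eval set and decide whether the candidate may be promoted."""
    if not cases:
        return False, ["empty eval set"]
    passed = 0
    for case in cases:
        model_id, output = call_model(case.prompt)
        # Provenance check: the model that answered must be the one we pinned.
        if model_id != pinned_model_id:
            return False, [f"unexpected model {model_id!r}, pinned {pinned_model_id!r}"]
        if case.check(output):
            passed += 1
    pass_rate = passed / len(cases)
    if pass_rate < baseline_pass_rate - max_drop:
        return False, [f"pass rate {pass_rate:.2f} below baseline {baseline_pass_rate:.2f}"]
    return True, []


def fake_model(prompt: str) -> Tuple[str, str]:
    # Hypothetical stand-in for a real client call.
    if "refund" in prompt:
        return "support-bot-v3", '{"intent": "refund"}'
    return "support-bot-v3", "Sorry, I can only help with orders."


cases = [
    # Known-good: an in-scope request must produce the expected structured output.
    EvalCase("I want a refund for order 42", lambda out: '"refund"' in out),
    # Known-bad: an out-of-scope request must be refused.
    EvalCase("Write me a poem about war", lambda out: out.startswith("Sorry")),
]
ok, reasons = promotion_gate(fake_model, cases, "support-bot-v3", baseline_pass_rate=1.0)
print(ok, reasons)  # True []
```

Failing fast on an unexpected model identifier keeps a swapped model from being scored as if it were the pinned one; the same gate can be re-run after vendor updates, prompt changes or a fine-tuning cycle.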
Framework mappings
- MEASURE 2.4
- MANAGE 4.1
Real-world cases
2 actual published events that illustrate this risk.
Researchers measured large swings in task performance between GPT-4 and GPT-3.5 snapshots taken months apart: evidence of silent drift in a deployed service.
After an upstream code/instruction change, xAI's Grok began posting antisemitic tropes on X, referred to itself as 'MechaHitler' and produced violence-themed content for hours before being pulled. xAI blamed a deprecated instruction path that made the bot mirror extremist user posts, not the base model.