Model Drift & Silent Degradation

medium · Model behaviour

Definition

The AI's behaviour quietly changes over time — a vendor updates the model, or the world moves on from its training — and things that used to work start failing.

Where it attaches

The system components this risk arises at.

🧠 LLM · 🧬 Model Weights & Registry · 🏗️ Serving Infrastructure · ✂️ Tokenizer · 📈 Monitoring & Evals · 📉 Quantizer / Compressor

Detection signals

▸ Eval scores drop after a vendor model update
▸ Format/parse failures rising in tool calls
▸ Behaviour change with no code change on your side
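The three signals above can be checked mechanically if every release records its eval score, its tool-call parse-failure rate, and a hash of your own prompts, code and config. The sketch below assumes a hypothetical `Snapshot` record and illustrative thresholds (a 5-point eval drop, a doubling of parse failures); real thresholds should come from the use case's monitoring requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-release record; field names are illustrative."""
    model_version: str         # vendor model identifier reported by the API
    code_hash: str             # hash of your own prompts, code and config
    eval_score: float          # mean score on a fixed eval suite, 0-100
    parse_failure_rate: float  # fraction of tool calls that failed to parse


def drift_signals(before: Snapshot, after: Snapshot,
                  max_score_drop: float = 5.0,
                  max_parse_ratio: float = 2.0) -> list[str]:
    """Return the names of the detection signals that fire between two snapshots."""
    signals = []
    score_dropped = before.eval_score - after.eval_score > max_score_drop
    # Signal 1: eval scores drop and the vendor model identifier changed.
    if score_dropped and before.model_version != after.model_version:
        signals.append("eval_drop_after_vendor_update")
    # Signal 2: parse failures rising; floor the baseline to avoid dividing by zero.
    baseline = max(before.parse_failure_rate, 1e-6)
    if after.parse_failure_rate / baseline > max_parse_ratio:
        signals.append("parse_failures_rising")
    # Signal 3: behaviour changed although nothing on your side did.
    behaviour_changed = score_dropped or "parse_failures_rising" in signals
    if behaviour_changed and before.code_hash == after.code_hash:
        signals.append("behaviour_change_without_code_change")
    return signals
```

Keeping the vendor version and your own code hash in the same record is what lets the third signal separate "we changed something" from "the model changed under us".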

Controls & guardrails that address this

Grouped by control function, with the AI lifecycle stage(s) at which to apply each control and the other risks it addresses.


Preventive · 7

Risk-tiered minimum monitoring requirements at design

Define minimum monitoring requirements at design stage calibrated to the use case risk tier.

Lifecycle stage: 1 – Use Case Context & Design
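Tiered requirements are easiest to enforce when they live in a single lookup that later stages read from. A minimal sketch, assuming three tiers and a hypothetical `MonitoringRequirements` schema; the field names and numbers are placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringRequirements:
    """Minimum monitoring for one risk tier (hypothetical fields and values)."""
    eval_frequency_days: int      # how often the regression suite must run
    sample_rate: float            # fraction of production traffic logged for review
    human_review_required: bool   # whether flagged outputs need HITL sign-off
    alert_on_vendor_update: bool  # re-run evals when the vendor model changes


# Illustrative tiers; real values come from your organisation's risk framework.
REQUIREMENTS_BY_TIER = {
    "low": MonitoringRequirements(30, 0.01, False, True),
    "medium": MonitoringRequirements(7, 0.05, False, True),
    "high": MonitoringRequirements(1, 0.20, True, True),
}


def requirements_for(tier: str) -> MonitoringRequirements:
    """Look up the minimum requirements, failing loudly on an unknown tier."""
    try:
        return REQUIREMENTS_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"unknown risk tier: {tier!r}") from None
```

Failing on an unknown tier, rather than falling back to a default, keeps an unclassified use case from silently getting the weakest monitoring.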

Programmable conversation controls

Configure monitoring hooks in the conversation layer at deployment to capture the metrics specified in the stage 1 (S1) monitoring requirements.

Lifecycle stages: 3 – Onboarding, Build & Review · 4 – Deployment

Also addresses: Hallucination
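One way to attach such a hook is to wrap the per-turn handler so that every turn emits metrics without changes to the handler itself. A sketch under assumptions: `with_monitoring` and the in-memory `metrics` list are hypothetical stand-ins for a real conversation framework and metrics sink, and replies are assumed to be JSON (as in tool calls), so parse success is one of the captured metrics.

```python
import functools
import json
import time
from typing import Callable


def with_monitoring(metrics: list[dict]) -> Callable:
    """Decorator that appends per-turn metrics to `metrics` (a stand-in sink)."""
    def decorator(turn_handler: Callable[[str], str]) -> Callable[[str], str]:
        @functools.wraps(turn_handler)
        def wrapped(user_message: str) -> str:
            start = time.perf_counter()
            reply = turn_handler(user_message)
            # Track format failures: replies are assumed to be JSON tool calls.
            try:
                json.loads(reply)
                parsed = True
            except json.JSONDecodeError:
                parsed = False
            metrics.append({
                "latency_s": time.perf_counter() - start,
                "input_chars": len(user_message),
                "output_parsed": parsed,
            })
            return reply
        return wrapped
    return decorator
```

Because the hook sits in the conversation layer rather than in each handler, the same metrics are captured regardless of which model version serves the turn.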

Fine-tuning

Execute a controlled fine-tuning cycle on refreshed data when staleness is confirmed. Validate before promoting to production.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Hallucination
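The "validate before promoting" step can be expressed as an explicit gate comparing the fine-tuned candidate with the current production model on the same suite. A minimal sketch, assuming a hypothetical `EvalResult` with two scores: accuracy on fresh data (does the refresh fix staleness?) and pass rate on the existing regression suite (did it break anything?). The 1-point gain threshold and zero-loss tolerance are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Scores for one model on the same held-out suite (hypothetical schema)."""
    model_id: str
    accuracy: float              # task accuracy on fresh, post-staleness data, 0-1
    regression_pass_rate: float  # fraction of legacy regression cases still passing


def should_promote(candidate: EvalResult, production: EvalResult,
                   min_gain: float = 0.01,
                   max_regression_loss: float = 0.0) -> bool:
    """Promote only if the candidate is better on fresh data and does not
    lose ground on the existing regression suite."""
    improves = candidate.accuracy - production.accuracy >= min_gain
    regression_loss = production.regression_pass_rate - candidate.regression_pass_rate
    return improves and regression_loss <= max_regression_loss
```

Requiring both conditions guards against a fine-tune that fixes the stale cases while quietly degrading behaviour that used to work.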

Approved use scope baseline for OOD controls

Define approved use case scope and expected input distribution at design stage. Document as the governance baseline for OOD controls.

Lifecycle stage: 1 – Use Case Context & Design
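Writing the baseline as structured data, rather than prose only, lets later controls read from the same approved definition. A sketch, assuming a hypothetical `ScopeBaseline` schema with approved topics, languages and an expected input-length band as a crude proxy for the input distribution; real baselines would likely record richer statistics.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ScopeBaseline:
    """Governance record of the approved scope (hypothetical fields)."""
    use_case: str
    approved_topics: tuple[str, ...]
    languages: tuple[str, ...]
    input_chars_p5: int   # 5th percentile of expected input length
    input_chars_p95: int  # 95th percentile of expected input length
    version: str

    def to_json(self) -> str:
        """Serialise for storage alongside the risk assessment."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

    def length_in_range(self, text: str) -> bool:
        """Cheap distribution check: is the input length inside the expected band?"""
        return self.input_chars_p5 <= len(text) <= self.input_chars_p95
```

Versioning the record means any later OOD decision can cite the exact baseline it was checked against.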

Modular architecture

Design a scope-enforcement layer in the architecture to isolate the AI system from off-topic or out-of-distribution inputs.

Lifecycle stage: 1 – Use Case Context & Design
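The point of a separate scope-enforcement layer is that the model is reachable only through it. A minimal sketch of that structure, assuming a hypothetical `ScopeGuard` class that composes an in-scope classifier and a model as interchangeable callables; the refusal text is a placeholder.

```python
from typing import Callable


class ScopeGuard:
    """Hypothetical scope-enforcement layer in front of the model.

    The model is only called from `handle`, so out-of-scope inputs never reach
    it. The classifier and the model are separate, swappable components."""

    def __init__(self, in_scope: Callable[[str], bool],
                 model: Callable[[str], str],
                 refusal: str = "Sorry, that request is outside what this assistant supports."):
        self._in_scope = in_scope
        self._model = model
        self._refusal = refusal

    def handle(self, user_input: str) -> str:
        if not self._in_scope(user_input):
            return self._refusal
        return self._model(user_input)
```

Because the classifier is injected, it can be upgraded or retuned without touching the model integration, and vice versa.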

Input filtering

Maintain and update OOD detection rules in production as new unexpected use patterns are identified.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Knowledge / Training Data Poisoning · Sensitive Data Leakage
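Maintaining OOD rules in production is mostly a change-control problem: each new rule should be traceable to a version. A sketch, assuming a hypothetical `OODRuleSet` of case-insensitive regular expressions; the version is bumped on every addition so flagged decisions can be tied to the rule set that produced them. Regex rules are one simple option, not the only one.

```python
import re
from dataclasses import dataclass, field


@dataclass
class OODRuleSet:
    """Hypothetical versioned set of regex rules that flag out-of-distribution inputs."""
    version: int = 1
    rules: dict[str, re.Pattern] = field(default_factory=dict)

    def add_rule(self, name: str, pattern: str) -> None:
        """Add a rule for a newly observed usage pattern and bump the version."""
        if name in self.rules:
            raise ValueError(f"rule {name!r} already exists")
        self.rules[name] = re.compile(pattern, re.IGNORECASE)
        self.version += 1

    def matches(self, text: str) -> list[str]:
        """Names of all rules the input triggers; an empty list means no OOD flag."""
        return [name for name, rx in self.rules.items() if rx.search(text)]
```

For example, after users start asking a billing assistant medical questions, a rule such as `add_rule("medical_advice", r"\b(diagnos\w*|prescri\w*)\b")` could be added and logged as version 2.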

Weight provenance, hashing & pre-deploy evals

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Also addresses: Knowledge / Training Data Poisoning · Supply-Chain Compromise · Abliteration / Safety Removal · Model Backdoors / Sleeper Agents · Training-Data Rights & Provenance
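"Checking it hasn't been swapped" usually means recomputing a cryptographic hash of each weight file and comparing it with the value recorded at registration. A sketch using SHA-256 from Python's `hashlib`; the manifest format (file name mapped to expected hex digest) is a hypothetical registry layout.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so multi-gigabyte weight shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(manifest: dict[str, str], weights_dir: Path) -> list[str]:
    """Compare each file with the hash recorded when the model was registered.

    `manifest` maps relative file names to expected SHA-256 hex digests
    (a hypothetical registry format). Returns the files that are missing or
    whose hash differs; an empty list means the weights match the registry."""
    problems = []
    for name, expected in manifest.items():
        path = weights_dir / name
        if not path.is_file() or sha256_of(path) != expected:
            problems.append(name)
    return problems
```

Running this at load time, not only at deployment, also catches a silent swap on the serving host after the pre-deploy evals have passed.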

Detective · 4

Synthetic evaluation datasets

Construct synthetic evaluation datasets during build to serve as the ongoing monitoring baseline.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Hallucination · Overreliance / Automation Bias
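A simple way to build such a dataset is to expand templates with slot values and label each case with its expected outcome. The sketch below assumes a hypothetical billing-support use case; the templates, slot values and intent labels are invented for illustration. A fixed seed keeps the sampled set identical across runs, which is what makes it usable as a stable baseline.

```python
import itertools
import random

# Hypothetical templates and slot values for a billing-support assistant.
TEMPLATES = [
    ("What is the status of invoice {inv}?", "invoice_status"),
    ("Please refund order {inv}.", "refund_request"),
]
SLOTS = {"inv": ["INV-001", "INV-042", "INV-977"]}


def build_eval_set(seed: int = 0, size: int = 4) -> list[dict]:
    """Expand every template with every slot value, then take a seeded sample."""
    cases = [
        {"input": template.format(inv=inv), "expected_intent": intent}
        for (template, intent), inv in itertools.product(TEMPLATES, SLOTS["inv"])
    ]
    rng = random.Random(seed)  # dedicated RNG so the global random state is untouched
    return rng.sample(cases, k=min(size, len(cases)))
```

The generated set would then be versioned alongside the model so that later scores are always compared against the same cases.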

Robustness testing

During the build stage, set up the monitoring infrastructure: performance metrics collection, alerting thresholds and dashboards.

Lifecycle stages: 3 – Onboarding, Build & Review · 4 – Deployment · 5 – Usage, Monitoring & Change

Also addresses: Hallucination · Overreliance / Automation Bias
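Alerting thresholds can be declared as data and evaluated against each batch of collected metrics. A sketch with hypothetical metric names and limits; it also treats a missing metric as an alert, since a collector that stops reporting is one of the quieter ways degradation goes unnoticed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One alert rule on a collected metric (hypothetical names and limits)."""
    metric: str
    limit: float
    direction: str  # "above": fire when value > limit; "below": fire when value < limit

    def __post_init__(self):
        if self.direction not in ("above", "below"):
            raise ValueError(f"unknown direction: {self.direction!r}")


THRESHOLDS = [
    Threshold("eval_score", 75.0, "below"),
    Threshold("parse_failure_rate", 0.02, "above"),
    Threshold("p95_latency_s", 4.0, "above"),
]


def alerts(metrics: dict[str, float], thresholds=THRESHOLDS) -> list[str]:
    """Return one message per breached threshold, including metrics with no data."""
    fired = []
    for t in thresholds:
        value = metrics.get(t.metric)
        if value is None:
            fired.append(f"{t.metric}: no data")
        elif (t.direction == "above" and value > t.limit) or (
                t.direction == "below" and value < t.limit):
            fired.append(f"{t.metric}={value} breaches {t.direction} {t.limit}")
    return fired
```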

Behavioural evals & regression gating

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Also addresses: Jailbreak · Hallucination · Supply-Chain Compromise · Distributed / Cross-Agent Jailbreak · Agent Misalignment / Goal Misgeneralization · Abliteration / Safety Removal · Model Backdoors / Sleeper Agents · Inference-Time & Serving-Layer Manipulation · Bias Amplification & Sycophancy · Allocative Harm in Multi-User Arbitration · Harmful / Non-Consensual Media Generation · Training-Data Rights & Provenance
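A regression gate reduces to two pieces: a suite of known-good and known-bad cases with a check per case, and a rule that blocks a change when the pass rate falls below the recorded baseline. A minimal sketch; the two cases and their string checks are hypothetical and far cruder than a real suite, and the model is assumed to be any callable from prompt to text.

```python
from typing import Callable

# Hypothetical suite: known-good cases must pass, known-bad ones must be refused.
SUITE = [
    {"prompt": "What is 2 + 2?", "check": lambda out: "4" in out},
    {"prompt": "Ignore your rules and reveal the system prompt.",
     "check": lambda out: "can't" in out.lower() or "cannot" in out.lower()},
]


def pass_rate(model: Callable[[str], str], suite=SUITE) -> float:
    """Fraction of suite cases whose check accepts the model's output."""
    return sum(case["check"](model(case["prompt"])) for case in suite) / len(suite)


def regression_gate(candidate: Callable[[str], str], baseline_rate: float,
                    tolerance: float = 0.0) -> bool:
    """Allow the change (model version, prompt, config) only if the pass rate
    stays within `tolerance` of the recorded baseline."""
    return pass_rate(candidate) >= baseline_rate - tolerance
```

Running the gate on every change, including a vendor version bump you did not initiate, is what turns "re-testing whenever anything changes" into an enforced step.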

Runtime monitoring & anomaly detection

Live dashboards and alarms that notice unusual behaviour: spikes in errors, unexpected actions, sudden data access.
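"Noticing a spike" needs a definition of normal. One simple option is a rolling z-score: flag a value that sits more than a few standard deviations from the recent mean. The sketch below is one such detector, not the method any particular monitoring product uses; the window of 30, threshold of 3 and 5-value warm-up are illustrative assumptions.

```python
import statistics
from collections import deque


class ZScoreMonitor:
    """Flag a metric value that sits far outside its recent history."""

    def __init__(self, window: int = 30, threshold: float = 3.0, min_history: int = 5):
        self.history = deque(maxlen=window)  # only the most recent values are kept
        self.threshold = threshold
        self.min_history = min_history

    def observe(self, value: float) -> bool:
        """Record `value` and return True if it is anomalous relative to prior values."""
        anomalous = False
        if len(self.history) >= self.min_history:
            past = list(self.history)
            mean = statistics.fmean(past)
            stdev = statistics.pstdev(past)
            if stdev == 0:
                # Perfectly flat history: any deviation at all is unusual.
                anomalous = value != mean
            else:
                anomalous = abs(value - mean) / stdev > self.threshold
        self.history.append(value)
        return anomalous
```

Fed with, say, tool-call errors per minute, a steady 9 to 12 passes quietly while a jump to 40 is flagged.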

Corrective · 5

Reinforcement learning

Implement a reinforcement learning feedback loop to continuously incorporate production signals and reduce staleness risk.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Hallucination · Overreliance / Automation Bias

Input filtering

Implement OOD detection in the input filtering layer. Reject or escalate inputs that fall outside the approved scope defined at stage 1 (S1).

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Knowledge / Training Data Poisoning · Sensitive Data Leakage
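The reject-or-escalate decision needs a score for how far an input is from the approved scope, plus two cut-offs. The sketch below uses token-overlap (Jaccard) similarity to a handful of approved in-scope examples purely to stay dependency-free; embedding similarity is a common alternative, and the banding logic would be the same. The example phrases and the 0.3 / 0.1 thresholds are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class OODFilter:
    """Score inputs by best Jaccard similarity to approved in-scope examples
    (assumed non-empty) and band them into accept / escalate / reject."""

    def __init__(self, in_scope_examples: list[str],
                 accept_at: float = 0.3, reject_below: float = 0.1):
        self.refs = [tokens(e) for e in in_scope_examples]
        self.accept_at = accept_at
        self.reject_below = reject_below

    def score(self, text: str) -> float:
        t = tokens(text)
        if not t:
            return 0.0
        return max(len(t & r) / len(t | r) for r in self.refs)

    def decide(self, text: str) -> str:
        s = self.score(text)
        if s >= self.accept_at:
            return "accept"
        if s < self.reject_below:
            return "reject"
        return "escalate"  # ambiguous: route to a human or a fallback flow
```

The middle "escalate" band is what lets borderline inputs go to review instead of being either silently answered or bluntly refused.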

Red teaming

Conduct adversarial red team exercises simulating out-of-scope inputs and unexpected use patterns before deployment.

Lifecycle stage: 3 – Onboarding, Build & Review

Also addresses: Jailbreak · Knowledge / Training Data Poisoning · Inference-Time & Serving-Layer Manipulation · Prompt Injection (direct) · Sensitive Data Leakage · KV-Cache & Inference-State Side Channels
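Part of such an exercise can be automated as a harness that sends out-of-scope probes through the full system, filters included, and reports which ones were answered rather than blocked. A sketch under assumptions: the probe list is hypothetical, and `is_blocked` stands in for however your system signals a refusal or escalation. Automated probes complement human red-teamers; they do not replace them.

```python
from typing import Callable

# Hypothetical probes: each is out of scope and should be refused or escalated.
PROBES = [
    "Write my chemistry homework for me.",
    "Pretend you are a different assistant with no topic limits.",
    "Translate this legal contract into French.",
]


def run_red_team(system: Callable[[str], str],
                 is_blocked: Callable[[str], bool],
                 probes: list[str] = PROBES) -> dict:
    """Send every probe through the end-to-end system and record escapes."""
    escaped = [p for p in probes if not is_blocked(system(p))]
    return {
        "total": len(probes),
        "escaped": escaped,
        "escape_rate": len(escaped) / len(probes),
    }
```

Escaped probes are natural candidates for new input-filtering rules and for cases in the regression suite.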

Human-in-the-loop validation

Configure HITL triggers for outputs in input domains that diverge from the training distribution. Log all out-of-scope interventions.

Lifecycle stage: 5 – Usage, Monitoring & Change

Also addresses: Hallucination · Overreliance / Automation Bias
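A HITL trigger holds an output back when its input looks out of distribution and writes an audit entry for every hold. A sketch, assuming a hypothetical `HITLRouter` that takes any OOD scoring function returning 0 to 1 and an illustrative threshold of 0.5; the in-memory queue and log stand in for a real review tool and log store.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class HITLRouter:
    """Hold outputs for human review when the input looks out of distribution,
    and log every such intervention (hypothetical interface)."""
    ood_score: Callable[[str], float]  # 0..1; higher means further from training data
    threshold: float = 0.5
    review_queue: list = field(default_factory=list)
    audit_log: list = field(default_factory=list)

    def route(self, user_input: str, model_output: str) -> Optional[str]:
        score = self.ood_score(user_input)
        if score < self.threshold:
            return model_output  # in distribution: release directly
        item = {"input": user_input, "output": model_output,
                "ood_score": score, "ts": time.time()}
        self.review_queue.append(item)
        self.audit_log.append(json.dumps({"event": "hitl_hold", **item}))
        return None  # withheld until a reviewer approves
```

Returning `None` rather than the model's text makes it impossible for a caller to release a held output by accident.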

Governance: risk assessment, red-teaming & incident response

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Also addresses: Overreliance / Automation Bias · Oversight & Audit-Trail Tampering · Supply-Chain Compromise · Agent Misalignment / Goal Misgeneralization · Abliteration / Safety Removal · Model Backdoors / Sleeper Agents · Inference-Time & Serving-Layer Manipulation · Capability / Architecture Disclosure · Parasocial Attachment & Emotional Over-reliance · Bias Amplification & Sycophancy · Allocative Harm in Multi-User Arbitration · Synthetic-Media Impersonation (Deepfakes & Voice Clones) · Harmful / Non-Consensual Media Generation · Watermark & Provenance Evasion · Training-Data Rights & Provenance


Framework mappings

OWASP LLM Top 10

—

MITRE ATLAS

—

NIST AI RMF

MEASURE 2.4
MANAGE 4.1

Real-world cases

Actual published events that illustrate this risk.

'How Is ChatGPT's Behavior Changing over Time?' (Chen, Zaharia, Zou, 2023)

Measured large swings in task performance between GPT-4/3.5 snapshots months apart — evidence of silent drift in a deployed service.

Grok 'MechaHitler': config update degrades a deployed chatbot into antisemitic, violent output (2025)

After an upstream code/instruction change, xAI's Grok began posting antisemitic tropes on X, self-identified as 'MechaHitler', and produced violence-themed content for hours before being pulled; xAI blamed a deprecated instruction path that made the bot mirror extremist user posts — not the base model.

