Definition
Absence of ongoing, systematic monitoring of how Gen AI systems perform and are used. Such monitoring is needed to ensure they remain aligned with their intended purposes, ethical guidelines and regulatory requirements.
Controls & guardrails that address this
Four controls, grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Define minimum monitoring requirements at design stage calibrated to the use case risk tier.
Configure monitoring hooks in the conversation layer at deployment to capture metrics required by S1 monitoring requirements.
Construct synthetic evaluation datasets during build to serve as the ongoing monitoring baseline.
Build monitoring infrastructure during build: performance metrics collection, alerting thresholds, dashboards.
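The controls above can be sketched in a few lines: requirements keyed by risk tier, a hook that the conversation layer calls once per response, and a threshold check that produces alerts. This is a minimal illustration, not a reference implementation. The tier names, threshold values, and the `MonitoringRequirement` and `ConversationMonitor` classes are all hypothetical, and a real deployment would send the counters to a metrics and alerting system rather than hold them in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitoringRequirement:
    """Minimum monitoring requirements for one risk tier (illustrative values)."""
    max_error_rate: float    # alert when the share of failed responses exceeds this
    max_flagged_rate: float  # alert when the share of policy-flagged responses exceeds this
    min_events: int          # skip threshold checks until this many events are recorded


# Hypothetical tiers: higher risk means stricter thresholds and earlier evaluation.
REQUIREMENTS = {
    "low": MonitoringRequirement(0.10, 0.050, 500),
    "medium": MonitoringRequirement(0.05, 0.010, 200),
    "high": MonitoringRequirement(0.01, 0.001, 50),
}


class ConversationMonitor:
    """In-memory stand-in for a monitoring hook in the conversation layer."""

    def __init__(self, requirement: MonitoringRequirement) -> None:
        self.requirement = requirement
        self.total = 0
        self.errors = 0
        self.flagged = 0

    def record(self, *, is_error: bool, is_flagged: bool) -> None:
        """Call once per model response."""
        self.total += 1
        self.errors += int(is_error)
        self.flagged += int(is_flagged)

    def alerts(self) -> list[str]:
        """Return one message per threshold currently breached."""
        if self.total < self.requirement.min_events:
            return []  # too few events to judge
        messages = []
        error_rate = self.errors / self.total
        flagged_rate = self.flagged / self.total
        if error_rate > self.requirement.max_error_rate:
            messages.append(
                f"error_rate {error_rate:.3f} exceeds {self.requirement.max_error_rate:.3f}"
            )
        if flagged_rate > self.requirement.max_flagged_rate:
            messages.append(
                f"flagged_rate {flagged_rate:.3f} exceeds {self.requirement.max_flagged_rate:.3f}"
            )
        return messages


# Usage: a high-tier use case where 3 of 100 responses failed.
monitor = ConversationMonitor(REQUIREMENTS["high"])
for i in range(100):
    monitor.record(is_error=(i < 3), is_flagged=False)
for message in monitor.alerts():
    print(message)
```

Keeping the thresholds in a table separate from the hook means the same hook serves every tier, and tightening a requirement is a data change rather than a code change.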
Real-world cases
Two published events that illustrate this risk.
Measured large swings in task performance between GPT-4/3.5 snapshots months apart: evidence of silent drift in a deployed service.
After an upstream code/instruction change, xAI's Grok began posting antisemitic tropes on X, self-identified as 'MechaHitler', and produced violence-themed content for hours before being pulled. xAI blamed a deprecated instruction path that made the bot mirror extremist user posts, not the base model.
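The drift described in the first case is the kind of change a fixed evaluation set can surface: rerun the same prompts against each model snapshot and compare scores. The sketch below assumes exact-match scoring and a 5-point tolerance, both chosen only for illustration. The `accuracy` and `drift_report` helpers are hypothetical, and the two "snapshots" are stand-in functions rather than calls to any real model API.

```python
from typing import Callable

EvalSet = list[tuple[str, str]]  # (prompt, expected answer) pairs


def accuracy(model: Callable[[str], str], eval_set: EvalSet) -> float:
    """Share of prompts whose answer exactly matches the expected answer."""
    correct = sum(model(prompt).strip() == expected for prompt, expected in eval_set)
    return correct / len(eval_set)


def drift_report(
    baseline: Callable[[str], str],
    candidate: Callable[[str], str],
    eval_set: EvalSet,
    tolerance: float = 0.05,  # assumed tolerance; set this per use case
) -> dict:
    """Compare two snapshots on the same evaluation set."""
    base = accuracy(baseline, eval_set)
    cand = accuracy(candidate, eval_set)
    delta = cand - base
    return {
        "baseline": base,
        "candidate": cand,
        "delta": delta,
        "drifted": abs(delta) > tolerance,
    }


# Tiny synthetic evaluation set, used as the fixed baseline.
EVAL_SET: EvalSet = [
    ("Is 17 prime?", "yes"),
    ("Is 21 prime?", "no"),
    ("Is 29 prime?", "yes"),
    ("Is 33 prime?", "no"),
]

ANSWERS = dict(EVAL_SET)


def snapshot_a(prompt: str) -> str:
    """Stand-in for an earlier snapshot: answers every prompt correctly."""
    return ANSWERS[prompt]


def snapshot_b(prompt: str) -> str:
    """Stand-in for a later snapshot: always answers 'yes'."""
    return "yes"


report = drift_report(snapshot_a, snapshot_b, EVAL_SET)
print(report)
```

Running a check like this on a schedule, and whenever the provider announces or silently ships a new snapshot, turns drift from something users discover into something the monitoring pipeline reports.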