Definition
Gen AI systems span a broad set of services and capabilities, which makes operational resilience and service continuity plans more complex to design, test, and maintain.
Controls & guardrails that address this
Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.
Define operational resilience requirements (RTO, RPO, availability SLA) for the AI system at design stage.
Design a modular AI architecture with independent failover, rollback, and degraded-mode capability.
Include the AI system in the business continuity plan (BCP) and disaster recovery plan (DRP). Define recovery procedures for AI components and test them at least annually.
Conduct load, failover, and chaos testing before production deployment. Block go-live if RTO/RPO criteria are not met.
Define AI incident categories, severity tiers, and triage flow before go-live. Gate launch on governance approval of the plan and named roles.
source: NIST SP 800-61r2 Computer Security Incident Handling Guide (Preparation; Detection & Analysis: incident categorisation/prioritisation); NIST AI RMF MANAGE 4.1
Set the AI service's criticality tier, RTO/RPO, and degraded-mode service level at design with business sign-off. Register it in enterprise BCP scope.
source: ISO 22301 Business Continuity Management; ISO/IEC 27031; NIST SP 800-34r1 (Activation & Notification, Recovery)
Wire detections into the IR queue and verify paging with a test escalation before go-live. Gate release on a successful dry-run.
source: ISO/IEC 27035-1:2023 Information security incident management (incident response coordination); NIST SP 800-61r2 (Coordination & Information Sharing)
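The controls that set RTO, RPO, and availability targets at design stage and block go-live when testing misses them could be automated as a simple release gate. The following is a minimal sketch in Python, not a prescribed implementation: the class names, field names, drill names, and numeric targets are all hypothetical, and the drill measurements are assumed to come from whatever load, failover, or chaos tooling the organisation already uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResilienceTargets:
    """Targets agreed at design stage (illustrative values only)."""
    rto_seconds: float        # maximum tolerated time to restore service
    rpo_seconds: float        # maximum tolerated window of data loss
    availability_sla: float   # e.g. 0.999 for 99.9% availability


@dataclass(frozen=True)
class DrillResult:
    """Measurements from one failover or chaos drill (hypothetical fields)."""
    name: str
    recovery_seconds: float
    data_loss_seconds: float
    observed_availability: float


def go_live_blockers(targets: ResilienceTargets, drills: list[DrillResult]) -> list[str]:
    """Return the reasons to block go-live; an empty list means the gate passes."""
    if not drills:
        # No evidence is treated as a failure, not as a pass.
        return ["no load, failover, or chaos drills were recorded"]
    blockers = []
    for drill in drills:
        if drill.recovery_seconds > targets.rto_seconds:
            blockers.append(
                f"{drill.name}: recovery took {drill.recovery_seconds:g}s, "
                f"RTO is {targets.rto_seconds:g}s"
            )
        if drill.data_loss_seconds > targets.rpo_seconds:
            blockers.append(
                f"{drill.name}: data loss window {drill.data_loss_seconds:g}s, "
                f"RPO is {targets.rpo_seconds:g}s"
            )
        if drill.observed_availability < targets.availability_sla:
            blockers.append(
                f"{drill.name}: availability {drill.observed_availability:.4f} "
                f"is below SLA {targets.availability_sla:.4f}"
            )
    return blockers


# Example usage with made-up numbers.
targets = ResilienceTargets(rto_seconds=900, rpo_seconds=300, availability_sla=0.999)
drills = [
    DrillResult("primary-model-failover", 420, 0, 0.9995),
    DrillResult("vector-store-restore", 1500, 120, 0.9992),
]
blockers = go_live_blockers(targets, drills)
print("BLOCK go-live" if blockers else "PASS", blockers)
```

Returning the list of reasons instead of a bare boolean keeps the evidence for the go/no-go decision attached to the decision itself, which is useful when the result has to be reviewed as part of BCP or DRP testing.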
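The incident-handling controls (categories, severity tiers, a triage flow, and named roles agreed before go-live) can likewise be captured as data that a launch check inspects. This is a sketch under stated assumptions: the category names, the three-tier scheme, the role names, and the choice to route unknown categories to the middle tier are all hypothetical and are not taken from the cited standards.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical three-tier scheme; lower number means more urgent."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3


# Illustrative AI incident categories mapped to a default severity tier.
TRIAGE_RULES = {
    "service_outage": Severity.SEV1,
    "harmful_output": Severity.SEV1,
    "model_degradation": Severity.SEV2,
    "latency_or_cost_anomaly": Severity.SEV3,
}

# Named role that is paged for each tier (placeholder role names).
ON_CALL = {
    Severity.SEV1: "incident-commander",
    Severity.SEV2: "ml-platform-oncall",
    Severity.SEV3: "service-owner",
}


def triage(category: str) -> tuple[Severity, str]:
    """Map an incident category to its severity tier and the role to page."""
    # Assumption: unrecognised categories go to SEV2 so a human reviews them.
    severity = TRIAGE_RULES.get(category, Severity.SEV2)
    return severity, ON_CALL[severity]


def plan_is_launch_ready(rules: dict, roles: dict) -> bool:
    """True only if rules exist and every tier they use has a named role."""
    return bool(rules) and all(roles.get(sev) for sev in set(rules.values()))


print(triage("harmful_output"))
print(plan_is_launch_ready(TRIAGE_RULES, ON_CALL))
```

A check like `plan_is_launch_ready` only verifies that the plan is complete on paper. It does not replace the test escalation and dry-run that the last control requires.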