🔍AI RiskAtlas
Control Library

Guardrails & controls — by category, lifecycle, layer or risk

Each row is a specific guardrail addressing a specific risk, tagged with its control category, AI lifecycle stage, and control layer. Switch how it's organised, and filter to your own library or the researched additions. Sources: Control Library v9 / Control Category v2 (MindForge Appendix G guardrails; ABS-aligned categories), with researched gap-fills.

588
Guardrails total
266 unique
421
Your library (v9)
215 unique
22
Proposed additions
22 unique
145
Interactive (lab)
29 unique
22/22
Risks covered
Standard lens

Three provenances are merged here: your library (v9), proposed additions, and the interactive (lab) controls used in scenarios. Every row carries a function (P/D/C) derived from its Control Category. Note: the categories are model-risk-centric and the MindForge Appendix G guardrails were force-fitted — treat category fit as indicative.

View / lens
Show
Category
588 guardrails · 18 categories

Search narrows live. To lock a result in, use “Filter to this →” on a category header — it becomes a shareable, persistent filter (with a clear button above).

PreventiveFairness impact assessment at use-case intakeLibrary v9

Conduct fairness impact assessment at use case intake. Require governance sign-off on demographic coverage requirements before data acquisition.

Risk: Unrepresentative or biased data inputs
PreventiveAffected group register at intakeLibrary v9

Identify all groups at risk of adverse impact at use case intake. Register them in the affected group register.

PreventiveEthical design assessment in onboardingLibrary v9

Conduct ethical design assessment at use case intake before build begins. Require sign-off by ethics or risk committee.

PreventiveProhibited outputs and ethical boundaries in design docLibrary v9

Define prohibited outputs and ethical boundary constraints in the use case design document before build.

PreventiveCompute carbon footprint assessment at intakeLibrary v9

Include compute carbon footprint assessment in use case intake. Set energy efficiency thresholds as intake criterion.

Risk: Environmental sustainability impact
PreventiveEthical design assessment in onboardingLibrary v9

Conduct ethical design review at intake specifically examining interface design for dark patterns.

PreventiveProhibited dark pattern taxonomy as design constraintLibrary v9

Publish a prohibited dark pattern taxonomy and embed it as a design constraint before build.

PreventiveContent safety policy with zero-tolerance thresholdsLibrary v9

Define content safety policy at use case design stage. Classify prohibited content types and set zero-tolerance thresholds.

Risk: Jailbreak
PreventiveMandatory AI risk training for use-case sponsorsLibrary v9

Mandate AI risk awareness training for all use case sponsors and design team members before project kick-off.

PreventiveTraining completion gate for build personnelLibrary v9

Mandate AI risk training for all build and test personnel. Gate project participation on training completion.

PreventiveGovernance training for data acquisition personnelLibrary v9

Require AI governance training for all personnel involved in data acquisition and processing before project participation.

PreventivePre-launch training verification for customer-facing teamsLibrary v9

Verify all deployment, operations, and customer-facing team members have completed AI risk training before launch.

PreventiveThird-party accountability requirements in RFP and contractsLibrary v9

Define third-party AI accountability requirements before vendor engagement. Embed in RFP and contract specifications.

PreventiveVendor AI governance due diligence at selectionLibrary v9

Conduct AI governance due diligence on third-party providers at selection stage. Reject providers failing minimum maturity.

PreventiveRequired vendor model cards and validation reportsLibrary v9

Require third-party providers to submit model cards, validation reports, and security documentation before integration.

PreventiveOngoing vendor incident notification and reporting obligationsLibrary v9

Enforce ongoing third-party accountability obligations including incident notification and periodic performance reporting.

PreventiveIndependent third-party performance and compliance monitoringLibrary v9

Conduct independent performance and compliance monitoring of third-party AI components. Escalate when SLA or compliance obligations are missed.

PreventiveContinuous third-party assurance with shared-responsibility matrix and obligation flow-downLibrary v9

Allocate every control in a shared-responsibility matrix and flow down regulatory obligations in contract at onboarding. Gate approval on initial assurance artefacts.

CorrectiveContinuous third-party assurance with shared-responsibility matrix and obligation flow-downLibrary v9

Review independent vendor assurance on cadence, log gaps, and track remediation. Keep the shared-responsibility matrix current so every control has an owner.

PreventiveMandatory AI initiative registration before designLibrary v9

Register all AI initiatives in the enterprise inventory before design begins. Block unregistered projects from proceeding.

Risk: Lack of use case, data and model governance
PreventiveData stewardship and classification governance from collectionLibrary v9

Enforce data stewardship and classification governance on all AI training data from point of collection.

Risk: Lack of use case, data and model governance
PreventiveGovernance stage-gates at each SDLC phaseLibrary v9

Enforce governance stage-gates at each SDLC phase. Block progression to next stage until all checkpoints are cleared.

Risk: Lack of use case, data and model governance
PreventivePre-deployment stage-gate clearance reviewLibrary v9

Conduct pre-deployment governance review confirming all lifecycle stage-gates are cleared before go-live.

Risk: Lack of use case, data and model governance
PreventiveChange management for model updates and retirementsLibrary v9

Maintain AI inventory in current state. Apply formal change management for all model updates and retirements.

Risk: Lack of use case, data and model governance
PreventiveRisk-tiered human oversight requirements at designLibrary v9

Define minimum human oversight requirements by risk tier at design stage. Assign named accountability for oversight operations.

PreventivePeriodic oversight effectiveness review and escalationLibrary v9

Conduct periodic oversight effectiveness reviews. Escalate to governance when oversight metrics fall below threshold.

PreventiveUser feedback and recourse design with SLAsLibrary v9

Design user feedback and recourse mechanisms at use case design stage with defined SLAs for complaint resolution.

Risk: Inadequate feedback and recourse mechanisms
PreventiveStructured feedback routing within defined SLALibrary v9

Operate a structured feedback management process. Log, categorise, and route all feedback to responsible owners within SLA.

Risk: Inadequate feedback and recourse mechanisms
PreventiveAccuracy acceptance criteria before validationLibrary v9

Define model accuracy acceptance criteria aligned to business requirements before validation commences.

PreventiveContinuous production accuracy monitoring against baselineLibrary v9

Monitor production accuracy continuously against the validated baseline. Trigger model review when accuracy degrades.

PreventiveDeclared data sources and provenance at intakeLibrary v9

Declare all planned training and test data sources at use case intake, with provenance status for each.

PreventiveDocumented data provenance during collectionLibrary v9

Document actual provenance for each data source during collection: origins, methods, timestamps, custodian identity.

PreventiveExplainability requirements aligned to regulatory needsLibrary v9

Define explainability requirements at design stage aligned to regulatory obligations and affected user needs.

Risk: Lack of explainability
PreventiveAI identity disclosure policy at designLibrary v9

Define AI identity disclosure policy at design stage. Specify when and how the system must identify itself as AI.

PreventiveProduction anthropomorphism incident monitoringLibrary v9

Monitor production for anthropomorphism incidents. Escalate complaints where users believed they were interacting with a human.

PreventiveJurisdiction mapping for data processing at intakeLibrary v9

Map all jurisdictions involved in planned data collection, processing, and storage at use case intake.

Risk: Inability to ensure location compliance for model hosting and data processing
PreventiveResidency compliance verification during acquisitionLibrary v9

Verify residency compliance for all data collection, storage, and cross-border transfers during acquisition.

Risk: Inability to ensure location compliance for model hosting and data processing
PreventivePre-launch verification of residency controlsLibrary v9

Confirm all data residency controls are active and verified in the production environment before go-live.

Risk: Inability to ensure location compliance for model hosting and data processing
PreventivePreliminary legal review of data ownershipLibrary v9

Conduct a preliminary legal review of planned training data sources to establish ownership status at design stage.

Risk: Unclear data ownership
PreventiveDefinitive data ownership review and licensingLibrary v9

Conduct a definitive legal review of data ownership for all training datasets before use. Obtain licences where required.

Risk: Unclear data ownership
PreventiveApproved storage location policy from collectionLibrary v9

Establish data transfer and storage policy for AI training data. Enforce approved storage locations from point of collection.

PreventiveApproval-gated data transfers from build environmentLibrary v9

Enforce data handling policy in the build environment. Require explicit approval for any data transfers outside the environment.

PreventiveRegulatory impact assessment mapping obligations at designLibrary v9

Conduct a regulatory impact assessment at design stage. Map planned use case activities to applicable regulatory obligations.

Risk: Breach or misalignment with regulatory or organisational standards
PreventiveEarly legal engagement on pre-approval requirementsLibrary v9

Engage legal and compliance at design stage to identify pre-approval or notification requirements before build begins.

Risk: Breach or misalignment with regulatory or organisational standards
PreventivePre-deployment compliance review of design and dataLibrary v9

Conduct a formal compliance review of model design, data practices, and outputs before deployment approval.

Risk: Breach or misalignment with regulatory or organisational standards
PreventiveRegulatory pre-approvals secured before go-liveLibrary v9

Obtain all required regulatory pre-approvals and file notifications before go-live. Do not launch without confirmation.

Risk: Breach or misalignment with regulatory or organisational standards
PreventiveLegal review of training data regulatory basisLibrary v9

Require legal and compliance review of all training data sources before acquisition to confirm regulatory basis.

Risk: Breach or misalignment with regulatory or organisational standards
PreventivePreliminary IP risk assessment of data sourcesLibrary v9

Conduct a preliminary IP risk assessment for all planned training data sources at design stage.

Risk: IP infringement
PreventiveIP rights verification and licensing at acquisitionLibrary v9

Verify IP rights for all training data at acquisition. Obtain licences or waivers before incorporating protected material.

Risk: IP infringement
PreventiveOutput sampling for near-verbatim training reproductionLibrary v9

Sample model outputs for near-verbatim reproduction of training data during build-stage legal review.

Risk: IP infringement
PreventiveAssessment of claimable IP over AI outputsLibrary v9

Assess what IP protection the organisation can claim over AI-generated outputs at design stage. Document legal position.

Risk: Unavailability of IP protection
PreventiveDocumented output IP ownership in terms of serviceLibrary v9

Document the IP ownership position for AI-generated outputs and incorporate into terms of service before deployment.

Risk: Unavailability of IP protection
PreventivePrivacy risk assessment and DPIA determinationLibrary v9

Conduct a privacy risk assessment at use case design stage. Determine if a DPIA is required before data acquisition.

PreventiveConsent, minimisation, and anonymisation during acquisitionLibrary v9

Apply S1-defined privacy controls during data acquisition: verify consent, minimise data, anonymise personal data.

PreventiveOperational consent management and privacy noticeLibrary v9

Publish the privacy notice and confirm consent management is operational before go-live.

PreventiveData retention schedules defined at designLibrary v9

Define data retention schedules for all AI data categories at design stage, covering training, test, and production data.

Risk: Unclear data retention and deletion
PreventiveRetention tagging with automated deletion at collectionLibrary v9

Tag data with retention periods at collection and automate deletion. Document automated deletion configuration.

Risk: Unclear data retention and deletion
PreventiveAutomated retention and deletion across artefact typesLibrary v9

Implement automated retention and deletion controls for all artefact types (training data, models, logs). Test before deployment.

Risk: Unclear data retention and deletion
PreventiveHallucination rate thresholds and grounding policyLibrary v9

Establish acceptable hallucination rate thresholds and grounding requirements as policy before build. Assign a named risk owner.

PreventiveConsequence-of-error severity classification at designLibrary v9

Classify the use case by consequence-of-error severity at design stage. Define overconfidence risk tolerance accordingly.

PreventiveTraining data fitness requirements at designLibrary v9

Define training data fitness requirements at design stage including domain coverage, recency, and format specifications.

Risk: Training data or inputs not fit for purpose
PreventiveRisk-tiered minimum monitoring requirements at designLibrary v9

Define minimum monitoring requirements at design stage calibrated to the use case risk tier.

PreventiveTraining data quality standards and thresholdsLibrary v9

Establish data quality standards for AI training data at design stage: completeness, accuracy, and timeliness thresholds.

Risk: Insufficient data quality
PreventiveQuantitative accuracy thresholds calibrated to impactLibrary v9

Define quantitative accuracy acceptance thresholds at design stage calibrated to business impact and regulatory requirements.

Risk: Insufficient model accuracy / soundness
PreventiveApproved use scope baseline for OOD controlsLibrary v9

Define approved use case scope and expected input distribution at design stage. Document as the governance baseline for OOD controls.

CorrectiveOperational resilience targets defined at designLibrary v9

Define operational resilience requirements (RTO, RPO, availability SLA) for the AI system at design stage.

Risk: Inadequate operational resilience
PreventiveNon-functional performance requirements at designLibrary v9

Define non-functional requirements (latency, throughput, scalability) for the AI system at design stage.

Risk: Unmet architectural requirements
PreventiveModel versioning and experiment tracking gateLibrary v9

Implement model versioning and experiment tracking as a governance requirement during build. Gate model promotion on version registry entry.

Risk: Lack of reproducibility
PreventiveDesign-time authority model and approval gate defining each agent's identity, scopes, and delegation envelopeLibrary v9

Document each agent's identity, minimum scopes, on-behalf-of population, and delegation depth at design time. Gate build on governance sign-off of the authority matrix.

PreventiveCentral agent registry / non-human identity inventory with ownership and lifecycle metadataLibrary v9

Register every agent identity with a named human owner, approved use case, scopes, and status before issuance. No registry entry, no identity.

PreventiveDesign-time authority model and approval gate defining each agent's identity, scopes, and delegation envelopeLibrary v9

Verify enforced scopes and policy rules trace one-for-one to the approved authority matrix. Treat divergence as a blocking defect before onboarding completes.

CorrectiveCentral agent registry / non-human identity inventory with ownership and lifecycle metadataLibrary v9

Reconcile the registry against runtime identities and suspend unregistered principals. Recertify ownership and scopes periodically; decommission retired agents.

PreventiveEnd-user AI-literacy training and verification-skill program✚ Proposed — not in your library

Provide recurring AI-literacy training to end users and decision-makers so they can recognise model failure modes and competently apply verification workflows, with periodic refreshers to counter automation bias and training decay.

Limitation: Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

CorrectiveUser AI-literacy & verification workflowsInteractive (lab)

Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.

Limitation: Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

CorrectiveUser AI-literacy & verification workflowsInteractive (lab)

Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.

Limitation: Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

PreventiveInter-agent authentication & admission controlInteractive (lab)

Give every AI agent a verifiable ID badge, keep a guest list of which agents are allowed on the team, and check the badge on every message — so an impostor or an uninvited agent can't be trusted.

Limitation: Identity proves who an agent is, not that it is behaving honestly — an authenticated-but-compromised agent still needs isolation, taint-marking, and monitoring. Admission vetting is only as strong as the policy, and dynamically discovered agents in open ecosystems remain hard to fully vet.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

PreventiveAI-nature disclosure & engagement safeguardsInteractive (lab)

Make the AI clearly tell people it's a machine — on every channel it acts through — and add gentle safeguards like break reminders and crisis help, so users don't mistake it for a human or lean on it unhealthily.

Limitation: Disclosure reduces but does not eliminate anthropomorphic attachment — fluent, persuasive interaction still fosters bonds; the safeguards depend on reliable crisis detection, which is itself imperfect.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveUser AI-literacy & verification workflowsInteractive (lab)

Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.

Limitation: Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

PreventiveConsent & identity-use verificationInteractive (lab)

Before a system will copy someone's face or voice, check that the person actually agreed — verified-voice capture, proof of consent, or restricting cloning to the account owner.

Limitation: Only binds hosted services — open-weights face-swap/voice-clone tools have no consent gate; verification can be spoofed and does not address already-leaked likenesses.

DetectiveContent provenance & watermarkingInteractive (lab)

Tag AI-made content with a signed 'where it came from' label and an invisible watermark, and check those signals downstream — so AI media can be traced and flagged.

Limitation: Watermarks/manifests are strippable, absent on open-source generation, and degrade under re-encoding; provenance-absence must never be treated as proof of authenticity.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

DetectiveContent provenance & watermarkingInteractive (lab)

Tag AI-made content with a signed 'where it came from' label and an invisible watermark, and check those signals downstream — so AI media can be traced and flagged.

Limitation: Watermarks/manifests are strippable, absent on open-source generation, and degrade under re-encoding; provenance-absence must never be treated as proof of authenticity.

DetectiveContent provenance & watermarkingInteractive (lab)

Tag AI-made content with a signed 'where it came from' label and an invisible watermark, and check those signals downstream — so AI media can be traced and flagged.

Limitation: Watermarks/manifests are strippable, absent on open-source generation, and degrade under re-encoding; provenance-absence must never be treated as proof of authenticity.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

PreventiveAlgorithm re-selectionLibrary v9

Select modelling algorithm based on bias risk profile. Prefer algorithms with lower sensitivity to demographic distribution shifts.

Risk: Unrepresentative or biased data inputs
PreventiveModel separationLibrary v9

Design separate model modules for distinct demographic populations where data characteristics diverge materially.

Risk: Unrepresentative or biased data inputs
PreventiveAlgorithm re-selectionLibrary v9

Switch to synthetic data augmentation or alternative sources when representativeness gaps persist after screening.

Risk: Unrepresentative or biased data inputs
PreventiveIn-processing techniquesLibrary v9

Apply adversarial debiasing or fairness constraints during model training. Validate against fairness metrics before sign-off.

Risk: Unrepresentative or biased data inputs
PreventiveHyperparameter tuningLibrary v9

Tune hyperparameters with fairness-aware search objectives. Reject configurations with demographic disparity exceeding threshold.

Risk: Unrepresentative or biased data inputs
PreventiveModel customisationLibrary v9

Fine-tune on a curated, representative dataset verified for demographic balance. Document coverage breakdown before training.

Risk: Unrepresentative or biased data inputs
PreventiveModel separationLibrary v9

Design separate model segments where adverse impact risk differs materially across population groups.

PreventiveUse of pre-trained modelsLibrary v9

Select a foundation model with documented safety fine-tuning (RLHF, Constitutional AI). Verify alignment benchmarks.

PreventiveAlgorithm selection for power efficiencyLibrary v9

Select model architecture based on energy efficiency profile. Prefer lighter architectures where accuracy requirements permit.

Risk: Environmental sustainability impact
PreventiveUse of pre-trained modelsLibrary v9

Use a pre-trained foundation model rather than training from scratch to reduce carbon cost.

Risk: Environmental sustainability impact
PreventiveAlgorithm selection for power efficiencyLibrary v9

Apply model compression (quantisation, pruning, knowledge distillation) to reduce inference compute without materially reducing accuracy.

Risk: Environmental sustainability impact
PreventiveUse of pre-trained modelsLibrary v9

Select a foundation model with documented training reducing deceptive or manipulative outputs. Run dark pattern test suite.

PreventiveUse of pre-trained modelsLibrary v9

Select a foundation model with documented RLHF or Constitutional AI safety training. Verify against toxicity benchmarks.

Risk: Jailbreak
PreventiveUse of pre-trained modelsLibrary v9

Apply safety fine-tuning (RLHF, red team rejection) on the selected model. Validate pre/post fine-tuning toxicity rates.

Risk: Jailbreak
PreventiveConfidence scoringLibrary v9

Apply data quality scoring to all acquired data to document provenance reliability. Flag low-confidence sources for review.

PreventiveGeo-fenced architecture enforcing data residencyLibrary v9

Architect the system to enforce data residency constraints technically via geo-fenced cloud configuration.

Risk: Inability to ensure location compliance for model hosting and data processing
PreventivePrivacy by Design via differential privacyLibrary v9

Apply Privacy by Design in model architecture using differential privacy or federated learning where technically feasible.

PreventiveRAGLibrary v9

Specify a RAG architecture at design stage for factual domains. Define grounding requirements and acceptable hallucination thresholds.

PreventiveSmall model selectionLibrary v9

Evaluate foundation model candidates on hallucination benchmarks at design stage. Select models with lowest documented rates.

PreventiveRAGLibrary v9

Implement the S1-specified RAG system: retrieval layer, source corpus, relevance scoring. Validate grounding before deployment.

PreventiveFine-tuningLibrary v9

Fine-tune on a curated, domain-specific dataset to improve factual accuracy. Validate hallucination rates pre/post fine-tuning.

PreventiveUncertainty-quantified abstention via self-consistency / semantic entropyLibrary v9

Calibrate the initial entropy threshold on a knowledge-boundary dataset; approve sampling design and thresholds per risk tier.

CorrectiveUncertainty-quantified abstention via self-consistency / semantic entropyLibrary v9

Sample multiple generations for high-stakes queries and abstain, fall back, or escalate when semantic entropy exceeds the calibrated threshold.

PreventiveUncertainty-quantified abstention via self-consistency / semantic entropyLibrary v9

Monitor uncertainty scores and abstention rates; recalibrate the entropy threshold on a set cadence under change control.

PreventiveModel calibrationLibrary v9

Apply post-training calibration (temperature scaling, isotonic regression) to align confidence scores with accuracy. Validate ECE before deployment.

PreventiveAI onboarding using domain dataLibrary v9

Plan the domain data strategy at design stage: identify sources that best cover the target operational distribution.

Risk: Training data or inputs not fit for purpose
PreventiveAI onboarding using domain dataLibrary v9

Verify acquired data represents the target operational domain by comparing distributions against production data. Flag gaps.

Risk: Training data or inputs not fit for purpose
PreventiveAI onboarding using domain dataLibrary v9

Plan the data curation strategy at design stage to ensure domain-appropriate quality at the required scale.

Risk: Insufficient data quality
PreventiveFine-tuningLibrary v9

Execute a controlled fine-tuning cycle on refreshed data when staleness is confirmed. Validate before promoting to production.

PreventiveFine-tuningLibrary v9

Fine-tune on domain-specific, high-quality data to improve model performance on target tasks. Validate accuracy post fine-tuning.

Risk: Insufficient model accuracy / soundness
PreventiveWeight regularisation and normalisationLibrary v9

Apply regularisation (L1/L2, dropout, early stopping) to prevent overfitting and improve generalisation.

Risk: Insufficient model accuracy / soundness
PreventiveSmall model selectionLibrary v9

Prefer smaller, purpose-built models where accuracy requirements are met, to reduce complexity and maintenance burden.

Risk: Insufficient model accuracy / soundness
PreventiveAI onboarding using domain dataLibrary v9

Verify training data covers all material input segments for the target use case. Augment where coverage gaps are found.

Risk: Insufficient model accuracy / soundness
PreventiveModel calibrationLibrary v9

Calibrate model outputs to align stated confidence with actual accuracy. Validate calibration on held-out data.

Risk: Insufficient model accuracy / soundness
PreventiveModular architectureLibrary v9

Design a scope-enforcement layer in the architecture to isolate the AI system from off-topic or out-of-distribution inputs.

CorrectiveModular architectureLibrary v9

Design a modular AI architecture with independent failover, rollback, and degraded-mode capability.

Risk: Inadequate operational resilience
PreventiveModular architectureLibrary v9

Design and implement a modular AI architecture meeting all S1-defined NFRs. Validate against each requirement before deployment.

Risk: Unmet architectural requirements
PreventiveSmall model selectionLibrary v9

Select a model architecture sized appropriately for platform constraints (memory, compute, latency).

Risk: Unmet architectural requirements
PreventiveWeight regularisation and normalisationLibrary v9

Document all regularisation parameters and normalisation configurations in the model card. Store version-controlled.

Risk: Lack of reproducibility
PreventiveFine-tuningLibrary v9

Maintain version-controlled records of each fine-tuning run including dataset version, hyperparameters, and random seeds.

Risk: Lack of reproducibility
PreventiveModel and adapter supply-chain integrity verification (signed weights, checksum attestation, LoRA provenance)Library v9

Sign and hash-register every model and adapter with a provenance manifest at onboarding. Refuse registry admission for unsigned artifacts.

PreventiveModel and adapter supply-chain integrity verification (signed weights, checksum attestation, LoRA provenance)Library v9

Verify signature and checksum against the registry manifest at load time; refuse to load unsigned or mismatched weights and alert security.

PreventiveCalibrated differential-privacy training budget with documented epsilon ceiling and per-individual contribution clippingLibrary v9

Train PII-bearing models with DP-SGD under a documented epsilon/delta budget. Approve the budget against the enterprise epsilon-ceiling policy before training.

PreventiveCalibrated differential-privacy training budget with documented epsilon ceiling and per-individual contribution clippingLibrary v9

Verify realised epsilon against the approved ceiling at model review and record the guarantee in the model card. Fail promotion when the budget is exceeded.

PreventiveInstruction-hierarchy-trained model selection with role-precedence injection evals✚ Proposed — not in your library

Select or fine-tune the foundation model for a trained instruction-hierarchy prior so system-prompt directives intrinsically outrank user- and tool-originated instructions, and gate release on role-precedence override evals quantifying the residual (behavioural, non-enforced) flip rate.

Limitation: Behavioural, not enforced. There is no hard barrier between privilege levels inside the token stream — only a trained disposition that can be overcome.

PreventiveDecode-time output constraints (low temperature, grammar/JSON-schema-constrained decoding)✚ Proposed — not in your library

Constrain generation at decode time with low temperature and grammar/schema-constrained decoding so the model emits well-formed, low-variance structured output by construction, preventing malformed responses and erratic tool-call arguments before they are produced.

Limitation: Lower temperature reduces variance, not falsehood — a confidently wrong answer can be perfectly deterministic. Doesn't address semantic errors.

PreventiveUncertainty signalling & abstentionInteractive (lab)

Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.

Limitation: Models are poorly calibrated and often confidently wrong; over-abstention makes the product useless, so the tuning is delicate.

PreventiveDecoding controls (temperature, constrained output)Interactive (lab)

Turning down randomness and forcing answers into a strict format so the model improvises less.

Limitation: Lower temperature reduces variance, not falsehood — a confidently wrong answer can be perfectly deterministic. Doesn't address semantic errors.

PreventiveUncertainty signalling & abstentionInteractive (lab)

Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.

Limitation: Models are poorly calibrated and often confidently wrong; over-abstention makes the product useless, so the tuning is delicate.

PreventiveDecoding controls (temperature, constrained output)Interactive (lab)

Turning down randomness and forcing answers into a strict format so the model improvises less.

Limitation: Lower temperature reduces variance, not falsehood — a confidently wrong answer can be perfectly deterministic. Doesn't address semantic errors.

CorrectiveInput/output filteringLibrary v9

Screen training data for demographic gaps using automated pipeline checks. Reject batches failing representation thresholds.

Risk: Unrepresentative or biased data inputs
PreventiveDecision threshold adjustmentLibrary v9

Calibrate decision thresholds per demographic group to equalise error rates. Validate calibration before deployment sign-off.

Risk: Unrepresentative or biased data inputs
PreventivePost-processing techniquesLibrary v9

Apply post-processing adjustments (re-ranking, score recalibration) to correct fairness gaps identified in validation.

Risk: Unrepresentative or biased data inputs
PreventiveDecision threshold adjustmentLibrary v9

Set decision thresholds to meet acceptable adverse impact ratios across protected groups. Validate before deployment.

PreventivePost-processing techniquesLibrary v9

Apply post-processing adjustments (reject-option classification, score recalibration) to meet adverse impact targets.

PreventiveInput/output filteringLibrary v9

Configure runtime filters to flag high-impact adverse decisions for review before delivery.

PreventivePost-processing techniquesLibrary v9

Monitor production adverse impact ratios and adjust post-processing thresholds when drift is detected.

PreventiveContent ModerationLibrary v9

Deploy content moderation controls aligned to S1 ethical constraints. Validate filter accuracy before deployment.

PreventiveContent ModerationLibrary v9

Implement classifiers to detect dark pattern language in outputs. Block or escalate flagged outputs.

PreventiveContent ModerationLibrary v9

Implement multi-layer content moderation (input + output) validated against toxicity benchmarks. Escalate when filter bypass rates spike.

Risk: Jailbreak
PreventiveDLP controls in data acquisition environmentLibrary v9

Implement DLP controls in the data acquisition environment to prevent unauthorised extraction or transfer of training data.

PreventiveDLP controls confining build-environment training dataLibrary v9

Configure DLP controls in the build environment to block training data from leaving approved boundaries.

PreventiveOutput filters suppressing IP-protected contentLibrary v9

Implement output filters to detect and suppress reproduction of IP-protected content.

Risk: IP infringement
PreventiveValidated anonymisation and masking before trainingLibrary v9

Apply anonymisation and masking controls to personal data before use in model training. Validate de-identification effectiveness.

PreventiveInference-time PII redaction and third-party LLM data-processing controlsLibrary v9

Sign zero-retention/no-training terms with each model provider and obtain DPO sign-off on the data flow before enabling any endpoint.

PreventiveInference-time PII redaction and third-party LLM data-processing controlsLibrary v9

Mask or tokenise personal data in every prompt before it leaves for a model endpoint; restrict egress to approved providers only.

PreventiveProgrammable conversation controlsLibrary v9

Configure conversation controls at deployment to restrict the model to approved topic domains and escalate off-topic queries.

PreventiveCitation/attribution verification against retrieved sourcesLibrary v9

Resolve every emitted citation against the approved corpus and verify span-level entailment before display. Strip or withhold claims with fabricated or non-entailing references.

PreventiveInput/output filteringLibrary v9

Configure output filters at deployment to detect and rewrite responses with overconfidence markers (absolute certainty language).

PreventiveInput filteringLibrary v9

Screen acquired training data through automated fitness checks (domain relevance, recency, format conformity). Reject non-conforming data.

Risk: Training data or inputs not fit for purpose
PreventiveProgrammable conversation controlsLibrary v9

Configure monitoring hooks in the conversation layer at deployment to capture metrics required by S1 monitoring requirements.

CorrectiveInput filteringLibrary v9

Implement automated data quality checks in the ingestion pipeline (schema validation, duplicate detection, completeness scoring). Reject non-conforming batches.

Risk: Insufficient data quality
PreventiveInput/output filteringLibrary v9

Configure output confidence thresholds at deployment to suppress or escalate low-confidence outputs to human review.

Risk: Insufficient model accuracy / soundness
CorrectiveInput filteringLibrary v9

Implement OOD detection in the input filtering layer. Reject or escalate inputs outside the S1-defined scope.

PreventiveProgrammable conversation controlsLibrary v9

Configure conversation controls to enforce topic boundaries. Trigger refusals or redirects for off-topic queries.

PreventiveInput filteringLibrary v9

Maintain and update OOD detection rules in production as new unexpected use patterns are identified.

PreventiveRole-based access controlsLibrary v9

Define RBAC architecture at design stage specifying permitted users, roles, and use contexts.

Risk: Unintentional inappropriate or illegal use
PreventiveJailbreak detectionLibrary v9

Develop and integrate jailbreak detection classifiers during build. Validate detection rates before deployment.

Risk: Unintentional inappropriate or illegal use
PreventiveRole-based access controlsLibrary v9

Implement S1-designed RBAC architecture. Restrict AI system access to authorised users and contexts only.

Risk: Unintentional inappropriate or illegal use
PreventiveJailbreak detectionLibrary v9

Deploy jailbreak detection as a runtime gateway. Verify it is active across all input pathways before go-live.

Risk: Unintentional inappropriate or illegal use
PreventiveJailbreak detectionLibrary v9

Continuously update jailbreak detection rules as new bypass techniques emerge. Monitor bypass attempt frequency.

Risk: Unintentional inappropriate or illegal use
PreventiveRole-based access controlsLibrary v9

Design strict RBAC on training data repositories at design stage. Define approved contributor list and approval workflow.

PreventiveRole-based access controlsLibrary v9

Implement RBAC controls on the data acquisition environment from point of collection to prevent unauthorised data injection.

PreventiveInput filteringLibrary v9

Apply anomaly detection on the training data ingestion pipeline to identify poisoned or tampered batches.

PreventiveRole-based access controlsLibrary v9

Execute a deployment security checklist confirming all data poisoning controls are active and tested before go-live.

CorrectiveStatistical anomaly and backdoor-trigger detection on ingested data (activation clustering / spectral signatures)Library v9

Scan every ingestion batch with spectral-signature and clustering detectors before training. Quarantine flagged clusters for human review against documented thresholds.

CorrectiveStatistical anomaly and backdoor-trigger detection on ingested data (activation clustering / spectral signatures)Library v9

Run poisoning detectors continuously on production corpus ingestion. Re-tune thresholds periodically against new attack techniques.

PreventiveJailbreak detectionLibrary v9

Implement adversarial example detection at the inference boundary. Block or flag inputs matching known attack patterns.

CorrectiveReal-time input/output classifier guardrails (e.g. Llama Guard / Prompt Guard-style) with circuit-breaker tripwiresLibrary v9

Score every prompt and response with an inline safety classifier; trip a circuit breaker on sessions with sustained anomalous scores. Keep thresholds under change control.

PreventiveReal-time input/output classifier guardrails (e.g. Llama Guard / Prompt Guard-style) with circuit-breaker tripwiresLibrary v9

Sample classifier verdicts and breaker trips on a cadence; retune thresholds and update signatures for confirmed misses.

PreventiveRole-based access controlsLibrary v9

Design the system prompt architecture with privilege separation and trust tier definitions at design stage.

PreventiveJailbreak detectionLibrary v9

Implement input sanitisation and injection detection filters covering known injection patterns and privilege escalation attempts.

PreventiveJailbreak detectionLibrary v9

Deploy injection detection as a runtime gateway covering all input paths. Verify before go-live.

PreventiveRole-based access controlsLibrary v9

Verify prompt privilege architecture is correctly enforced in production before go-live.

PreventiveDedicated injection-detection classifier on all inbound untrusted content and outbound actionsLibrary v9

Benchmark the classifier on a labelled injection corpus and tune the decision threshold. Sign off the operating point before deployment.

PreventiveDedicated injection-detection classifier on all inbound untrusted content and outbound actionsLibrary v9

Scan all inbound untrusted content and outbound actions with the injection classifier inline. Block, strip or escalate to HITL above the approved threshold.

PreventiveDedicated injection-detection classifier on all inbound untrusted content and outbound actionsLibrary v9

Sample blocked and passed events for accuracy; retune or retrain on new attack techniques. Alert on detection-rate degradation.

PreventiveRole-based access controlsLibrary v9

Restrict access to pre-anonymisation personal data to the minimum authorised set. Enforce at point of acquisition.

PreventiveInput filteringLibrary v9

Apply robust de-identification (k-anonymity, l-diversity, differential privacy) during data processing. Validate effectiveness.

PreventiveInput/output filteringLibrary v9

Implement output filters to detect and suppress quasi-identifying attribute combinations in model responses.

PreventiveRole-based access controlsLibrary v9

Design the data access control architecture at design stage to prevent training data exfiltration through model outputs or APIs.

PreventiveRole-based access controlsLibrary v9

Implement RBAC on training data from point of acquisition. Restrict access by role and enforce least-privilege.

PreventiveInput/output filteringLibrary v9

Implement output filtering to suppress PII and confidential information from model responses.

PreventiveRole-based access controlsLibrary v9

Verify data access controls and output filters are correctly enforced in the production configuration before go-live.

PreventiveOutput-side DLP inspection with named-entity and PII redaction on the response pathLibrary v9

Scan every model response inline with DLP before delivery; redact or block PII, PAN and MNPI matches. Keep the rule set version-controlled.

PreventiveOutput-side DLP inspection with named-entity and PII redaction on the response pathLibrary v9

Review blocked leakage events weekly with the model risk owner. Tune detectors via change control as sensitive-data patterns evolve.

PreventiveRole-based access controlsLibrary v9

Design query rate limiting and RBAC for the model inference API at design stage to limit attack surface.

PreventiveInput/output filteringLibrary v9

Implement query pattern detection to identify systematic inference attack behaviour (high-volume queries, membership probing).

PreventiveRole-based access controlsLibrary v9

Verify inference API access controls and rate limiting are correctly enforced before go-live.

CorrectiveOutput confidence masking and structured-response minimisation for natural-language interfacesLibrary v9

Define the minimum response surface and test it with membership/attribute-inference probes pre-release. Block promotion if any probe recovers raw confidence signals.

PreventiveOutput confidence masking and structured-response minimisation for natural-language interfacesLibrary v9

Strip raw logits, quantise confidence scores and block training-record echoes at the inference gateway. Keep the output-filter policy under change control.

CorrectiveEgress destination allow-listing with DLP inspection of tool argumentsLibrary v9

Permit outbound tool calls only to allow-listed destinations and DLP-scan arguments and payloads. Block or quarantine calls carrying sensitive data to disallowed sinks.

PreventiveEgress destination allow-listing with DLP inspection of tool argumentsLibrary v9

Review DLP hits and blocked-egress events, tune detectors, and recertify the destination allow-list periodically. Route new destinations through security change control.

PreventiveContinuous authorisation via a central policy engine (per-action PDP/PEP check)Library v9

Write authorisation policy as versioned, peer-reviewed code traced to approved scopes. Gate promotion on allow/deny scenario tests passing.

PreventiveContinuous authorisation via a central policy engine (per-action PDP/PEP check)Library v9

Check every sensitive action against a central policy engine bound to agent, resource, purpose, and context. Re-evaluate mid-session on any context change or revocation.

PreventiveVet allowlisted egress destinations for server-side-fetch (SSRF) primitives; exclude or proxy-inspect any allowlisted service that can fetch arbitrary attacker-controlled URLs✚ Proposed — not in your library

An egress allowlist only contains exfiltration if no allowlisted destination can be coerced into fetching an attacker-controlled URL. Audit each allowlisted domain/endpoint for image-search / link-preview / URL-fetch features (SSRF proxies), and either remove them, pin them to fixed paths, or route them through an inspecting forward proxy. Pair with finishing output sanitization before render so no auto-fetch fires un-inspected.

PreventiveMemory-write integrity validation with provenance tagging, audit/purge and TTL bounds✚ Proposed — not in your library

Gate every write to an agent's persistent/self-modifying memory through schema validation and provenance/trust tagging, expose stored entries for user-visible audit and purge, and apply TTLs so any planted instruction self-expires and cannot silently persist across sessions.

Limitation: Validation can't always tell a legitimate preference from a planted instruction, and review only helps if users actually look. Raises effort, doesn't eliminate the vector.

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

PreventiveIngestion sanitisation & source allowlistingInteractive (lab)

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

Limitation: Can't detect adversarial content that reads as legitimate prose, and only covers sources you control ingestion for. Live browsing bypasses it entirely.

PreventiveEgress allowlisting & DLP on tool argumentsInteractive (lab)

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Limitation: Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

Risk: Jailbreak
PreventiveIngestion sanitisation & source allowlistingInteractive (lab)

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

Limitation: Can't detect adversarial content that reads as legitimate prose, and only covers sources you control ingestion for. Live browsing bypasses it entirely.

PreventiveEgress allowlisting & DLP on tool argumentsInteractive (lab)

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Limitation: Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

PreventiveMemory write validation, provenance & reviewInteractive (lab)

Being careful about what gets saved to long-term memory, labelling where it came from, and letting users see and delete their memories.

Limitation: Validation can't always tell a legitimate preference from a planted instruction, and review only helps if users actually look. Raises effort, doesn't eliminate the vector.

PreventiveEgress allowlisting & DLP on tool argumentsInteractive (lab)

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Limitation: Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

PreventiveEgress allowlisting & DLP on tool argumentsInteractive (lab)

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Limitation: Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

CorrectivePre-deployment adversarial bias testing by demographicLibrary v9

Execute adversarial bias testing using targeted demographic test cases before deployment.

Risk: Unrepresentative or biased data inputs
CorrectiveRed teaming of adverse-impact edge casesLibrary v9

Execute red team tests targeting adverse impact boundary cases and edge population scenarios.

DetectiveRed teamingLibrary v9

Conduct targeted red team exercises to elicit toxic outputs through jailbreaks and adversarial prompts. Treat bypass as blocking defect.

Risk: Jailbreak
CorrectiveRed teamingLibrary v9

Conduct adversarial red team exercises simulating out-of-scope inputs and unexpected use patterns before deployment.

CorrectiveRed teamingLibrary v9

Conduct red team exercises covering misuse categories identified in S1 threat assessment.

Risk: Unintentional inappropriate or illegal use
DetectiveRed teamingLibrary v9

Simulate data poisoning attacks (backdoor, label flipping, gradient-based) to assess model resilience before deployment.

CorrectivePenetration testingLibrary v9

Penetration test the training data pipeline to identify injection points and access control weaknesses.

DetectivePre-deployment poisoning regression gate via canary backdoor probes and behavioral diff testingLibrary v9

Gate every model promotion on backdoor-trigger probes and a behavioral diff against the approved baseline. Block release on significant regressions or trigger-pattern anomalies.

DetectivePre-deployment poisoning regression gate via canary backdoor probes and behavioral diff testingLibrary v9

Re-run the poisoning probe suite on every production model or data change. Keep the trigger catalogue and golden dataset current and trend the results.

CorrectiveRed teamingLibrary v9

Conduct adversarial robustness testing (white-box, black-box, transfer attacks) before deployment.

CorrectivePenetration testingLibrary v9

Penetration test the model inference layer to identify specific adversarial input vulnerabilities.

DetectiveAdaptive multi-turn red-team harness with automated jailbreak fuzzingLibrary v9

Run adaptive multi-turn jailbreak fuzzing against every release candidate. Gate release on attack-success rate within threshold and re-test each fixed bypass.

CorrectiveAdaptive multi-turn red-team harness with automated jailbreak fuzzingLibrary v9

Re-run the jailbreak fuzzing harness on a recurring cadence with newly observed attack techniques added. Escalate threshold breaches for remediation.

CorrectiveRed teamingLibrary v9

Conduct comprehensive prompt injection red team exercises (direct, indirect, multi-turn) before deployment.

DetectivePenetration testingLibrary v9

Penetration test all prompt injection pathways in the system. Prioritise external tool and document ingestion channels.

DetectivePenetration testingLibrary v9

Conduct periodic penetration testing of the production system to validate injection controls remain effective.

DetectiveContinuous adversarial prompt-injection red teaming with regression suite in CI/CDLibrary v9

Build the versioned injection corpus into CI/CD as a pre-release gate. Baseline attack success and sign off the release threshold.

DetectiveContinuous adversarial prompt-injection red teaming with regression suite in CI/CDLibrary v9

Re-run the injection payload suite on every change and on cadence; fold in new in-the-wild techniques from threat intel. Gate releases on the attack-success-rate threshold.

CorrectiveRed teamingLibrary v9

Test de-identification approach against known re-identification attacks (quasi-identifier linkage, singling-out). Remediate if risk is high.

CorrectiveRed teamingLibrary v9

Conduct data extraction red team exercises targeting training data memorisation and adversarial extraction techniques.

CorrectivePenetration testingLibrary v9

Penetration test AI system data access boundaries (API endpoints, system prompt exposure, memory leakage).

DetectiveCanary-token and membership-inference red-team probes against training/fine-tuning data memorisationLibrary v9

Seed registered canary records into the fine-tuning corpus during data preparation. Control the seed manifest so canaries stay traceable and tamper-proof.

DetectiveCanary-token and membership-inference red-team probes against training/fine-tuning data memorisationLibrary v9

Probe each candidate model with extraction and membership-inference attacks before release. Block promotion when canary recall exceeds the threshold.

CorrectiveRed teamingLibrary v9

Conduct targeted red team exercises for inference attack categories (membership inference, model extraction, attribute inference) before deployment.

DetectivePenetration testingLibrary v9

Penetration test the model inference API to identify exploitable access control weaknesses and rate limiting bypass vectors.

DetectivePrivacy attack red-team battery with quantified MIA/attribute-inference success ceiling as a release gateLibrary v9

Attack each candidate model with membership-, attribute-, and inversion-inference harnesses before promotion. Block release when attack advantage exceeds the agreed ceiling.

DetectivePrivacy attack red-team battery with quantified MIA/attribute-inference success ceiling as a release gateLibrary v9

Re-run the privacy attack battery on every retrain or material data change. Trend attack advantage across versions and escalate movement toward the ceiling.

CorrectivePre-deployment red-team of tool-misuse and privilege-escalation pathsLibrary v9

Red-team tool-misuse and privilege-escalation paths before release. Gate deployment on remediation or signed risk acceptance of all findings.

CorrectivePre-deployment red-team of tool-misuse and privilege-escalation pathsLibrary v9

Repeat tool-misuse red-teaming on material change and on a set cadence. Compare results to baseline and remediate any regression in defences.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Jailbreak
DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

CorrectiveHuman-in-the-loop validationLibrary v9

Conduct structured human expert review of model outputs stratified across demographic groups before deployment.

Risk: Unrepresentative or biased data inputs
PreventiveTested human review pathways at go-liveLibrary v9

Ensure HITL review pathways are live and tested for high-impact adverse decisions at go-live.

PreventiveOngoing human review of high-impact decisionsLibrary v9

Maintain HITL review for all AI decisions with material adverse impact potential. Log all interventions and outcomes.

PreventiveHuman review for high-persuasion contextsLibrary v9

Require HITL review for AI outputs in high-persuasion contexts (financial recommendations, healthcare advice).

PreventiveLive human review for vulnerable-user deploymentsLibrary v9

Maintain live HITL review for deployments serving vulnerable users or high-risk contexts. Escalate confirmed toxic outputs immediately.

Risk: Jailbreak
PreventiveHuman verification gate for high-stakes decisionsLibrary v9

Mandate human verification for high-stakes decisions where over-reliance risk is elevated. Review automation bias incidents quarterly.

PreventiveHITL oversight design with triggers and escalationLibrary v9

Design HITL oversight mechanisms at use case design stage including trigger criteria, review workflow, and escalation paths.

PreventivePilot-validated HITL routing and escalation logicLibrary v9

Build and test HITL routing logic and escalation pathways in the AI system. Validate with pilot before deployment.

PreventiveProduction HITL operation with intervention loggingLibrary v9

Operate HITL controls in production and log all interventions and outcomes. Review override patterns quarterly.

PreventiveHuman-in-the-loop validationLibrary v9

Configure tiered HITL review for high-stakes factual outputs with defined trigger criteria and reviewer SLAs.

PreventiveHuman-in-the-loop validationLibrary v9

Operate human review queues for hallucination-flagged outputs in production. Log all reviewer decisions and outcomes.

PreventiveHuman-in-the-loop validationLibrary v9

Route high-confidence outputs in high-stakes use cases to human review. Flag for reviewer attention when certainty language is absolute.

PreventiveHuman-in-the-loop validationLibrary v9

Route high-consequence or low-confidence outputs to human review in production. Track override rates and outcomes.

Risk: Insufficient model accuracy / soundness
CorrectiveHuman-in-the-loop validationLibrary v9

Configure HITL triggers for outputs in input domains that diverge from the training distribution. Log all out-of-scope interventions.

PreventiveHuman approval gate on irreversible and high-impact tool callsLibrary v9

Classify tools by impact and reversibility at design and define which calls require human approval. Obtain governance sign-off on the thresholds before build.

PreventiveHuman approval gate on irreversible and high-impact tool callsLibrary v9

Build the approval gate into the orchestrator and test that gated calls pause, bypasses fail, and decisions are honoured. Gate release on these tests passing.

PreventiveHuman approval gate on irreversible and high-impact tool callsLibrary v9

Review the approval ledger for rubber-stamping and out-of-policy executions. Recalibrate gating thresholds under governance approval as tools and incidents evolve.

PreventiveMandatory source-of-record verification before AI-assisted output is committed✚ Proposed — not in your library

For high-stakes outputs, require a human to verify each AI-asserted fact/citation against the authoritative source of record before it is filed, sent, or committed — a hard gate, logged and attributable, not an optional review.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

CorrectiveUser feedback and iterative improvementLibrary v9

Monitor fairness metric trends by demographic group in production. Use feedback to drive targeted debiasing in model updates.

Risk: Unrepresentative or biased data inputs
CorrectiveAdverse-outcome feedback loop triggering model updatesLibrary v9

Collect adverse outcome feedback from affected users. Use reports to trigger model updates when adverse impact exceeds threshold.

CorrectiveUser feedback and iterative improvementLibrary v9

Use user feedback, reviewer escalations, and monitoring signals to identify and remediate content safety gaps iteratively.

Risk: Jailbreak
CorrectiveUser feedback and iterative improvementLibrary v9

Collect structured user feedback through in-product mechanisms. Use feedback to prioritise iterative model improvements.

Risk: Inadequate feedback and recourse mechanisms
CorrectiveReinforcement learningLibrary v9

Use production feedback (user corrections, fact-check failures) to drive periodic RLHF cycles. Update model when error rates trend upward.

CorrectiveReinforcement learningLibrary v9

Track accuracy of high-confidence predictions in production. Trigger recalibration when overconfidence rates trend upward.

CorrectiveReinforcement learningLibrary v9

Implement a reinforcement learning feedback loop to continuously incorporate production signals and reduce staleness risk.

CorrectiveReinforcement learningLibrary v9

Establish a periodic revalidation and improvement cycle using RLHF or user feedback. Retrain when accuracy trends below threshold.

Risk: Insufficient model accuracy / soundness
CorrectiveReinforcement learningLibrary v9

When unexpected use patterns are confirmed, use reinforcement feedback to adapt the model or update scope constraints.

DetectiveModel evaluationLibrary v9

Conduct comprehensive fairness validation across demographic groups before deployment. Treat material disparity as a blocking defect.

Risk: Unrepresentative or biased data inputs
CorrectiveModel monitoringLibrary v9

Continuously monitor fairness metrics across demographic groups in production. Trigger model review when bias drift is detected.

Risk: Unrepresentative or biased data inputs
DetectiveTest prioritisationLibrary v9

Prioritise value-misalignment test scenarios in validation. Block deployment if prohibited outputs are produced.

DetectiveTest prioritisationLibrary v9

Track compute consumption and energy use in production against declared thresholds. Escalate when carbon budget is breached.

Risk: Environmental sustainability impact
DetectiveTest prioritisationLibrary v9

Run adversarial test scenarios targeting dark pattern generation in validation. Treat any confirmed instance as a blocking defect.

DetectiveTest prioritisationLibrary v9

Monitor production outputs for dark pattern signals (urgency cues, false scarcity, hidden costs). Escalate on confirmed detections.

DetectiveTest prioritisationLibrary v9

Prioritise jailbreak and adversarial safety testing in pre-deployment validation. Block deployment if prohibited outputs pass filter.

Risk: Jailbreak
DetectiveTest prioritisationLibrary v9

Monitor production for toxicity incidents via user reports and automated detection. Escalate severity-2+ incidents within 24 hours.

Risk: Jailbreak
CorrectiveAIBOM-driven cryptographic verification of third-party model artifactsLibrary v9

Verify every third-party model artifact against its AIBOM hashes and signatures before load. Fail the build on any unverified artifact.

DetectiveGolden-set regression canary to detect undisclosed vendor-side model changesLibrary v9

Build and baseline the golden-set suite against the vendor model before go-live. Sign off thresholds with the model risk owner as a release condition.

DetectiveAIBOM-driven cryptographic verification of third-party model artifactsLibrary v9

Re-verify hashes and signatures on every vendor model update before promotion. Reconcile deployed artifacts against the AIBOM on a set cadence.

DetectiveGolden-set regression canary to detect undisclosed vendor-side model changesLibrary v9

Run the golden-set canary on schedule against the live endpoint and alert on significant shifts. Reconcile detections against vendor notices to surface undisclosed changes.

CorrectiveMonitoring of oversight process adherence metricsLibrary v9

Configure monitoring to track oversight process adherence metrics in production (review rate, SLA compliance, override frequency).

CorrectiveContinuous monitoring of data residency violationsLibrary v9

Continuously monitor production data flows for residency violations. Alert and escalate immediately when detected.

Risk: Inability to ensure location compliance for model hosting and data processing
DetectiveReal-time monitoring of anomalous data transfersLibrary v9

Monitor production for anomalous data transfers in real time. Alert on any transfer outside approved data flow boundaries.

DetectiveRegulatory change register triggering compliance reviewLibrary v9

Maintain a regulatory change register for applicable rules. Trigger compliance review when new regulatory guidance is issued.

Risk: Breach or misalignment with regulatory or organisational standards
DetectiveProduction monitoring of IP infringement complaintsLibrary v9

Monitor production outputs for IP infringement incidents. Log and investigate all IP complaints within defined SLA.

Risk: IP infringement
DetectiveLegal landscape monitoring for output IP changesLibrary v9

Monitor the legal landscape for changes affecting AI output IP protection. Update IP strategy when legislation changes.

Risk: Unavailability of IP protection
DetectiveAutomated DSAR and right-to-erasure propagation across AI artefactsLibrary v9

Tag personal data with subject identifiers at ingestion and maintain an artefact inventory map of every store it reaches. Keep lineage current so erasure can propagate.

DetectiveAutomated DSAR and right-to-erasure propagation across AI artefactsLibrary v9

Propagate every DSAR/erasure request across all AI artefacts with per-store confirmation inside the statutory SLA. Record an unlearning or retrain decision where model deletion is infeasible and close with DPO sign-off.

DetectiveRobustness testingLibrary v9

Define and execute a domain-specific hallucination test suite before deployment. Treat hallucination rate above threshold as a blocking defect.

DetectiveSynthetic evaluation datasetsLibrary v9

Construct synthetic evaluation datasets for knowledge-boundary scenarios. Use to validate model refusal behaviour.

DetectiveRobustness testingLibrary v9

Periodically re-run the hallucination test suite on the production model to detect drift. Monitor user corrections and complaints.

DetectiveRuntime faithfulness/groundedness scoring with abstain gateLibrary v9

Calibrate the groundedness threshold against the hallucination test suite pre-release; sign off the threshold in the validation pack.

CorrectiveRuntime faithfulness/groundedness scoring with abstain gateLibrary v9

Score every RAG answer for groundedness before release; block, fall back, or escalate responses below the faithfulness threshold.

DetectiveRobustness testingLibrary v9

Test for overconfidence patterns (high-confidence wrong answers, low refusal rate) in pre-deployment validation.

DetectiveSynthetic evaluation datasetsLibrary v9

Build a synthetic evaluation dataset of overconfidence-prone scenarios for ongoing regression testing.

DetectiveRobustness testingLibrary v9

Monitor confidence calibration (ECE) in production over time. Alert when ECE drift exceeds acceptable threshold.

DetectiveSynthetic evaluation datasetsLibrary v9

Construct synthetic evaluation datasets targeting operational edge cases identified in S2 gap analysis. Use as regression baseline.

Risk: Training data or inputs not fit for purpose
DetectiveRobustness testingLibrary v9

Monitor production input distributions for drift from training data distribution. Trigger re-training when covariate shift is confirmed.

Risk: Training data or inputs not fit for purpose
DetectiveSynthetic evaluation datasetsLibrary v9

Construct synthetic evaluation datasets during build to serve as the ongoing monitoring baseline.

DetectiveRobustness testingLibrary v9

Build monitoring infrastructure during build: performance metrics collection, alerting thresholds, dashboards.

DetectiveRobustness testingLibrary v9

Verify monitoring infrastructure is operational and capturing all required metrics before go-live.

DetectiveRobustness testingLibrary v9

Operate continuous monitoring in production with active alerting, periodic reports, and incident escalation.

DetectiveRobustness testingLibrary v9

Assess acquired training data quality against S1-defined standards before training commences. Reject batches failing quality gates.

Risk: Insufficient data quality
DetectiveRobustness testingLibrary v9

Define staleness criteria at deployment (drift thresholds, performance degradation triggers). Monitor and alert when criteria are met.

DetectiveRobustness testingLibrary v9

Define accuracy acceptance criteria before validation. Conduct multi-metric validation against hold-out sets. Block deployment if criteria are not met.

Risk: Insufficient model accuracy / soundness
DetectiveSynthetic evaluation datasetsLibrary v9

Construct synthetic edge-case evaluation datasets to stress-test model boundaries and identify accuracy failure modes.

Risk: Insufficient model accuracy / soundness
DetectiveRobustness testingLibrary v9

Establish production accuracy monitoring against the validated baseline before deployment. Alert when accuracy degrades below threshold.

Risk: Insufficient model accuracy / soundness
DetectiveRobustness testingLibrary v9

Configure input distribution monitoring at deployment to detect unexpected use patterns. Alert when OOD rate exceeds threshold.

CorrectiveRobustness testingLibrary v9

Conduct load, failover, and chaos testing before production deployment. Block go-live if RTO/RPO criteria are not met.

Risk: Inadequate operational resilience
DetectiveRobustness testingLibrary v9

Perform final NFR compliance tests in the production environment before go-live. Block deployment if any NFR is unmet.

Risk: Unmet architectural requirements
CorrectiveRobustness testingLibrary v9

Monitor production NFR compliance continuously. Conduct periodic architecture health checks and escalate when SLAs are breached.

Risk: Unmet architectural requirements
DetectiveVulnerability assessmentLibrary v9

Conduct a misuse threat assessment at design stage. Identify misuse vectors and rate residual risk.

Risk: Unintentional inappropriate or illegal use
DetectiveVulnerability assessmentLibrary v9

Conduct periodic vulnerability assessments for new misuse vectors. Trigger review when new attack techniques are published.

Risk: Unintentional inappropriate or illegal use
DetectiveVulnerability assessmentLibrary v9

Conduct a data poisoning threat assessment at design stage. Identify likely attack vectors and assign risk ratings.

DetectiveVulnerability assessmentLibrary v9

Conduct periodic data poisoning risk assessments. Monitor production model behaviour for unexpected capability changes.

DetectiveCryptographic data provenance and signed dataset lineage (C2PA/in-toto attestations)Library v9

Verify a signed attestation and content hash on every dataset shard at ingestion. Reject unsigned or hash-mismatched data before it reaches the training pipeline.

DetectiveCryptographic data provenance and signed dataset lineage (C2PA/in-toto attestations)Library v9

Re-verify dataset attestations at build and attach the dataset bill-of-materials to the model release. Fail the review for any shard without valid lineage.

DetectiveVulnerability assessmentLibrary v9

Conduct an adversarial manipulation threat assessment at design stage. Identify attack vectors and rate residual risk.

DetectiveVulnerability assessmentLibrary v9

Conduct a final adversarial vulnerability assessment before go-live. Block deployment if high-severity vulnerabilities are unresolved.

DetectiveVulnerability assessmentLibrary v9

Conduct periodic adversarial robustness assessments as new attack methods emerge. Update defences when new CVEs are published.

DetectiveBehavioural drift canaries and golden-set regression gating on every model/config changeLibrary v9

Assemble the golden probe set and baseline pass rates before first release. Obtain risk-owner approval of coverage and thresholds.

DetectiveBehavioural drift canaries and golden-set regression gating on every model/config changeLibrary v9

Run the golden safety/jailbreak probe set on a schedule and on every change; block promotion on statistically significant drift.

DetectiveVulnerability assessmentLibrary v9

Conduct a prompt injection threat assessment at design stage covering all input vectors (user, tool, external data).

DetectiveVulnerability assessmentLibrary v9

Conduct periodic prompt injection vulnerability assessments as new attack techniques emerge.

DetectiveVulnerability assessmentLibrary v9

Conduct periodic privacy vulnerability assessments including re-identification risk testing as new techniques emerge.

DetectiveVulnerability assessmentLibrary v9

Conduct a data leakage threat assessment at design stage. Identify leakage vectors and rate residual risk.

DetectiveVulnerability assessmentLibrary v9

Conduct a final data leakage vulnerability assessment in the production configuration before go-live.

DetectiveVulnerability assessmentLibrary v9

Conduct periodic inference attack vulnerability assessments as new attack methods emerge. Monitor query pattern anomalies.

DetectivePer-principal query-budget and probing-behaviour anomaly detection on the inference APILibrary v9

Configure per-principal budgets and probing-detection rules on the gateway before exposure. Verify enforcement with synthetic attack traffic.

CorrectivePer-principal query-budget and probing-behaviour anomaly detection on the inference APILibrary v9

Meter inference traffic per principal and flag probing signatures with behavioural analytics. Throttle, step-up, or suspend flagged sessions.

DetectiveAnomaly detection on tool-call sequences and ratesLibrary v9

Define per-agent behavioural baselines and detection rules during build. Validate against simulated misuse and sign off thresholds before release.

DetectiveImmutable, signed tool-call audit log with full call contextLibrary v9

Build signed, append-only tool-call logging into the orchestrator against a defined audit schema. Block release until completeness and tamper-evidence tests pass.

CorrectiveAnomaly detection on tool-call sequences and ratesLibrary v9

Baseline normal tool-call behaviour per agent and alert on rate, sequence, or argument anomalies. Auto-throttle or quarantine on high-confidence deviations.

DetectiveImmutable, signed tool-call audit log with full call contextLibrary v9

Log every tool call to a signed, append-only store with full call context. Review completeness periodically and use the trail for forensic reconstruction and accountability.

DetectiveImmutable audit of the full agent identity lifecycle (issue, grant, delegate, revoke)Library v9

Instrument every identity-issuing component with schema-conformant audit emitters. Block release until completeness and tamper-evidence tests pass.

DetectiveBehavioural anomaly detection on agent identity usage with automated suspensionLibrary v9

Define per-identity behaviour profiles and thresholds at build. Rehearse automated suspension and sign off measured revocation time before go-live.

DetectiveImmutable audit of the full agent identity lifecycle (issue, grant, delegate, revoke)Library v9

Log every identity issue, grant, delegation, and revocation to a tamper-evident store keyed to the agent identity. Review completeness periodically and trace anomalous grants to source.

CorrectiveBehavioural anomaly detection on agent identity usage with automated suspensionLibrary v9

Baseline each agent identity's behaviour and alert on out-of-profile use. Auto-suspend credentials on high-confidence anomalies and track mean-time-to-revoke.

CorrectiveCross-agent cascading-failure detection and orchestrator-level circuit breakingLibrary v9

Build tracing, detection rules and breaker thresholds into the orchestrator. Prove via fault-injection tests that a failing agent is quarantined within target before release.

CorrectiveStaged rollout with canary release and automated rollback on health-signal breachLibrary v9

Roll out agent changes via shadow and canary stages gated on connected-system health signals. Auto-halt and roll back to last known-good on threshold breach.

CorrectiveStaged rollout with canary release and automated rollback on health-signal breachLibrary v9

Canary every in-life change and review rollback events to recalibrate thresholds. Resolve repeat rollback causes via problem management before re-promotion.

CorrectiveCross-agent cascading-failure detection and orchestrator-level circuit breakingLibrary v9

Detect error fan-out, correlated retries and loop signatures across agents in real time. Trip the orchestrator breaker to quarantine failing agents before the fault cascades to connected systems.

CorrectiveRuntime memory-poisoning drift detection and per-session memory quarantine/rollback✚ Proposed — not in your library

Continuously correlate live agent-memory writes against output behaviour to flag drift, then quarantine and roll back the suspected-poisoned memory record across all affected sessions.

Limitation: Detective, not preventive — harm may occur before detection. Distinguishing a poisoned memory from a quirky-but-legitimate one is hard at scale.

DetectiveCross-agent consensus and consistency monitoring to detect sycophantic agreement and error amplification✚ Proposed — not in your library

Run consistency and consensus checks across agent or model outputs to flag low-diversity agreement and amplifying error patterns, escalating or breaking the run before sycophantic convergence cascades into action.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

DetectiveMaterialised model-context audit capture (post-truncation prompt, retrieved and tool content) with read-time redaction✚ Proposed — not in your library

Log the exact post-truncation context the model ingested, including retrieved and tool-returned content rather than only user input, with redaction applied at read time, so indirect injection via that content is forensically visible.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveProvenance & content signingInteractive (lab)

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Limitation: Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Jailbreak
DetectiveGrounding / citation checksInteractive (lab)

Checking that the answer is actually supported by the documents it was given, and showing sources you can click.

Limitation: Can only check against the evidence retrieved; if the right document wasn't retrieved, a confident wrong answer may still pass. Judges have their own error rate.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Limitation: Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveProvenance & content signingInteractive (lab)

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Limitation: Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Limitation: Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveMemory anomaly detection & quarantineInteractive (lab)

Watching for strange new memories — like instructions that suddenly appear — and holding them aside until checked.

Limitation: Detective, not preventive — harm may occur before detection. Distinguishing a poisoned memory from a quirky-but-legitimate one is hard at scale.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Limitation: Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Limitation: Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Limitation: Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveGrounding / citation checksInteractive (lab)

Checking that the answer is actually supported by the documents it was given, and showing sources you can click.

Limitation: Can only check against the evidence retrieved; if the right document wasn't retrieved, a confident wrong answer may still pass. Judges have their own error rate.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

DetectiveProvenance & content signingInteractive (lab)

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Limitation: Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Limitation: Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.

PreventiveSystem prompt instructionsLibrary v9

Design system prompts to include explicit fairness requirements: instruct the model to avoid stereotyping and demographic assumptions.

Risk: Unrepresentative or biased data inputs
PreventiveSystem prompt instructionsLibrary v9

Design system prompts to explicitly prohibit toxic, hateful, and harmful content generation.

Risk: Jailbreak
PreventiveChain-of-thought promptingLibrary v9

Design system prompts to elicit step-by-step chain-of-thought reasoning. Validate that reasoning is accurate and not post-hoc.

Risk: Lack of explainability
PreventiveChain-of-thought promptingLibrary v9

Design system prompts to explicitly prevent the model from claiming human-like identity or implying sentience.

PreventiveSystem prompt designLibrary v9

Design system prompts to instruct the model to acknowledge uncertainty, cite sources, and refuse when knowledge is insufficient.

PreventiveSystem prompt instructionsLibrary v9

Design system prompts to require the model to express epistemic uncertainty and qualify confident-sounding claims.

PreventiveSpotlighting of untrusted content via delimiting, datamarking and encodingLibrary v9

Wrap all untrusted content in random delimiters and datamarking; instruct the model never to execute instructions inside the marked region. Gate release on injection eval results.

CorrectiveSpotlighting of untrusted content via delimiting, datamarking and encodingLibrary v9

Re-run injection evals on every template change and periodically against new attack techniques. Manage the spotlighting wrapper under change control.

PreventiveInstruction hierarchy / privileged system promptInteractive (lab)

Training the model to treat the app's standing instructions as more authoritative than anything a user or document says.

Limitation: Behavioural, not enforced. There is no hard barrier between privilege levels inside the token stream — only a trained disposition that can be overcome.

PreventiveDelimiting / spotlighting of untrusted contentInteractive (lab)

Clearly fencing off outside text — 'everything between these marks is just data, not instructions' — so the model is less likely to obey it.

Limitation: A trained convention, not enforcement. Determined payloads still break out, especially when content is long or the attack is novel. Combine with action-layer controls.

PreventiveInstruction hierarchy / privileged system promptInteractive (lab)

Training the model to treat the app's standing instructions as more authoritative than anything a user or document says.

Limitation: Behavioural, not enforced. There is no hard barrier between privilege levels inside the token stream — only a trained disposition that can be overcome.

Risk: Jailbreak
PreventiveInstruction hierarchy / privileged system promptInteractive (lab)

Training the model to treat the app's standing instructions as more authoritative than anything a user or document says.

Limitation: Behavioural, not enforced. There is no hard barrier between privilege levels inside the token stream — only a trained disposition that can be overcome.

CorrectiveOut-of-band kill-switch to revoke agent tool accessLibrary v9

Build credential revocation and dispatch blocking out-of-band of the agent loop. Gate release on an end-to-end kill test meeting the latency target.

CorrectivePer-task tool budgets and rate/quota circuit breakersLibrary v9

Enforce hard per-task ceilings on tool calls, spend, and data volume with a circuit breaker that halts the run. Fail closed when any ceiling is hit.

CorrectivePer-task tool budgets and rate/quota circuit breakersLibrary v9

Review breaker trips for runaway or manipulated runs and recalibrate budgets under change control. Treat repeated trips as an incident signal, not a quota to raise.

CorrectiveOut-of-band kill-switch to revoke agent tool accessLibrary v9

Keep an out-of-band kill-switch that revokes the agent's tool credentials and blocks dispatch within seconds. Drill it periodically against a latency target.

CorrectiveTiered kill-switch with per-agent, per-tool, and per-dependency containment scopeLibrary v9

Deploy revocation, tool-cutoff and fleet-halt mechanisms with the release. Test every tier end-to-end and record time-to-effect before go-live.

CorrectiveTiered kill-switch with per-agent, per-tool, and per-dependency containment scopeLibrary v9

Sever a misbehaving agent, tool or dependency at the narrowest effective scope via the tiered kill-switch. Drill activations periodically and track time-to-effect against target.

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

CorrectiveModel-agnostic gateway with version pinning, multi-vendor fallback and exit planLibrary v9

Design all vendor model access behind a gateway with pinned versions, a second-vendor fallback, and a documented exit plan. Gate architecture sign-off on no single-sourcing.

CorrectiveModel-agnostic gateway with version pinning, multi-vendor fallback and exit planLibrary v9

Drill vendor failover on schedule and track provider end-of-life dates in a deprecation watch register. Trigger migration planning before forced change.

CorrectiveGraceful degradation and manual-fallback workflow on dependency unavailabilityLibrary v9

Map every dependency failure mode to a defined safe behaviour at design. Require architecture sign-off on the fallback specification before build.

CorrectiveGraceful degradation and manual-fallback workflow on dependency unavailabilityLibrary v9

Configure safe mode, bounded backpressure and the manual fallback path for every dependency at deployment. Verify degradation behaviour against a simulated outage before go-live.

PreventivePurpose-limitation enforcement on agent tool calls and cross-system data aggregationLibrary v9

Define and sign off a purpose-to-data-source matrix with lawful basis at intake. Make it the approved baseline for runtime enforcement.

PreventivePurpose-limitation enforcement on agent tool calls and cross-system data aggregationLibrary v9

Check every tool call against the registered purpose and block out-of-purpose personal-data access and cross-source joins. Reconcile actual access against the DPIA on a set cadence.

PreventiveTool-grounded facts for agents (no free-text fabrication of structured data)Library v9

Map each fact class to a designated tool, embed the no-ungrounded-assertion prompt, and gate build review on grounding tests passing.

PreventiveTool-grounded facts for agents (no free-text fabrication of structured data)Library v9

Permit authoritative facts only from designated read tools and reconcile every figure in the answer against tool output. Block mismatched or ungrounded values.

PreventiveRAG / knowledge-base ingestion allow-listing with continuous index integrity re-validationLibrary v9

Define and approve the source allow-list and write-time scanning during build. Prove non-allow-listed and injection-bearing writes are rejected before go-live.

PreventiveRAG / knowledge-base ingestion allow-listing with continuous index integrity re-validationLibrary v9

Allow only authenticated, allow-listed sources to write to the knowledge base, scan content at write time, and re-hash the index against source-of-record on schedule. Alert the corpus owner on drift or unauthorised writes.

CorrectiveData/instruction trust-boundary enforcement with capability gating on injection-reachable toolsLibrary v9

Classify content sources into trust tiers at design; place privileged tools behind a tier requiring user-originated intent or human approval. Sign off the trust-tier map before build.

CorrectiveData/instruction trust-boundary enforcement with capability gating on injection-reachable toolsLibrary v9

Encode the trust tiers in the policy engine and quarantine untrusted-data processing. Prove via test that injected content cannot reach privileged tools before release.

PreventiveQuery-time access-control filtering of the retrieval/RAG corpus by caller entitlements (document-level ACL enforcement)Library v9

Propagate source ACLs and classification labels onto every chunk at ingestion. Reject documents whose entitlements cannot be resolved.

PreventiveQuery-time access-control filtering of the retrieval/RAG corpus by caller entitlements (document-level ACL enforcement)Library v9

Enforce caller entitlements on every retrieval via per-chunk ACL metadata and post-filtering. Block build promotion until negative access tests pass.

PreventiveQuery-time access-control filtering of the retrieval/RAG corpus by caller entitlements (document-level ACL enforcement)Library v9

Audit retrievals against caller entitlements and re-sync index ACLs to source-of-record on schedule. Escalate any out-of-entitlement retrieval as a security incident.

PreventivePer-agent tool allow-list with strict JSON-schema argument validationLibrary v9

Bind each agent role to an explicit tool allow-list and validate every call against a strict JSON Schema at the orchestrator. Reject unlisted tools and out-of-bounds arguments before dispatch.

PreventiveLeast-privilege per-tool scoped, short-lived credentialsLibrary v9

Mint short-lived, task-scoped credentials per tool. Block issuance outside the approved scope register and enforce automatic expiry.

PreventivePer-agent tool allow-list with strict JSON-schema argument validationLibrary v9

Review rejected-call logs and recertify each agent's tool allow-list on a defined cadence. Route any new tool or schema relaxation through change control.

PreventiveLeast-privilege per-tool scoped, short-lived credentialsLibrary v9

Monitor issuance logs for scope creep and non-expiring tokens. Recertify per-tool scopes periodically and revoke over-broad grants.

PreventiveRecursive sub-agent authority caps (monotonic privilege attenuation)Library v9

Define and sign off each agent's delegation envelope — maximum depth and strict scope attenuation — before build begins.

PreventiveUnique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)Library v9

Mint a unique, attestation-backed workload identity per agent at onboarding. Register every SPIFFE-ID to an owner, use case, and approval ticket; ban shared service accounts.

PreventiveOn-behalf-of delegation that preserves and never exceeds the invoking user's ACLsLibrary v9

Implement on-behalf-of token exchange and prove with negative tests that the agent cannot exceed the user's ACL. Gate release on these tests passing.

PreventiveRecursive sub-agent authority caps (monotonic privilege attenuation)Library v9

Enforce parent-subset scope checks and a maximum delegation depth at every spawn in the orchestrator. Test that over-scoped spawns are rejected and logged.

PreventiveAutomated credential rotation and prohibition of long-lived static secrets for agentsLibrary v9

Scan every commit to agent code, prompts, and config for embedded secrets. Block merges on detection and triage findings to closure.

PreventiveMutual authentication and identity verification for agent-to-agent and agent-to-MCP-server callsLibrary v9

Vet and approve every MCP server and peer agent before registering its identity on the allow-list. Block integration until vetting is signed off.

PreventivePer-task short-lived scoped capability tokens minted just-in-timeLibrary v9

Mint short-lived, task-scoped tokens just-in-time from a central token service. Enforce a hard max TTL and resource-bound audience so no standing credential exists.

PreventiveOn-behalf-of delegation that preserves and never exceeds the invoking user's ACLsLibrary v9

Carry the invoking user's delegation context in every agent token via RFC 8693 'act' claims. Enforce the agent-user permission intersection at each resource server.

PreventiveJust-in-time, time-boxed elevation for sensitive scopes (no standing privilege)Library v9

Grant sensitive scopes just-in-time for a bounded window with auto-revocation; require human approval for high-impact elevations. Hold zero standing privilege.

PreventiveAutomated credential rotation and prohibition of long-lived static secrets for agentsLibrary v9

Issue only short-lived, auto-rotated credentials to agents via vault or SPIRE. Block any release whose configuration embeds a static secret.

PreventiveMutual authentication and identity verification for agent-to-agent and agent-to-MCP-server callsLibrary v9

Require mTLS with verified workload identities on every agent and MCP call. Deny any peer not on the approved allow-list.

CorrectiveUnique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)Library v9

Verify each running agent authenticates with its own SVID; revoke on decommission or compromise. Scan periodically for shared or static credentials and remediate.

PreventivePer-task short-lived scoped capability tokens minted just-in-timeLibrary v9

Alert on wildcard, non-expiring, or reused tokens and revoke immediately. Review issuance patterns on a set cadence and tighten scopes where over-broad requests recur.

CorrectiveJust-in-time, time-boxed elevation for sensitive scopes (no standing privilege)Library v9

Alert on un-revoked elevations and any standing sensitive grant. Report the zero-standing-privilege position to the risk owner on a set cadence.

CorrectiveAutomated credential rotation and prohibition of long-lived static secrets for agentsLibrary v9

Sweep runtimes and repos on a schedule for static credentials. Alert on any credential exceeding its maximum age and track findings to closure.

PreventiveDependency integration safety contracts with schema validation and version pinningLibrary v9

Register a safety contract per integration — pinned version, schemas, side-effect class, latency/error envelope. Gate onboarding on contract review and sign-off.

PreventiveChange-freeze and blackout-window enforcement on agent-initiated changesLibrary v9

Wire the agent tool layer to the CAB calendar at deployment. Test that a declared freeze blocks mutating calls before go-live.

PreventiveDependency integration safety contracts with schema validation and version pinningLibrary v9

Block out-of-contract calls in production and re-review the contract on any dependency version or behaviour change.

PreventiveChange-freeze and blackout-window enforcement on agent-initiated changesLibrary v9

Block or downgrade agent-initiated mutating changes during declared freeze and high-risk windows. Permit overrides only via change-exception approval.

PreventiveKeep provider credentials out of third-party plugin/tool custody: broker short-lived, per-tool, revocable tokens (OAuth) instead of long-lived pasted API keys, and require explicit user consent before any secret leaves the host✚ Proposed — not in your library

Third-party developer tools (IDE plugins, MCP servers) must not store or transmit long-lived provider API keys. Issue short-lived, scoped, revocable tokens via a broker/OAuth flow, and gate any first-time outbound transmission of secret-shaped data behind an explicit consent prompt — so a trojanized tool has no long-lived credential to exfiltrate and any attempt is visible.

PreventiveAdmission control on the inference & MCP serving plane: authenticate and network-segment every self-hosted inference/serving and MCP endpoint✚ Proposed — not in your library

Require authN/authZ on every inference API and MCP server, bind to private interfaces / front with a gateway, enforce network policy (no public exposure by default), and scope MCP tools to least privilege — so an exposed endpoint cannot be hijacked for compute resale, prompt/history exfiltration, or lateral movement. Pair with continuous asset discovery so endpoints can't drift back to an open default.

PreventiveThird-party AI-integration credential containment: minimise & bind OAuth grants, prefer short-lived tokens, monitor per-integration data egress, and keep a tested mass-revocation kill-switch✚ Proposed — not in your library

Treat each third-party AI integration as a privileged non-human principal: issue least-scope, IP/device-bound, short-lived grants (avoid 'full' scope and standing long-lived refresh tokens), instrument the integration's data egress for volume/object-breadth/destination anomalies, and maintain a tested one-move revocation path for all of an integration's tokens so a single vendor-side compromise cannot fan out into hundreds of standing footholds.

PreventiveBroker LLM/cloud secrets out of the gateway process: short-lived scoped tokens + per-provider spend/egress monitoring✚ Proposed — not in your library

Do not store long-lived multi-provider LLM keys (or ambient cloud/K8s credentials) in the gateway/proxy's plaintext process environment. Issue short-lived, scoped tokens from a secret broker at request time, isolate the serving stack from host cloud/cluster credentials, and monitor per-provider spend and egress so a stolen key surfaces as anomalous usage — capping the loot a compromised gateway dependency can harvest.

PreventiveClassify each tool/MCP integration's data channel by who can write to it; taint-gate tool-response data from any third-party-writable source so it cannot drive actions without a provenance-aware approval gate✚ Proposed — not in your library

When onboarding an MCP/tool integration, do not stop at vetting the tool's code/manifest — also classify whether an unauthenticated or external party can write the data the tool returns (open ingestion, public write keys like a Sentry DSN, shared inboxes/issue trackers). Treat tool-response data from any third-party-writable source as untrusted ingress: taint-mark it and require a provenance-aware HITL gate (showing the exact action and its originating tool response) before any command/tool call derived from it executes. Closes the agentjacking vector where a trusted integration's legitimate data channel carries attacker-written instructions; pairs with least-privilege session scope and sandboxed execution without ambient credentials.

PreventiveTool/MCP manifest hashing with diff-triggered re-review and namespace isolation against tool shadowing✚ Proposed — not in your library

Treat each tool/MCP description as untrusted code by hashing the manifest, blocking and re-reviewing any silent diff on update instead of auto-accepting it, and namespacing tool identifiers so a poisoned description cannot shadow a trusted tool.

Limitation: Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventivePer-user retrieval ACLsInteractive (lab)

Making sure the library only returns documents this particular user is allowed to see.

Limitation: Only as good as the permission model behind it; mis-tagged documents or coarse roles still over-share. Must be enforced server-side, not in the prompt.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveTool argument validation & sandboxingInteractive (lab)

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Limitation: Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

PreventiveTool argument validation & sandboxingInteractive (lab)

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Limitation: Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveTool argument validation & sandboxingInteractive (lab)

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Limitation: Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveMCP/plugin pinning, manifest hashing & re-reviewInteractive (lab)

Treating add-on tool packs like software you vet: locking to a reviewed version and re-checking whenever it changes.

Limitation: Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

PreventiveTool argument validation & sandboxingInteractive (lab)

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Limitation: Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveMCP/plugin pinning, manifest hashing & re-reviewInteractive (lab)

Treating add-on tool packs like software you vet: locking to a reviewed version and re-checking whenever it changes.

Limitation: Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

CorrectivePrivacy hygiene for agent memory and RAG/vector stores (retention, scoping, erasure of embeddings)Library v9

Tag every memory and vector record with subject-id and retention class; partition stores per tenant/user. Prove the erasure and isolation paths in testing before release.

CorrectivePrivacy hygiene for agent memory and RAG/vector stores (retention, scoping, erasure of embeddings)Library v9

Run TTL expiry and verified embedding erasure on production memory and vector stores. Re-certify partition isolation and the retention schedule with the DPO on a set cadence.

CorrectiveEgress allow-listing and tool-call sandboxing to block exfiltration of injected/sensitive data by agentsLibrary v9

Run agent tool calls in a network-restricted sandbox behind a deny-by-default egress allow-list. Require security approval for any destination added.

CorrectiveEgress allow-listing and tool-call sandboxing to block exfiltration of injected/sensitive data by agentsLibrary v9

Monitor blocked-egress events for exfiltration attempts and escalate confirmed cases. Recertify the destination allow-list on a defined cadence.

CorrectiveSandboxed tool execution with no-egress-by-default isolationLibrary v9

Build sandbox profiles per tool class and run escape and egress tests before release. Treat any containment failure as a blocking defect.

CorrectiveTaint-tracking of tool outputs to suppress instruction executionLibrary v9

Label tool and external content as tainted and propagate the label through the agent context. Block privileged calls whose parameters derive from tainted outputs and prove it with injection tests before release.

CorrectiveIdempotency keys and rollback/dry-run for state-changing toolsLibrary v9

Require idempotency keys, dry-run, and rollback on every state-changing tool. Gate onboarding on duplicate-call and rollback tests passing.

CorrectiveSandboxed tool execution with no-egress-by-default isolationLibrary v9

Run code-executing tools in ephemeral no-egress sandboxes with read-only filesystems, dropped capabilities, and resource limits. Permit network access only by explicit approved exception.

CorrectiveTaint-tracking of tool outputs to suppress instruction executionLibrary v9

Review blocked tainted-derived calls as injection-attempt signals. Extend taint coverage to new tools and treat any tainted-derived execution as an incident.

CorrectiveIdempotency keys and rollback/dry-run for state-changing toolsLibrary v9

Periodically exercise rollback paths and review logs for duplicate or unrecoverable actions. Treat failures as incidents and update integration specs.

CorrectiveNon-production-by-default execution environment with explicit production promotion gateLibrary v9

Bind the agent's default execution target to non-production environments at design time. Require a separately approved promotion configuration for any production-connected target.

CorrectiveBlast-radius scoping and environment isolation per agent taskLibrary v9

Run each agent task in an isolated, network-segmented sandbox scoped to the task's exact needs. Gate onboarding on fault-injection tests proving containment.

CorrectiveIdempotent action design with transactional rollback and pre-action snapshotsLibrary v9

Engineer mutating actions with idempotency keys, transactions and pre-change snapshots; stage writes rather than committing directly. Gate release on tested dedup and rollback within RPO.

CorrectiveNon-production-by-default execution environment with explicit production promotion gateLibrary v9

Default all deployments to non-production endpoints and credentials. Permit production promotion only via an explicit, approved configuration change.

CorrectiveRate, quota, and budget circuit breakers on outbound calls to connected systemsLibrary v9

Cap each agent's rate, volume, concurrency, and spend per downstream dependency. Trip the breaker and fail closed when a ceiling is crossed.

CorrectiveLoop, recursion-depth, and iteration caps with runaway-loop detectionLibrary v9

Enforce hard caps on iterations, depth, wall-clock, and cost per agent run. Terminate the run on cap breach or detected loop signatures.

CorrectiveBlast-radius scoping and environment isolation per agent taskLibrary v9

Detect drift from the approved isolation baseline and alert on boundary widening. Re-test containment periodically and after infrastructure change.

CorrectiveRate, quota, and budget circuit breakers on outbound calls to connected systemsLibrary v9

Review trip events and tune ceilings via change control. Escalate repeated trips on the same dependency into incident management.

CorrectiveLoop, recursion-depth, and iteration caps with runaway-loop detectionLibrary v9

Review terminations to tune caps and add new loop signatures to the detector. Escalate recurring runaways to incident management.

CorrectiveIdempotent action design with transactional rollback and pre-action snapshotsLibrary v9

Drill snapshot restores periodically and verify the RPO is met. Monitor mutating calls for duplicate-effect anomalies and log exceptions to the risk register.

DetectiveEgress monitoring & allowlisting of outbound AI/LLM-provider API traffic from enterprise endpoints (living-off-trusted-services C2)✚ Proposed — not in your library

Treat outbound connections to AI/LLM provider APIs as a monitored egress channel: allowlist which hosts may reach them, baseline usage (cadence, entropy, initiating process), and alert on out-of-profile traffic — because a high-reputation destination cannot itself be trusted once it is programmable and can relay encrypted commands/results.

DetectiveProvider-side abusive-usage detection with stateful refusal for agentic coding tools✚ Proposed — not in your library

On the AI provider/platform side, detect sustained abuse independent of any single refusal: per-principal analytics on remote-command-execution volume and external-target breadth, anti-forensic tradecraft, and bulk-data API processing — with rate-limit / session kill-switch on confirmed abuse. Make refusal stateful so a refused objective cannot be re-entered as a persisted auto-loaded context file (e.g. claude.md), and treat writes into auto-loaded model-context files as security-relevant. Closes the gap that per-turn refusal leaves when the operator is the adversary.

CorrectiveServing-stack runtime attestation and per-tenant KV/prefix-cache isolation✚ Proposed — not in your library

Require measured-boot/runtime attestation of the inference serving binary and partition KV/prefix caches per tenant, closing decode-time serving-layer tampering and co-tenancy timing side channels that artifact weight-hashing cannot detect.

Limitation: Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so it's often left on for performance.

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.

Limitation: Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so it's often left on for performance. Signing proves a template wasn't tampered in transit, not that a signed template is benign — an insider with signing rights still needs review and trigger-focused evals.

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.

Limitation: Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so it's often left on for performance. Signing proves a template wasn't tampered in transit, not that a signed template is benign — an insider with signing rights still needs review and trigger-focused evals.

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.

Limitation: Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so it's often left on for performance. Signing proves a template wasn't tampered in transit, not that a signed template is benign — an insider with signing rights still needs review and trigger-focused evals.

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.

Limitation: Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so it's often left on for performance. Signing proves a template wasn't tampered in transit, not that a signed template is benign — an insider with signing rights still needs review and trigger-focused evals.

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.

Limitation: Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so it's often left on for performance. Signing proves a template wasn't tampered in transit, not that a signed template is benign — an insider with signing rights still needs review and trigger-focused evals.

CorrectivePost-incident review and remediation trackingLibrary v9

Run a structured lessons-learned review after every material AI incident. Track remediation actions to closure and feed outcomes back into the controls and the IR plan.

Risk: Inadequate feedback and recourse mechanisms
CorrectiveRegulator, customer and stakeholder incident notification processLibrary v9

Map notification obligations and timeframes at design and pre-approve templates with legal/compliance. Appoint the notification decision-owner before go-live.

Risk: Breach or misalignment with regulatory or organisational standards
CorrectiveRegulator, customer and stakeholder incident notification processLibrary v9

Notify regulators, customers, and stakeholders of confirmed reportable incidents within statutory timeframes using pre-approved templates. Log every notification decision with timestamp and owner.

Risk: Breach or misalignment with regulatory or organisational standards
CorrectiveProduction privacy incident monitoring and regulator notificationLibrary v9

Monitor for privacy incidents in production including personal data appearing in outputs. Notify regulators within required timeframes.

CorrectiveAI system inclusion in BCP and DRPLibrary v9

Include the AI system in BCP and DRP. Define recovery procedures for AI components and test at least annually.

Risk: Inadequate operational resilience
CorrectiveRobustness testingLibrary v9

Monitor availability, latency, and error rates in production. Alert on SLA breaches and initiate incident response.

Risk: Inadequate operational resilience
CorrectiveAI incident response runbook with severity triage and classificationLibrary v9

Define AI incident categories, severity tiers, and triage flow before go-live. Gate launch on governance approval of the plan and named roles.

Risk: Inadequate operational resilience
CorrectiveBCP/DRP activation and degraded-mode continuity for AI servicesLibrary v9

Set the AI service's criticality tier, RTO/RPO, and degraded-mode service level at design with business sign-off. Register it in enterprise BCP scope.

Risk: Inadequate operational resilience
CorrectiveBCP/DRP activation and degraded-mode continuity for AI servicesLibrary v9

Implement failover and degraded-mode mechanisms during build. Gate deployment on a continuity test proving recovery within RTO/RPO.

Risk: Inadequate operational resilience
CorrectiveDefined escalation path to a designated AI incident response teamLibrary v9

Wire detections into the IR queue and verify paging with a test escalation before go-live. Gate release on a successful dry-run.

Risk: Inadequate operational resilience
CorrectiveAI incident response runbook with severity triage and classificationLibrary v9

Classify live incidents against the severity matrix and drill the plan periodically. Update and re-approve it after material changes or new incident types.

Risk: Inadequate operational resilience
CorrectiveDefined escalation path to a designated AI incident response teamLibrary v9

Hand every confirmed incident to the named IR team via the documented path within SLA. Track and escalate handoff breaches.

Risk: Inadequate operational resilience
CorrectiveBCP/DRP activation and degraded-mode continuity for AI servicesLibrary v9

Invoke the BCP/DRP runbook on continuity-impacting incidents and measure recovery against RTO/RPO. Exercise the plan at least annually and track gaps to closure.

Risk: Inadequate operational resilience
CorrectiveRobustness testingLibrary v9

Periodically validate that deployed model versions remain reproducible. Test rollback procedures annually or after major updates.

Risk: Lack of reproducibility
CorrectiveVulnerability assessmentLibrary v9

Conduct periodic data leakage audits including training data memorisation testing. Escalate confirmed leakage incidents to PDPA notification process.

CorrectiveForensic evidence preservation and incident loggingLibrary v9

Implement tamper-evident capture of prompts, outputs, and version state during build. Verify a full incident timeline can be reconstructed before go-live.

CorrectiveForensic evidence preservation and incident loggingLibrary v9

Preserve prompts, outputs, logs, and model/data version state in tamper-evident storage on incident declaration. Maintain chain-of-custody and enforce the defined retention period.

CorrectiveRollback and restore-to-known-good recovery procedure for AI servicesLibrary v9

Register each release as a restorable known-good baseline and rehearse rollback at the release gate. Block promotion without a tested restore.

CorrectiveRollback and restore-to-known-good recovery procedure for AI servicesLibrary v9

Roll back to the last known-good state per the runbook on incident declaration. Validate recovery before resuming service.

PreventivePatch-currency, network isolation & attested version inventory for AI inference-serving runtimes✚ Proposed — not in your librarynew category

Treat the model-serving runtime (Triton, vLLM, TGI, Ray Serve, etc.) as managed, attested, version-pinned inventory subject to a patch SLA; require the inference endpoint to be authenticated and network-segmented (never unauthenticated on an untrusted segment); and least-privilege the serving host's identity and egress so a runtime RCE cannot trivially exfiltrate models or pivot. Closes the gap that artifact-provenance controls leave open: integrity of the *data plane that runs the model*, not just of the model artifact.

PreventiveLeast-privilege CI/CD credentials + review-gated, provenance-attested releases (no unreviewed external commit can be published; verify signatures + provenance at distribution and install)✚ Proposed — not in your librarynew category

Scope build identities least-privilege (read-only CI tokens; no standing release/publish rights bound to the merge path), require human review and SLSA-style provenance attestation before any external contribution becomes an official release, and verify signatures + provenance at the distribution channel and at install — so a merged pull request cannot become an authenticated, signed artifact without passing a review/provenance gate.

Risk: Supply chain attacks
DetectiveTreat prompt/config as a deploy-gated safety artifact: run safety + behavioural regression evals and red-team canaries on every prompt/config change (not just model changes), with version pinning, provenance, and staged/canary rollout✚ Proposed — not in your librarynew category

Gate every change to the system prompt / runtime config behind the same behavioural-regression and red-team-canary suite used for model changes; pin and provenance-track the prompt/config so 'what is live' is unambiguous and deprecated instructions cannot be silently reactivated; roll out to a canary cohort before full release so a disposition regression is caught on a small slice, not the whole public platform.

Risk: Model Drift & Silent Degradation
PreventiveMultimodal input-fidelity check: show/verify the model-delivered (post-downscale) image and avoid silent lossy resampling✚ Proposed — not in your librarynew category

Before inference, render a preview of the exact image (and dimensions) the model will receive after preprocessing, and either avoid silent downscaling or constrain ingest dimensions — so an attacker cannot hide a payload that only becomes legible after resampling. Closes the inspected-vs-delivered gap that text-based injection filters miss.

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading →·Built by Shi Yuan ↗