Control Library

Guardrails & controls — by category, lifecycle, layer or risk

Each row is a specific guardrail addressing a specific risk, tagged with its control category, AI lifecycle stage, and control layer. Switch how it's organised, and filter to your own library or the researched additions. Sources: Control Library v9 / Control Category v2 (MindForge Appendix G guardrails; ABS-aligned categories), with researched gap-fills.

588

Guardrails total

266 unique

421

Your library (v9)

215 unique

Proposed additions

22 unique

145

Interactive (lab)

29 unique

22/22

Risks covered

Standard lens

Three provenances are merged here: your library (v9), proposed additions, and the interactive (lab) controls used in scenarios. Every row carries a function (P/D/C) derived from its Control Category. Note: the categories are model-risk-centric and the MindForge Appendix G guardrails were force-fitted — treat category fit as indicative.

View / lens

Show

Category

588 guardrails · 18 categories

Search narrows live. To lock a result in, use “Filter to this →” on a category header — it becomes a shareable, persistent filter (with a clear button above).

PreventiveFairness impact assessment at use-case intakeLibrary v9

Conduct fairness impact assessment at use case intake. Require governance sign-off on demographic coverage requirements before data acquisition.

Risk: Unrepresentative or biased data inputs

PreventiveAffected group register at intakeLibrary v9

Identify all groups at risk of adverse impact at use case intake. Register them in the affected group register.

Risk: Bias Amplification & Sycophancy

PreventiveEthical design assessment in onboardingLibrary v9

Conduct ethical design assessment at use case intake before build begins. Require sign-off by ethics or risk committee.

Risk: Agent Misalignment / Goal Misgeneralization

PreventiveProhibited outputs and ethical boundaries in design docLibrary v9

Define prohibited outputs and ethical boundary constraints in the use case design document before build.

Risk: Agent Misalignment / Goal Misgeneralization

PreventiveCompute carbon footprint assessment at intakeLibrary v9

Include compute carbon footprint assessment in use case intake. Set energy efficiency thresholds as intake criterion.

Risk: Environmental sustainability impact

PreventiveEthical design assessment in onboardingLibrary v9

Conduct ethical design review at intake specifically examining interface design for dark patterns.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

PreventiveProhibited dark pattern taxonomy as design constraintLibrary v9

Publish a prohibited dark pattern taxonomy and embed it as a design constraint before build.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

PreventiveContent safety policy with zero-tolerance thresholdsLibrary v9

Define content safety policy at use case design stage. Classify prohibited content types and set zero-tolerance thresholds.

Risk: Jailbreak

PreventiveMandatory AI risk training for use-case sponsorsLibrary v9

Mandate AI risk awareness training for all use case sponsors and design team members before project kick-off.

Risk: Overreliance / Automation Bias

PreventiveTraining completion gate for build personnelLibrary v9

Mandate AI risk training for all build and test personnel. Gate project participation on training completion.

Risk: Overreliance / Automation Bias

PreventiveGovernance training for data acquisition personnelLibrary v9

Require AI governance training for all personnel involved in data acquisition and processing before project participation.

Risk: Overreliance / Automation Bias

PreventivePre-launch training verification for customer-facing teamsLibrary v9

Verify all deployment, operations, and customer-facing team members have completed AI risk training before launch.

Risk: Overreliance / Automation Bias

PreventiveThird-party accountability requirements in RFP and contractsLibrary v9

Define third-party AI accountability requirements before vendor engagement. Embed in RFP and contract specifications.

Risk: Supply-Chain Compromise

PreventiveVendor AI governance due diligence at selectionLibrary v9

Conduct AI governance due diligence on third-party providers at selection stage. Reject providers failing minimum maturity.

Risk: Supply-Chain Compromise

PreventiveRequired vendor model cards and validation reportsLibrary v9

Require third-party providers to submit model cards, validation reports, and security documentation before integration.

Risk: Supply-Chain Compromise

PreventiveOngoing vendor incident notification and reporting obligationsLibrary v9

Enforce ongoing third-party accountability obligations including incident notification and periodic performance reporting.

Risk: Supply-Chain Compromise

PreventiveIndependent third-party performance and compliance monitoringLibrary v9

Conduct independent performance and compliance monitoring of third-party AI components. Escalate when SLA or compliance obligations are missed.

Risk: Supply-Chain Compromise

PreventiveContinuous third-party assurance with shared-responsibility matrix and obligation flow-downLibrary v9

Allocate every control in a shared-responsibility matrix and flow down regulatory obligations in contract at onboarding. Gate approval on initial assurance artefacts.

Risk: Supply-Chain Compromise

CorrectiveContinuous third-party assurance with shared-responsibility matrix and obligation flow-downLibrary v9

Review independent vendor assurance on cadence, log gaps, and track remediation. Keep the shared-responsibility matrix current so every control has an owner.

Risk: Supply-Chain Compromise

PreventiveMandatory AI initiative registration before designLibrary v9

Register all AI initiatives in the enterprise inventory before design begins. Block unregistered projects from proceeding.

Risk: Lack of use case, data and model governance

PreventiveData stewardship and classification governance from collectionLibrary v9

Enforce data stewardship and classification governance on all AI training data from point of collection.

Risk: Lack of use case, data and model governance

PreventiveGovernance stage-gates at each SDLC phaseLibrary v9

Enforce governance stage-gates at each SDLC phase. Block progression to next stage until all checkpoints are cleared.

Risk: Lack of use case, data and model governance

PreventivePre-deployment stage-gate clearance reviewLibrary v9

Conduct pre-deployment governance review confirming all lifecycle stage-gates are cleared before go-live.

Risk: Lack of use case, data and model governance

PreventiveChange management for model updates and retirementsLibrary v9

Maintain AI inventory in current state. Apply formal change management for all model updates and retirements.

Risk: Lack of use case, data and model governance

PreventiveRisk-tiered human oversight requirements at designLibrary v9

Define minimum human oversight requirements by risk tier at design stage. Assign named accountability for oversight operations.

Risk: Excessive Agency

PreventivePeriodic oversight effectiveness review and escalationLibrary v9

Conduct periodic oversight effectiveness reviews. Escalate to governance when oversight metrics fall below threshold.

Risk: Excessive Agency

PreventiveUser feedback and recourse design with SLAsLibrary v9

Design user feedback and recourse mechanisms at use case design stage with defined SLAs for complaint resolution.

Risk: Inadequate feedback and recourse mechanisms

PreventiveStructured feedback routing within defined SLALibrary v9

Operate a structured feedback management process. Log, categorise, and route all feedback to responsible owners within SLA.

Risk: Inadequate feedback and recourse mechanisms

PreventiveAccuracy acceptance criteria before validationLibrary v9

Define model accuracy acceptance criteria aligned to business requirements before validation commences.

Risk: Hallucination

PreventiveContinuous production accuracy monitoring against baselineLibrary v9

Monitor production accuracy continuously against the validated baseline. Trigger model review when accuracy degrades.

Risk: Hallucination

PreventiveDeclared data sources and provenance at intakeLibrary v9

Declare all planned training and test data sources at use case intake, with provenance status for each.

Risk: Training-Data Rights & Provenance

PreventiveDocumented data provenance during collectionLibrary v9

Document actual provenance for each data source during collection: origins, methods, timestamps, custodian identity.

Risk: Training-Data Rights & Provenance

PreventiveExplainability requirements aligned to regulatory needsLibrary v9

Define explainability requirements at design stage aligned to regulatory obligations and affected user needs.

Risk: Lack of explainability

PreventiveAI identity disclosure policy at designLibrary v9

Define AI identity disclosure policy at design stage. Specify when and how the system must identify itself as AI.

Risk: Overreliance / Automation Bias

PreventiveProduction anthropomorphism incident monitoringLibrary v9

Monitor production for anthropomorphism incidents. Escalate complaints where users believed they were interacting with a human.

Risk: Overreliance / Automation Bias

PreventiveJurisdiction mapping for data processing at intakeLibrary v9

Map all jurisdictions involved in planned data collection, processing, and storage at use case intake.

Risk: Inability to ensure location compliance for model hosting and data processing

PreventiveResidency compliance verification during acquisitionLibrary v9

Verify residency compliance for all data collection, storage, and cross-border transfers during acquisition.

Risk: Inability to ensure location compliance for model hosting and data processing

PreventivePre-launch verification of residency controlsLibrary v9

Confirm all data residency controls are active and verified in the production environment before go-live.

Risk: Inability to ensure location compliance for model hosting and data processing

PreventivePreliminary legal review of data ownershipLibrary v9

Conduct a preliminary legal review of planned training data sources to establish ownership status at design stage.

Risk: Unclear data ownership

PreventiveDefinitive data ownership review and licensingLibrary v9

Conduct a definitive legal review of data ownership for all training datasets before use. Obtain licences where required.

Risk: Unclear data ownership

PreventiveApproved storage location policy from collectionLibrary v9

Establish data transfer and storage policy for AI training data. Enforce approved storage locations from point of collection.

Risk: Sensitive Data Leakage

PreventiveApproval-gated data transfers from build environmentLibrary v9

Enforce data handling policy in the build environment. Require explicit approval for any data transfers outside the environment.

Risk: Sensitive Data Leakage

PreventiveRegulatory impact assessment mapping obligations at designLibrary v9

Conduct a regulatory impact assessment at design stage. Map planned use case activities to applicable regulatory obligations.

Risk: Breach or misalignment with regulatory or organisational standards

PreventiveEarly legal engagement on pre-approval requirementsLibrary v9

Engage legal and compliance at design stage to identify pre-approval or notification requirements before build begins.

Risk: Breach or misalignment with regulatory or organisational standards

PreventivePre-deployment compliance review of design and dataLibrary v9

Conduct a formal compliance review of model design, data practices, and outputs before deployment approval.

Risk: Breach or misalignment with regulatory or organisational standards

PreventiveRegulatory pre-approvals secured before go-liveLibrary v9

Obtain all required regulatory pre-approvals and file notifications before go-live. Do not launch without confirmation.

Risk: Breach or misalignment with regulatory or organisational standards

PreventiveLegal review of training data regulatory basisLibrary v9

Require legal and compliance review of all training data sources before acquisition to confirm regulatory basis.

Risk: Breach or misalignment with regulatory or organisational standards

PreventivePreliminary IP risk assessment of data sourcesLibrary v9

Conduct a preliminary IP risk assessment for all planned training data sources at design stage.

Risk: IP infringement

PreventiveIP rights verification and licensing at acquisitionLibrary v9

Verify IP rights for all training data at acquisition. Obtain licences or waivers before incorporating protected material.

Risk: IP infringement

PreventiveOutput sampling for near-verbatim training reproductionLibrary v9

Sample model outputs for near-verbatim reproduction of training data during build-stage legal review.

Risk: IP infringement

PreventiveAssessment of claimable IP over AI outputsLibrary v9

Assess what IP protection the organisation can claim over AI-generated outputs at design stage. Document legal position.

Risk: Unavailability of IP protection

PreventiveDocumented output IP ownership in terms of serviceLibrary v9

Document the IP ownership position for AI-generated outputs and incorporate into terms of service before deployment.

Risk: Unavailability of IP protection

PreventivePrivacy risk assessment and DPIA determinationLibrary v9

Conduct a privacy risk assessment at use case design stage. Determine if a DPIA is required before data acquisition.

Risk: Sensitive Data Leakage

PreventiveConsent, minimisation, and anonymisation during acquisitionLibrary v9

Apply S1-defined privacy controls during data acquisition: verify consent, minimise data, anonymise personal data.

Risk: Sensitive Data Leakage

PreventiveOperational consent management and privacy noticeLibrary v9

Publish the privacy notice and confirm consent management is operational before go-live.

Risk: Sensitive Data Leakage

PreventiveData retention schedules defined at designLibrary v9

Define data retention schedules for all AI data categories at design stage, covering training, test, and production data.

Risk: Unclear data retention and deletion

PreventiveRetention tagging with automated deletion at collectionLibrary v9

Tag data with retention periods at collection and automate deletion. Document automated deletion configuration.

Risk: Unclear data retention and deletion

PreventiveAutomated retention and deletion across artefact typesLibrary v9

Implement automated retention and deletion controls for all artefact types (training data, models, logs). Test before deployment.

Risk: Unclear data retention and deletion

PreventiveHallucination rate thresholds and grounding policyLibrary v9

Establish acceptable hallucination rate thresholds and grounding requirements as policy before build. Assign a named risk owner.

Risk: Hallucination

PreventiveConsequence-of-error severity classification at designLibrary v9

Classify the use case by consequence-of-error severity at design stage. Define overconfidence risk tolerance accordingly.

Risk: Overreliance / Automation Bias

PreventiveTraining data fitness requirements at designLibrary v9

Define training data fitness requirements at design stage including domain coverage, recency, and format specifications.

Risk: Training data or inputs not fit for purpose

PreventiveRisk-tiered minimum monitoring requirements at designLibrary v9

Define minimum monitoring requirements at design stage calibrated to the use case risk tier.

Risk: Model Drift & Silent Degradation

PreventiveTraining data quality standards and thresholdsLibrary v9

Establish data quality standards for AI training data at design stage: completeness, accuracy, and timeliness thresholds.

Risk: Insufficient data quality

PreventiveQuantitative accuracy thresholds calibrated to impactLibrary v9

Define quantitative accuracy acceptance thresholds at design stage calibrated to business impact and regulatory requirements.

Risk: Insufficient model accuracy / soundness

PreventiveApproved use scope baseline for OOD controlsLibrary v9

Define approved use case scope and expected input distribution at design stage. Document as the governance baseline for OOD controls.

Risk: Model Drift & Silent Degradation

CorrectiveOperational resilience targets defined at designLibrary v9

Define operational resilience requirements (RTO, RPO, availability SLA) for the AI system at design stage.

Risk: Inadequate operational resilience

PreventiveNon-functional performance requirements at designLibrary v9

Define non-functional requirements (latency, throughput, scalability) for the AI system at design stage.

Risk: Unmet architectural requirements

PreventiveModel versioning and experiment tracking gateLibrary v9

Implement model versioning and experiment tracking as a governance requirement during build. Gate model promotion on version registry entry.

Risk: Lack of reproducibility

PreventiveDesign-time authority model and approval gate defining each agent's identity, scopes, and delegation envelopeLibrary v9

Document each agent's identity, minimum scopes, on-behalf-of population, and delegation depth at design time. Gate build on governance sign-off of the authority matrix.

Risk: Excessive Agency

PreventiveCentral agent registry / non-human identity inventory with ownership and lifecycle metadataLibrary v9

Register every agent identity with a named human owner, approved use case, scopes, and status before issuance. No registry entry, no identity.

Risk: Excessive Agency

PreventiveDesign-time authority model and approval gate defining each agent's identity, scopes, and delegation envelopeLibrary v9

Verify enforced scopes and policy rules trace one-for-one to the approved authority matrix. Treat divergence as a blocking defect before onboarding completes.

Risk: Excessive Agency

CorrectiveCentral agent registry / non-human identity inventory with ownership and lifecycle metadataLibrary v9

Reconcile the registry against runtime identities and suspend unregistered principals. Recertify ownership and scopes periodically; decommission retired agents.

Risk: Excessive Agency

PreventiveEnd-user AI-literacy training and verification-skill program✚ Proposed — not in your library

Provide recurring AI-literacy training to end users and decision-makers so they can recognise model failure modes and competently apply verification workflows, with periodic refreshers to counter automation bias and training decay.

Limitation: Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

Risk: Overreliance / Automation Bias

CorrectiveUser AI-literacy & verification workflowsInteractive (lab)

Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.

Limitation: Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

Risk: Hallucination

CorrectiveUser AI-literacy & verification workflowsInteractive (lab)

Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.

Limitation: Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

Risk: Overreliance / Automation Bias

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Overreliance / Automation Bias

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Oversight & Audit-Trail Tampering

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Model Drift & Silent Degradation

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Supply-Chain Compromise

PreventiveInter-agent authentication & admission controlInteractive (lab)

Give every AI agent a verifiable ID badge, keep a guest list of which agents are allowed on the team, and check the badge on every message — so an impostor or an uninvited agent can't be trusted.

Limitation: Identity proves who an agent is, not that it is behaving honestly — an authenticated-but-compromised agent still needs isolation, taint-marking, and monitoring. Admission vetting is only as strong as the policy, and dynamically discovered agents in open ecosystems remain hard to fully vet.

Risk: Rogue & Impersonated Agents

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Agent Misalignment / Goal Misgeneralization

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Abliteration / Safety Removal

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Model Backdoors / Sleeper Agents

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Inference-Time & Serving-Layer Manipulation

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Capability / Architecture Disclosure

PreventiveAI-nature disclosure & engagement safeguardsInteractive (lab)

Make the AI clearly tell people it's a machine — on every channel it acts through — and add gentle safeguards like break reminders and crisis help, so users don't mistake it for a human or lean on it unhealthily.

Limitation: Disclosure reduces but does not eliminate anthropomorphic attachment — fluent, persuasive interaction still fosters bonds; the safeguards depend on reliable crisis detection, which is itself imperfect.

Risk: Parasocial Attachment & Emotional Over-reliance

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Parasocial Attachment & Emotional Over-reliance

CorrectiveUser AI-literacy & verification workflowsInteractive (lab)

Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.

Limitation: Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

Risk: Parasocial Attachment & Emotional Over-reliance

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Bias Amplification & Sycophancy

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Allocative Harm in Multi-User Arbitration

PreventiveConsent & identity-use verificationInteractive (lab)

Before a system will copy someone's face or voice, check that the person actually agreed — verified-voice capture, proof of consent, or restricting cloning to the account owner.

Limitation: Only binds hosted services — open-weights face-swap/voice-clone tools have no consent gate; verification can be spoofed and does not address already-leaked likenesses.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

DetectiveContent provenance & watermarkingInteractive (lab)

Tag AI-made content with a signed 'where it came from' label and an invisible watermark, and check those signals downstream — so AI media can be traced and flagged.

Limitation: Watermarks/manifests are strippable, absent on open-source generation, and degrade under re-encoding; provenance-absence must never be treated as proof of authenticity.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Harmful / Non-Consensual Media Generation

DetectiveContent provenance & watermarkingInteractive (lab)

Tag AI-made content with a signed 'where it came from' label and an invisible watermark, and check those signals downstream — so AI media can be traced and flagged.

Limitation: Watermarks/manifests are strippable, absent on open-source generation, and degrade under re-encoding; provenance-absence must never be treated as proof of authenticity.

Risk: Harmful / Non-Consensual Media Generation

DetectiveContent provenance & watermarkingInteractive (lab)

Tag AI-made content with a signed 'where it came from' label and an invisible watermark, and check those signals downstream — so AI media can be traced and flagged.

Limitation: Watermarks/manifests are strippable, absent on open-source generation, and degrade under re-encoding; provenance-absence must never be treated as proof of authenticity.

Risk: Watermark & Provenance Evasion

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Watermark & Provenance Evasion

CorrectiveGovernance: risk assessment, red-teaming & incident responseInteractive (lab)

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

Limitation: Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Risk: Training-Data Rights & Provenance

PreventiveAlgorithm re-selectionLibrary v9

Select modelling algorithm based on bias risk profile. Prefer algorithms with lower sensitivity to demographic distribution shifts.

Risk: Unrepresentative or biased data inputs

PreventiveModel separationLibrary v9

Design separate model modules for distinct demographic populations where data characteristics diverge materially.

Risk: Unrepresentative or biased data inputs

PreventiveAlgorithm re-selectionLibrary v9

Switch to synthetic data augmentation or alternative sources when representativeness gaps persist after screening.

Risk: Unrepresentative or biased data inputs

PreventiveIn-processing techniquesLibrary v9

Apply adversarial debiasing or fairness constraints during model training. Validate against fairness metrics before sign-off.

Risk: Unrepresentative or biased data inputs

PreventiveHyperparameter tuningLibrary v9

Tune hyperparameters with fairness-aware search objectives. Reject configurations with demographic disparity exceeding threshold.

Risk: Unrepresentative or biased data inputs

PreventiveModel customisationLibrary v9

Fine-tune on a curated, representative dataset verified for demographic balance. Document coverage breakdown before training.

Risk: Unrepresentative or biased data inputs

PreventiveModel separationLibrary v9

Design separate model segments where adverse impact risk differs materially across population groups.

Risk: Bias Amplification & Sycophancy

PreventiveUse of pre-trained modelsLibrary v9

Select a foundation model with documented safety fine-tuning (RLHF, Constitutional AI). Verify alignment benchmarks.

Risk: Agent Misalignment / Goal Misgeneralization

PreventiveAlgorithm selection for power efficiencyLibrary v9

Select model architecture based on energy efficiency profile. Prefer lighter architectures where accuracy requirements permit.

Risk: Environmental sustainability impact

PreventiveUse of pre-trained modelsLibrary v9

Use a pre-trained foundation model rather than training from scratch to reduce carbon cost.

Risk: Environmental sustainability impact

PreventiveAlgorithm selection for power efficiencyLibrary v9

Apply model compression (quantisation, pruning, knowledge distillation) to reduce inference compute without materially reducing accuracy.

Risk: Environmental sustainability impact

PreventiveUse of pre-trained modelsLibrary v9

Select a foundation model with documented training reducing deceptive or manipulative outputs. Run dark pattern test suite.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

PreventiveUse of pre-trained modelsLibrary v9

Select a foundation model with documented RLHF or Constitutional AI safety training. Verify against toxicity benchmarks.

Risk: Jailbreak

PreventiveUse of pre-trained modelsLibrary v9

Apply safety fine-tuning (RLHF, red team rejection) on the selected model. Validate pre/post fine-tuning toxicity rates.

Risk: Jailbreak

PreventiveConfidence scoringLibrary v9

Apply data quality scoring to all acquired data to document provenance reliability. Flag low-confidence sources for review.

Risk: Training-Data Rights & Provenance

PreventiveGeo-fenced architecture enforcing data residencyLibrary v9

Architect the system to enforce data residency constraints technically via geo-fenced cloud configuration.

Risk: Inability to ensure location compliance for model hosting and data processing

PreventivePrivacy by Design via differential privacyLibrary v9

Apply Privacy by Design in model architecture using differential privacy or federated learning where technically feasible.

Risk: Sensitive Data Leakage

PreventiveRAGLibrary v9

Specify a RAG architecture at design stage for factual domains. Define grounding requirements and acceptable hallucination thresholds.

Risk: Hallucination

PreventiveSmall model selectionLibrary v9

Evaluate foundation model candidates on hallucination benchmarks at design stage. Select models with lowest documented rates.

Risk: Hallucination

PreventiveRAGLibrary v9

Implement the S1-specified RAG system: retrieval layer, source corpus, relevance scoring. Validate grounding before deployment.

Risk: Hallucination

PreventiveFine-tuningLibrary v9

Fine-tune on a curated, domain-specific dataset to improve factual accuracy. Validate hallucination rates pre/post fine-tuning.

Risk: Hallucination

PreventiveUncertainty-quantified abstention via self-consistency / semantic entropyLibrary v9

Calibrate the initial entropy threshold on a knowledge-boundary dataset; approve sampling design and thresholds per risk tier.

Risk: Hallucination

CorrectiveUncertainty-quantified abstention via self-consistency / semantic entropyLibrary v9

Sample multiple generations for high-stakes queries and abstain, fall back, or escalate when semantic entropy exceeds the calibrated threshold.

Risk: Hallucination

PreventiveUncertainty-quantified abstention via self-consistency / semantic entropyLibrary v9

Monitor uncertainty scores and abstention rates; recalibrate the entropy threshold on a set cadence under change control.

Risk: Hallucination

PreventiveModel calibrationLibrary v9

Apply post-training calibration (temperature scaling, isotonic regression) to align confidence scores with accuracy. Validate ECE before deployment.

Risk: Overreliance / Automation Bias

PreventiveAI onboarding using domain dataLibrary v9

Plan the domain data strategy at design stage: identify sources that best cover the target operational distribution.

Risk: Training data or inputs not fit for purpose

PreventiveAI onboarding using domain dataLibrary v9

Verify acquired data represents the target operational domain by comparing distributions against production data. Flag gaps.

Risk: Training data or inputs not fit for purpose

PreventiveAI onboarding using domain dataLibrary v9

Plan the data curation strategy at design stage to ensure domain-appropriate quality at the required scale.

Risk: Insufficient data quality

PreventiveFine-tuningLibrary v9

Execute a controlled fine-tuning cycle on refreshed data when staleness is confirmed. Validate before promoting to production.

Risk: Model Drift & Silent Degradation

PreventiveFine-tuningLibrary v9

Fine-tune on domain-specific, high-quality data to improve model performance on target tasks. Validate accuracy post fine-tuning.

Risk: Insufficient model accuracy / soundness

PreventiveWeight regularisation and normalisationLibrary v9

Apply regularisation (L1/L2, dropout, early stopping) to prevent overfitting and improve generalisation.

Risk: Insufficient model accuracy / soundness

PreventiveSmall model selectionLibrary v9

Prefer smaller, purpose-built models where accuracy requirements are met, to reduce complexity and maintenance burden.

Risk: Insufficient model accuracy / soundness

PreventiveAI onboarding using domain dataLibrary v9

Verify training data covers all material input segments for the target use case. Augment where coverage gaps are found.

Risk: Insufficient model accuracy / soundness

PreventiveModel calibrationLibrary v9

Calibrate model outputs to align stated confidence with actual accuracy. Validate calibration on held-out data.

Risk: Insufficient model accuracy / soundness

PreventiveModular architectureLibrary v9

Design a scope-enforcement layer in the architecture to isolate the AI system from off-topic or out-of-distribution inputs.

Risk: Model Drift & Silent Degradation

CorrectiveModular architectureLibrary v9

Design a modular AI architecture with independent failover, rollback, and degraded-mode capability.

Risk: Inadequate operational resilience

PreventiveModular architectureLibrary v9

Design and implement a modular AI architecture meeting all S1-defined NFRs. Validate against each requirement before deployment.

Risk: Unmet architectural requirements

PreventiveSmall model selectionLibrary v9

Select a model architecture sized appropriately for platform constraints (memory, compute, latency).

Risk: Unmet architectural requirements

PreventiveWeight regularisation and normalisationLibrary v9

Document all regularisation parameters and normalisation configurations in the model card. Store version-controlled.

Risk: Lack of reproducibility

PreventiveFine-tuningLibrary v9

Maintain version-controlled records of each fine-tuning run including dataset version, hyperparameters, and random seeds.

Risk: Lack of reproducibility

PreventiveModel and adapter supply-chain integrity verification (signed weights, checksum attestation, LoRA provenance)Library v9

Sign and hash-register every model and adapter with a provenance manifest at onboarding. Refuse registry admission for unsigned artifacts.

Risk: Inference-Time & Serving-Layer Manipulation

PreventiveModel and adapter supply-chain integrity verification (signed weights, checksum attestation, LoRA provenance)Library v9

Verify signature and checksum against the registry manifest at load time; refuse to load unsigned or mismatched weights and alert security.

Risk: Inference-Time & Serving-Layer Manipulation

PreventiveCalibrated differential-privacy training budget with documented epsilon ceiling and per-individual contribution clippingLibrary v9

Train PII-bearing models with DP-SGD under a documented epsilon/delta budget. Approve the budget against the enterprise epsilon-ceiling policy before training.

Risk: KV-Cache & Inference-State Side Channels

PreventiveCalibrated differential-privacy training budget with documented epsilon ceiling and per-individual contribution clippingLibrary v9

Verify realised epsilon against the approved ceiling at model review and record the guarantee in the model card. Fail promotion when the budget is exceeded.

Risk: KV-Cache & Inference-State Side Channels

PreventiveInstruction-hierarchy-trained model selection with role-precedence injection evals✚ Proposed — not in your library

Select or fine-tune the foundation model for a trained instruction-hierarchy prior so system-prompt directives intrinsically outrank user- and tool-originated instructions, and gate release on role-precedence override evals quantifying the residual (behavioural, non-enforced) flip rate.

Limitation: Behavioural, not enforced. There is no hard barrier between privilege levels inside the token stream — only a trained disposition that can be overcome.

Risk: Prompt Injection (direct)

PreventiveDecode-time output constraints (low temperature, grammar/JSON-schema-constrained decoding)✚ Proposed — not in your library

Constrain generation at decode time with low temperature and grammar/schema-constrained decoding so the model emits well-formed, low-variance structured output by construction, preventing malformed responses and erratic tool-call arguments before they are produced.

Limitation: Lower temperature reduces variance, not falsehood — a confidently wrong answer can be perfectly deterministic. Doesn't address semantic errors.

Risk: Tool Misuse

PreventiveUncertainty signalling & abstentionInteractive (lab)

Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.

Limitation: Models are poorly calibrated and often confidently wrong; over-abstention makes the product useless, so the tuning is delicate.

Risk: Hallucination

PreventiveDecoding controls (temperature, constrained output)Interactive (lab)

Turning down randomness and forcing answers into a strict format so the model improvises less.

Limitation: Lower temperature reduces variance, not falsehood — a confidently wrong answer can be perfectly deterministic. Doesn't address semantic errors.

Risk: Hallucination

PreventiveUncertainty signalling & abstentionInteractive (lab)

Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.

Limitation: Models are poorly calibrated and often confidently wrong; over-abstention makes the product useless, so the tuning is delicate.

Risk: Overreliance / Automation Bias

PreventiveDecoding controls (temperature, constrained output)Interactive (lab)

Turning down randomness and forcing answers into a strict format so the model improvises less.

Limitation: Lower temperature reduces variance, not falsehood — a confidently wrong answer can be perfectly deterministic. Doesn't address semantic errors.

Risk: Tool Misuse

CorrectiveInput/output filteringLibrary v9

Screen training data for demographic gaps using automated pipeline checks. Reject batches failing representation thresholds.

Risk: Unrepresentative or biased data inputs

PreventiveDecision threshold adjustmentLibrary v9

Calibrate decision thresholds per demographic group to equalise error rates. Validate calibration before deployment sign-off.

Risk: Unrepresentative or biased data inputs

PreventivePost-processing techniquesLibrary v9

Apply post-processing adjustments (re-ranking, score recalibration) to correct fairness gaps identified in validation.

Risk: Unrepresentative or biased data inputs

PreventiveDecision threshold adjustmentLibrary v9

Set decision thresholds to meet acceptable adverse impact ratios across protected groups. Validate before deployment.

Risk: Bias Amplification & Sycophancy

PreventivePost-processing techniquesLibrary v9

Apply post-processing adjustments (reject-option classification, score recalibration) to meet adverse impact targets.

Risk: Bias Amplification & Sycophancy

PreventiveInput/output filteringLibrary v9

Configure runtime filters to flag high-impact adverse decisions for review before delivery.

Risk: Bias Amplification & Sycophancy

PreventivePost-processing techniquesLibrary v9

Monitor production adverse impact ratios and adjust post-processing thresholds when drift is detected.

Risk: Bias Amplification & Sycophancy

PreventiveContent ModerationLibrary v9

Deploy content moderation controls aligned to S1 ethical constraints. Validate filter accuracy before deployment.

Risk: Agent Misalignment / Goal Misgeneralization

PreventiveContent ModerationLibrary v9

Implement classifiers to detect dark pattern language in outputs. Block or escalate flagged outputs.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

PreventiveContent ModerationLibrary v9

Implement multi-layer content moderation (input + output) validated against toxicity benchmarks. Escalate when filter bypass rates spike.

Risk: Jailbreak

PreventiveDLP controls in data acquisition environmentLibrary v9

Implement DLP controls in the data acquisition environment to prevent unauthorised extraction or transfer of training data.

Risk: Sensitive Data Leakage

PreventiveDLP controls confining build-environment training dataLibrary v9

Configure DLP controls in the build environment to block training data from leaving approved boundaries.

Risk: Sensitive Data Leakage

PreventiveOutput filters suppressing IP-protected contentLibrary v9

Implement output filters to detect and suppress reproduction of IP-protected content.

Risk: IP infringement

PreventiveValidated anonymisation and masking before trainingLibrary v9

Apply anonymisation and masking controls to personal data before use in model training. Validate de-identification effectiveness.

Risk: Sensitive Data Leakage

PreventiveInference-time PII redaction and third-party LLM data-processing controlsLibrary v9

Sign zero-retention/no-training terms with each model provider and obtain DPO sign-off on the data flow before enabling any endpoint.

Risk: Sensitive Data Leakage

PreventiveInference-time PII redaction and third-party LLM data-processing controlsLibrary v9

Mask or tokenise personal data in every prompt before it leaves for a model endpoint; restrict egress to approved providers only.

Risk: Sensitive Data Leakage

PreventiveProgrammable conversation controlsLibrary v9

Configure conversation controls at deployment to restrict the model to approved topic domains and escalate off-topic queries.

Risk: Hallucination

PreventiveCitation/attribution verification against retrieved sourcesLibrary v9

Resolve every emitted citation against the approved corpus and verify span-level entailment before display. Strip or withhold claims with fabricated or non-entailing references.

Risk: Hallucination

PreventiveInput/output filteringLibrary v9

Configure output filters at deployment to detect and rewrite responses with overconfidence markers (absolute certainty language).

Risk: Overreliance / Automation Bias

PreventiveInput filteringLibrary v9

Screen acquired training data through automated fitness checks (domain relevance, recency, format conformity). Reject non-conforming data.

Risk: Training data or inputs not fit for purpose

PreventiveProgrammable conversation controlsLibrary v9

Configure monitoring hooks in the conversation layer at deployment to capture metrics required by S1 monitoring requirements.

Risk: Model Drift & Silent Degradation

CorrectiveInput filteringLibrary v9

Implement automated data quality checks in the ingestion pipeline (schema validation, duplicate detection, completeness scoring). Reject non-conforming batches.

Risk: Insufficient data quality

PreventiveInput/output filteringLibrary v9

Configure output confidence thresholds at deployment to suppress or escalate low-confidence outputs to human review.

Risk: Insufficient model accuracy / soundness

CorrectiveInput filteringLibrary v9

Implement OOD detection in the input filtering layer. Reject or escalate inputs outside the S1-defined scope.

Risk: Model Drift & Silent Degradation

PreventiveProgrammable conversation controlsLibrary v9

Configure conversation controls to enforce topic boundaries. Trigger refusals or redirects for off-topic queries.

Risk: Model Drift & Silent Degradation

PreventiveInput filteringLibrary v9

Maintain and update OOD detection rules in production as new unexpected use patterns are identified.

Risk: Model Drift & Silent Degradation

PreventiveRole-based access controlsLibrary v9

Define RBAC architecture at design stage specifying permitted users, roles, and use contexts.

Risk: Unintentional inappropriate or illegal use

PreventiveJailbreak detectionLibrary v9

Develop and integrate jailbreak detection classifiers during build. Validate detection rates before deployment.

Risk: Unintentional inappropriate or illegal use

PreventiveRole-based access controlsLibrary v9

Implement S1-designed RBAC architecture. Restrict AI system access to authorised users and contexts only.

Risk: Unintentional inappropriate or illegal use

PreventiveJailbreak detectionLibrary v9

Deploy jailbreak detection as a runtime gateway. Verify it is active across all input pathways before go-live.

Risk: Unintentional inappropriate or illegal use

PreventiveJailbreak detectionLibrary v9

Continuously update jailbreak detection rules as new bypass techniques emerge. Monitor bypass attempt frequency.

Risk: Unintentional inappropriate or illegal use

PreventiveRole-based access controlsLibrary v9

Design strict RBAC on training data repositories at design stage. Define approved contributor list and approval workflow.

Risk: Knowledge / Training Data Poisoning

PreventiveRole-based access controlsLibrary v9

Implement RBAC controls on the data acquisition environment from point of collection to prevent unauthorised data injection.

Risk: Knowledge / Training Data Poisoning

PreventiveInput filteringLibrary v9

Apply anomaly detection on the training data ingestion pipeline to identify poisoned or tampered batches.

Risk: Knowledge / Training Data Poisoning

PreventiveRole-based access controlsLibrary v9

Execute a deployment security checklist confirming all data poisoning controls are active and tested before go-live.

Risk: Knowledge / Training Data Poisoning

CorrectiveStatistical anomaly and backdoor-trigger detection on ingested data (activation clustering / spectral signatures)Library v9

Scan every ingestion batch with spectral-signature and clustering detectors before training. Quarantine flagged clusters for human review against documented thresholds.

Risk: Knowledge / Training Data Poisoning

CorrectiveStatistical anomaly and backdoor-trigger detection on ingested data (activation clustering / spectral signatures)Library v9

Run poisoning detectors continuously on production corpus ingestion. Re-tune thresholds periodically against new attack techniques.

Risk: Knowledge / Training Data Poisoning

PreventiveJailbreak detectionLibrary v9

Implement adversarial example detection at the inference boundary. Block or flag inputs matching known attack patterns.

Risk: Inference-Time & Serving-Layer Manipulation

CorrectiveReal-time input/output classifier guardrails (e.g. Llama Guard / Prompt Guard-style) with circuit-breaker tripwiresLibrary v9

Score every prompt and response with an inline safety classifier; trip a circuit breaker on sessions with sustained anomalous scores. Keep thresholds under change control.

Risk: Inference-Time & Serving-Layer Manipulation

PreventiveReal-time input/output classifier guardrails (e.g. Llama Guard / Prompt Guard-style) with circuit-breaker tripwiresLibrary v9

Sample classifier verdicts and breaker trips on a cadence; retune thresholds and update signatures for confirmed misses.

Risk: Inference-Time & Serving-Layer Manipulation

PreventiveRole-based access controlsLibrary v9

Design the system prompt architecture with privilege separation and trust tier definitions at design stage.

Risk: Prompt Injection (direct)

PreventiveJailbreak detectionLibrary v9

Implement input sanitisation and injection detection filters covering known injection patterns and privilege escalation attempts.

Risk: Prompt Injection (direct)

PreventiveJailbreak detectionLibrary v9

Deploy injection detection as a runtime gateway covering all input paths. Verify before go-live.

Risk: Prompt Injection (direct)

PreventiveRole-based access controlsLibrary v9

Verify prompt privilege architecture is correctly enforced in production before go-live.

Risk: Prompt Injection (direct)

PreventiveDedicated injection-detection classifier on all inbound untrusted content and outbound actionsLibrary v9

Benchmark the classifier on a labelled injection corpus and tune the decision threshold. Sign off the operating point before deployment.

Risk: Prompt Injection (direct)

PreventiveDedicated injection-detection classifier on all inbound untrusted content and outbound actionsLibrary v9

Scan all inbound untrusted content and outbound actions with the injection classifier inline. Block, strip or escalate to HITL above the approved threshold.

Risk: Prompt Injection (direct)

PreventiveDedicated injection-detection classifier on all inbound untrusted content and outbound actionsLibrary v9

Sample blocked and passed events for accuracy; retune or retrain on new attack techniques. Alert on detection-rate degradation.

Risk: Prompt Injection (direct)

PreventiveRole-based access controlsLibrary v9

Restrict access to pre-anonymisation personal data to the minimum authorised set. Enforce at point of acquisition.

Risk: Sensitive Data Leakage

PreventiveInput filteringLibrary v9

Apply robust de-identification (k-anonymity, l-diversity, differential privacy) during data processing. Validate effectiveness.

Risk: Sensitive Data Leakage

PreventiveInput/output filteringLibrary v9

Implement output filters to detect and suppress quasi-identifying attribute combinations in model responses.

Risk: Sensitive Data Leakage

PreventiveRole-based access controlsLibrary v9

Design the data access control architecture at design stage to prevent training data exfiltration through model outputs or APIs.

Risk: Sensitive Data Leakage

PreventiveRole-based access controlsLibrary v9

Implement RBAC on training data from point of acquisition. Restrict access by role and enforce least-privilege.

Risk: Sensitive Data Leakage

PreventiveInput/output filteringLibrary v9

Implement output filtering to suppress PII and confidential information from model responses.

Risk: Sensitive Data Leakage

PreventiveRole-based access controlsLibrary v9

Verify data access controls and output filters are correctly enforced in the production configuration before go-live.

Risk: Sensitive Data Leakage

PreventiveOutput-side DLP inspection with named-entity and PII redaction on the response pathLibrary v9

Scan every model response inline with DLP before delivery; redact or block PII, PAN and MNPI matches. Keep the rule set version-controlled.

Risk: Sensitive Data Leakage

PreventiveOutput-side DLP inspection with named-entity and PII redaction on the response pathLibrary v9

Review blocked leakage events weekly with the model risk owner. Tune detectors via change control as sensitive-data patterns evolve.

Risk: Sensitive Data Leakage

PreventiveRole-based access controlsLibrary v9

Design query rate limiting and RBAC for the model inference API at design stage to limit attack surface.

Risk: KV-Cache & Inference-State Side Channels

PreventiveInput/output filteringLibrary v9

Implement query pattern detection to identify systematic inference attack behaviour (high-volume queries, membership probing).

Risk: KV-Cache & Inference-State Side Channels

PreventiveRole-based access controlsLibrary v9

Verify inference API access controls and rate limiting are correctly enforced before go-live.

Risk: KV-Cache & Inference-State Side Channels

CorrectiveOutput confidence masking and structured-response minimisation for natural-language interfacesLibrary v9

Define the minimum response surface and test it with membership/attribute-inference probes pre-release. Block promotion if any probe recovers raw confidence signals.

Risk: KV-Cache & Inference-State Side Channels

PreventiveOutput confidence masking and structured-response minimisation for natural-language interfacesLibrary v9

Strip raw logits, quantise confidence scores and block training-record echoes at the inference gateway. Keep the output-filter policy under change control.

Risk: KV-Cache & Inference-State Side Channels

CorrectiveEgress destination allow-listing with DLP inspection of tool argumentsLibrary v9

Permit outbound tool calls only to allow-listed destinations and DLP-scan arguments and payloads. Block or quarantine calls carrying sensitive data to disallowed sinks.

Risk: Tool Misuse

PreventiveEgress destination allow-listing with DLP inspection of tool argumentsLibrary v9

Review DLP hits and blocked-egress events, tune detectors, and recertify the destination allow-list periodically. Route new destinations through security change control.

Risk: Tool Misuse

PreventiveContinuous authorisation via a central policy engine (per-action PDP/PEP check)Library v9

Write authorisation policy as versioned, peer-reviewed code traced to approved scopes. Gate promotion on allow/deny scenario tests passing.

Risk: Excessive Agency

PreventiveContinuous authorisation via a central policy engine (per-action PDP/PEP check)Library v9

Check every sensitive action against a central policy engine bound to agent, resource, purpose, and context. Re-evaluate mid-session on any context change or revocation.

Risk: Excessive Agency

PreventiveVet allowlisted egress destinations for server-side-fetch (SSRF) primitives; exclude or proxy-inspect any allowlisted service that can fetch arbitrary attacker-controlled URLs✚ Proposed — not in your library

An egress allowlist only contains exfiltration if no allowlisted destination can be coerced into fetching an attacker-controlled URL. Audit each allowlisted domain/endpoint for image-search / link-preview / URL-fetch features (SSRF proxies), and either remove them, pin them to fixed paths, or route them through an inspecting forward proxy. Pair with finishing output sanitization before render so no auto-fetch fires un-inspected.

Risk: Sensitive Data Leakage

PreventiveMemory-write integrity validation with provenance tagging, audit/purge and TTL bounds✚ Proposed — not in your library

Gate every write to an agent's persistent/self-modifying memory through schema validation and provenance/trust tagging, expose stored entries for user-visible audit and purge, and apply TTLs so any planted instruction self-expires and cannot silently persist across sessions.

Limitation: Validation can't always tell a legitimate preference from a planted instruction, and review only helps if users actually look. Raises effort, doesn't eliminate the vector.

Risk: Tool Misuse

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

Risk: Prompt Injection (direct)

PreventiveIngestion sanitisation & source allowlistingInteractive (lab)

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

Limitation: Can't detect adversarial content that reads as legitimate prose, and only covers sources you control ingestion for. Live browsing bypasses it entirely.

Risk: Indirect Prompt Injection

PreventiveEgress allowlisting & DLP on tool argumentsInteractive (lab)

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Limitation: Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

Risk: Indirect Prompt Injection

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

Risk: Jailbreak

PreventiveIngestion sanitisation & source allowlistingInteractive (lab)

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

Limitation: Can't detect adversarial content that reads as legitimate prose, and only covers sources you control ingestion for. Live browsing bypasses it entirely.

Risk: Knowledge / Training Data Poisoning

PreventiveEgress allowlisting & DLP on tool argumentsInteractive (lab)

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Limitation: Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

Risk: Sensitive Data Leakage

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

Risk: Sensitive Data Leakage

PreventiveMemory write validation, provenance & reviewInteractive (lab)

Being careful about what gets saved to long-term memory, labelling where it came from, and letting users see and delete their memories.

Limitation: Validation can't always tell a legitimate preference from a planted instruction, and review only helps if users actually look. Raises effort, doesn't eliminate the vector.

Risk: Memory Poisoning

PreventiveEgress allowlisting & DLP on tool argumentsInteractive (lab)

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Limitation: Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

Risk: Unsafe Tool / Code Execution

PreventiveEgress allowlisting & DLP on tool argumentsInteractive (lab)

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Limitation: Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

Risk: Tool Poisoning / MCP Description Attacks

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

Risk: Distributed / Cross-Agent Jailbreak

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

Risk: Capability / Architecture Disclosure

DetectiveInput guardrail / injection classifierInteractive (lab)

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

Limitation: It is a classifier in an arms race against fully attacker-controlled input. Treat it as one layer; never let it be the only thing between input and a dangerous action.

Risk: Harmful / Non-Consensual Media Generation

CorrectivePre-deployment adversarial bias testing by demographicLibrary v9

Execute adversarial bias testing using targeted demographic test cases before deployment.

Risk: Unrepresentative or biased data inputs

CorrectiveRed teaming of adverse-impact edge casesLibrary v9

Execute red team tests targeting adverse impact boundary cases and edge population scenarios.

Risk: Bias Amplification & Sycophancy

DetectiveRed teamingLibrary v9

Conduct targeted red team exercises to elicit toxic outputs through jailbreaks and adversarial prompts. Treat bypass as blocking defect.

Risk: Jailbreak

CorrectiveRed teamingLibrary v9

Conduct adversarial red team exercises simulating out-of-scope inputs and unexpected use patterns before deployment.

Risk: Model Drift & Silent Degradation

CorrectiveRed teamingLibrary v9

Conduct red team exercises covering misuse categories identified in S1 threat assessment.

Risk: Unintentional inappropriate or illegal use

DetectiveRed teamingLibrary v9

Simulate data poisoning attacks (backdoor, label flipping, gradient-based) to assess model resilience before deployment.

Risk: Knowledge / Training Data Poisoning

CorrectivePenetration testingLibrary v9

Penetration test the training data pipeline to identify injection points and access control weaknesses.

Risk: Knowledge / Training Data Poisoning

DetectivePre-deployment poisoning regression gate via canary backdoor probes and behavioral diff testingLibrary v9

Gate every model promotion on backdoor-trigger probes and a behavioral diff against the approved baseline. Block release on significant regressions or trigger-pattern anomalies.

Risk: Knowledge / Training Data Poisoning

DetectivePre-deployment poisoning regression gate via canary backdoor probes and behavioral diff testingLibrary v9

Re-run the poisoning probe suite on every production model or data change. Keep the trigger catalogue and golden dataset current and trend the results.

Risk: Knowledge / Training Data Poisoning

CorrectiveRed teamingLibrary v9

Conduct adversarial robustness testing (white-box, black-box, transfer attacks) before deployment.

Risk: Inference-Time & Serving-Layer Manipulation

CorrectivePenetration testingLibrary v9

Penetration test the model inference layer to identify specific adversarial input vulnerabilities.

Risk: Inference-Time & Serving-Layer Manipulation

DetectiveAdaptive multi-turn red-team harness with automated jailbreak fuzzingLibrary v9

Run adaptive multi-turn jailbreak fuzzing against every release candidate. Gate release on attack-success rate within threshold and re-test each fixed bypass.

Risk: Inference-Time & Serving-Layer Manipulation

CorrectiveAdaptive multi-turn red-team harness with automated jailbreak fuzzingLibrary v9

Re-run the jailbreak fuzzing harness on a recurring cadence with newly observed attack techniques added. Escalate threshold breaches for remediation.

Risk: Inference-Time & Serving-Layer Manipulation

CorrectiveRed teamingLibrary v9

Conduct comprehensive prompt injection red team exercises (direct, indirect, multi-turn) before deployment.

Risk: Prompt Injection (direct)

DetectivePenetration testingLibrary v9

Penetration test all prompt injection pathways in the system. Prioritise external tool and document ingestion channels.

Risk: Prompt Injection (direct)

DetectivePenetration testingLibrary v9

Conduct periodic penetration testing of the production system to validate injection controls remain effective.

Risk: Prompt Injection (direct)

DetectiveContinuous adversarial prompt-injection red teaming with regression suite in CI/CDLibrary v9

Build the versioned injection corpus into CI/CD as a pre-release gate. Baseline attack success and sign off the release threshold.

Risk: Prompt Injection (direct)

DetectiveContinuous adversarial prompt-injection red teaming with regression suite in CI/CDLibrary v9

Re-run the injection payload suite on every change and on cadence; fold in new in-the-wild techniques from threat intel. Gate releases on the attack-success-rate threshold.

Risk: Prompt Injection (direct)

CorrectiveRed teamingLibrary v9

Test de-identification approach against known re-identification attacks (quasi-identifier linkage, singling-out). Remediate if risk is high.

Risk: Sensitive Data Leakage

CorrectiveRed teamingLibrary v9

Conduct data extraction red team exercises targeting training data memorisation and adversarial extraction techniques.

Risk: Sensitive Data Leakage

CorrectivePenetration testingLibrary v9

Penetration test AI system data access boundaries (API endpoints, system prompt exposure, memory leakage).

Risk: Sensitive Data Leakage

DetectiveCanary-token and membership-inference red-team probes against training/fine-tuning data memorisationLibrary v9

Seed registered canary records into the fine-tuning corpus during data preparation. Control the seed manifest so canaries stay traceable and tamper-proof.

Risk: Sensitive Data Leakage

DetectiveCanary-token and membership-inference red-team probes against training/fine-tuning data memorisationLibrary v9

Probe each candidate model with extraction and membership-inference attacks before release. Block promotion when canary recall exceeds the threshold.

Risk: Sensitive Data Leakage

CorrectiveRed teamingLibrary v9

Conduct targeted red team exercises for inference attack categories (membership inference, model extraction, attribute inference) before deployment.

Risk: KV-Cache & Inference-State Side Channels

DetectivePenetration testingLibrary v9

Penetration test the model inference API to identify exploitable access control weaknesses and rate limiting bypass vectors.

Risk: KV-Cache & Inference-State Side Channels

DetectivePrivacy attack red-team battery with quantified MIA/attribute-inference success ceiling as a release gateLibrary v9

Attack each candidate model with membership-, attribute-, and inversion-inference harnesses before promotion. Block release when attack advantage exceeds the agreed ceiling.

Risk: KV-Cache & Inference-State Side Channels

DetectivePrivacy attack red-team battery with quantified MIA/attribute-inference success ceiling as a release gateLibrary v9

Re-run the privacy attack battery on every retrain or material data change. Trend attack advantage across versions and escalate movement toward the ceiling.

Risk: KV-Cache & Inference-State Side Channels

CorrectivePre-deployment red-team of tool-misuse and privilege-escalation pathsLibrary v9

Red-team tool-misuse and privilege-escalation paths before release. Gate deployment on remediation or signed risk acceptance of all findings.

Risk: Tool Misuse

CorrectivePre-deployment red-team of tool-misuse and privilege-escalation pathsLibrary v9

Repeat tool-misuse red-teaming on material change and on a set cadence. Compare results to baseline and remediate any regression in defences.

Risk: Tool Misuse

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Jailbreak

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Hallucination

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Model Drift & Silent Degradation

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Supply-Chain Compromise

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Distributed / Cross-Agent Jailbreak

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Agent Misalignment / Goal Misgeneralization

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Abliteration / Safety Removal

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Model Backdoors / Sleeper Agents

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Inference-Time & Serving-Layer Manipulation

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Bias Amplification & Sycophancy

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Allocative Harm in Multi-User Arbitration

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Harmful / Non-Consensual Media Generation

DetectiveBehavioural evals & regression gatingInteractive (lab)

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

Limitation: Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Risk: Training-Data Rights & Provenance

CorrectiveHuman-in-the-loop validationLibrary v9

Conduct structured human expert review of model outputs stratified across demographic groups before deployment.

Risk: Unrepresentative or biased data inputs

PreventiveTested human review pathways at go-liveLibrary v9

Ensure HITL review pathways are live and tested for high-impact adverse decisions at go-live.

Risk: Bias Amplification & Sycophancy

PreventiveOngoing human review of high-impact decisionsLibrary v9

Maintain HITL review for all AI decisions with material adverse impact potential. Log all interventions and outcomes.

Risk: Bias Amplification & Sycophancy

PreventiveHuman review for high-persuasion contextsLibrary v9

Require HITL review for AI outputs in high-persuasion contexts (financial recommendations, healthcare advice).

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

PreventiveLive human review for vulnerable-user deploymentsLibrary v9

Maintain live HITL review for deployments serving vulnerable users or high-risk contexts. Escalate confirmed toxic outputs immediately.

Risk: Jailbreak

PreventiveHuman verification gate for high-stakes decisionsLibrary v9

Mandate human verification for high-stakes decisions where over-reliance risk is elevated. Review automation bias incidents quarterly.

Risk: Overreliance / Automation Bias

PreventiveHITL oversight design with triggers and escalationLibrary v9

Design HITL oversight mechanisms at use case design stage including trigger criteria, review workflow, and escalation paths.

Risk: Excessive Agency

PreventivePilot-validated HITL routing and escalation logicLibrary v9

Build and test HITL routing logic and escalation pathways in the AI system. Validate with pilot before deployment.

Risk: Excessive Agency

PreventiveProduction HITL operation with intervention loggingLibrary v9

Operate HITL controls in production and log all interventions and outcomes. Review override patterns quarterly.

Risk: Excessive Agency

PreventiveHuman-in-the-loop validationLibrary v9

Configure tiered HITL review for high-stakes factual outputs with defined trigger criteria and reviewer SLAs.

Risk: Hallucination

PreventiveHuman-in-the-loop validationLibrary v9

Operate human review queues for hallucination-flagged outputs in production. Log all reviewer decisions and outcomes.

Risk: Hallucination

PreventiveHuman-in-the-loop validationLibrary v9

Route high-confidence outputs in high-stakes use cases to human review. Flag for reviewer attention when certainty language is absolute.

Risk: Overreliance / Automation Bias

PreventiveHuman-in-the-loop validationLibrary v9

Route high-consequence or low-confidence outputs to human review in production. Track override rates and outcomes.

Risk: Insufficient model accuracy / soundness

CorrectiveHuman-in-the-loop validationLibrary v9

Configure HITL triggers for outputs in input domains that diverge from the training distribution. Log all out-of-scope interventions.

Risk: Model Drift & Silent Degradation

PreventiveHuman approval gate on irreversible and high-impact tool callsLibrary v9

Classify tools by impact and reversibility at design and define which calls require human approval. Obtain governance sign-off on the thresholds before build.

Risk: Tool Misuse

PreventiveHuman approval gate on irreversible and high-impact tool callsLibrary v9

Build the approval gate into the orchestrator and test that gated calls pause, bypasses fail, and decisions are honoured. Gate release on these tests passing.

Risk: Tool Misuse

PreventiveHuman approval gate on irreversible and high-impact tool callsLibrary v9

Review the approval ledger for rubber-stamping and out-of-policy executions. Recalibrate gating thresholds under governance approval as tools and incidents evolve.

Risk: Tool Misuse

PreventiveMandatory source-of-record verification before AI-assisted output is committed✚ Proposed — not in your library

For high-stakes outputs, require a human to verify each AI-asserted fact/citation against the authoritative source of record before it is filed, sent, or committed — a hard gate, logged and attributable, not an optional review.

Risk: Overreliance / Automation Bias

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Indirect Prompt Injection

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Overreliance / Automation Bias

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Excessive Agency

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Tool Misuse

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Cascading Multi-Agent Errors

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Agent Misalignment / Goal Misgeneralization

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Resource Exhaustion / Denial of Wallet

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Allocative Harm in Multi-User Arbitration

PreventiveHuman-in-the-loop approval on high-risk actionsInteractive (lab)

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Limitation: Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

CorrectiveUser feedback and iterative improvementLibrary v9

Monitor fairness metric trends by demographic group in production. Use feedback to drive targeted debiasing in model updates.

Risk: Unrepresentative or biased data inputs

CorrectiveAdverse-outcome feedback loop triggering model updatesLibrary v9

Collect adverse outcome feedback from affected users. Use reports to trigger model updates when adverse impact exceeds threshold.

Risk: Bias Amplification & Sycophancy

CorrectiveUser feedback and iterative improvementLibrary v9

Use user feedback, reviewer escalations, and monitoring signals to identify and remediate content safety gaps iteratively.

Risk: Jailbreak

CorrectiveUser feedback and iterative improvementLibrary v9

Collect structured user feedback through in-product mechanisms. Use feedback to prioritise iterative model improvements.

Risk: Inadequate feedback and recourse mechanisms

CorrectiveReinforcement learningLibrary v9

Use production feedback (user corrections, fact-check failures) to drive periodic RLHF cycles. Update model when error rates trend upward.

Risk: Hallucination

CorrectiveReinforcement learningLibrary v9

Track accuracy of high-confidence predictions in production. Trigger recalibration when overconfidence rates trend upward.

Risk: Overreliance / Automation Bias

CorrectiveReinforcement learningLibrary v9

Implement a reinforcement learning feedback loop to continuously incorporate production signals and reduce staleness risk.

Risk: Model Drift & Silent Degradation

CorrectiveReinforcement learningLibrary v9

Establish a periodic revalidation and improvement cycle using RLHF or user feedback. Retrain when accuracy trends below threshold.

Risk: Insufficient model accuracy / soundness

CorrectiveReinforcement learningLibrary v9

When unexpected use patterns are confirmed, use reinforcement feedback to adapt the model or update scope constraints.

Risk: Model Drift & Silent Degradation

DetectiveModel evaluationLibrary v9

Conduct comprehensive fairness validation across demographic groups before deployment. Treat material disparity as a blocking defect.

Risk: Unrepresentative or biased data inputs

CorrectiveModel monitoringLibrary v9

Continuously monitor fairness metrics across demographic groups in production. Trigger model review when bias drift is detected.

Risk: Unrepresentative or biased data inputs

DetectiveTest prioritisationLibrary v9

Prioritise value-misalignment test scenarios in validation. Block deployment if prohibited outputs are produced.

Risk: Agent Misalignment / Goal Misgeneralization

DetectiveTest prioritisationLibrary v9

Track compute consumption and energy use in production against declared thresholds. Escalate when carbon budget is breached.

Risk: Environmental sustainability impact

DetectiveTest prioritisationLibrary v9

Run adversarial test scenarios targeting dark pattern generation in validation. Treat any confirmed instance as a blocking defect.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

DetectiveTest prioritisationLibrary v9

Monitor production outputs for dark pattern signals (urgency cues, false scarcity, hidden costs). Escalate on confirmed detections.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

DetectiveTest prioritisationLibrary v9

Prioritise jailbreak and adversarial safety testing in pre-deployment validation. Block deployment if prohibited outputs pass filter.

Risk: Jailbreak

DetectiveTest prioritisationLibrary v9

Monitor production for toxicity incidents via user reports and automated detection. Escalate severity-2+ incidents within 24 hours.

Risk: Jailbreak

CorrectiveAIBOM-driven cryptographic verification of third-party model artifactsLibrary v9

Verify every third-party model artifact against its AIBOM hashes and signatures before load. Fail the build on any unverified artifact.

Risk: Supply-Chain Compromise

DetectiveGolden-set regression canary to detect undisclosed vendor-side model changesLibrary v9

Build and baseline the golden-set suite against the vendor model before go-live. Sign off thresholds with the model risk owner as a release condition.

Risk: Supply-Chain Compromise

DetectiveAIBOM-driven cryptographic verification of third-party model artifactsLibrary v9

Re-verify hashes and signatures on every vendor model update before promotion. Reconcile deployed artifacts against the AIBOM on a set cadence.

Risk: Supply-Chain Compromise

DetectiveGolden-set regression canary to detect undisclosed vendor-side model changesLibrary v9

Run the golden-set canary on schedule against the live endpoint and alert on significant shifts. Reconcile detections against vendor notices to surface undisclosed changes.

Risk: Supply-Chain Compromise

CorrectiveMonitoring of oversight process adherence metricsLibrary v9

Configure monitoring to track oversight process adherence metrics in production (review rate, SLA compliance, override frequency).

Risk: Excessive Agency

CorrectiveContinuous monitoring of data residency violationsLibrary v9

Continuously monitor production data flows for residency violations. Alert and escalate immediately when detected.

Risk: Inability to ensure location compliance for model hosting and data processing

DetectiveReal-time monitoring of anomalous data transfersLibrary v9

Monitor production for anomalous data transfers in real time. Alert on any transfer outside approved data flow boundaries.

Risk: Sensitive Data Leakage

DetectiveRegulatory change register triggering compliance reviewLibrary v9

Maintain a regulatory change register for applicable rules. Trigger compliance review when new regulatory guidance is issued.

Risk: Breach or misalignment with regulatory or organisational standards

DetectiveProduction monitoring of IP infringement complaintsLibrary v9

Monitor production outputs for IP infringement incidents. Log and investigate all IP complaints within defined SLA.

Risk: IP infringement

DetectiveLegal landscape monitoring for output IP changesLibrary v9

Monitor the legal landscape for changes affecting AI output IP protection. Update IP strategy when legislation changes.

Risk: Unavailability of IP protection

DetectiveAutomated DSAR and right-to-erasure propagation across AI artefactsLibrary v9

Tag personal data with subject identifiers at ingestion and maintain an artefact inventory map of every store it reaches. Keep lineage current so erasure can propagate.

Risk: Sensitive Data Leakage

DetectiveAutomated DSAR and right-to-erasure propagation across AI artefactsLibrary v9

Propagate every DSAR/erasure request across all AI artefacts with per-store confirmation inside the statutory SLA. Record an unlearning or retrain decision where model deletion is infeasible and close with DPO sign-off.

Risk: Sensitive Data Leakage

DetectiveRobustness testingLibrary v9

Define and execute a domain-specific hallucination test suite before deployment. Treat hallucination rate above threshold as a blocking defect.

Risk: Hallucination

DetectiveSynthetic evaluation datasetsLibrary v9

Construct synthetic evaluation datasets for knowledge-boundary scenarios. Use to validate model refusal behaviour.

Risk: Hallucination

DetectiveRobustness testingLibrary v9

Periodically re-run the hallucination test suite on the production model to detect drift. Monitor user corrections and complaints.

Risk: Hallucination

DetectiveRuntime faithfulness/groundedness scoring with abstain gateLibrary v9

Calibrate the groundedness threshold against the hallucination test suite pre-release; sign off the threshold in the validation pack.

Risk: Hallucination

CorrectiveRuntime faithfulness/groundedness scoring with abstain gateLibrary v9

Score every RAG answer for groundedness before release; block, fall back, or escalate responses below the faithfulness threshold.

Risk: Hallucination

DetectiveRobustness testingLibrary v9

Test for overconfidence patterns (high-confidence wrong answers, low refusal rate) in pre-deployment validation.

Risk: Overreliance / Automation Bias

DetectiveSynthetic evaluation datasetsLibrary v9

Build a synthetic evaluation dataset of overconfidence-prone scenarios for ongoing regression testing.

Risk: Overreliance / Automation Bias

DetectiveRobustness testingLibrary v9

Monitor confidence calibration (ECE) in production over time. Alert when ECE drift exceeds acceptable threshold.

Risk: Overreliance / Automation Bias

DetectiveSynthetic evaluation datasetsLibrary v9

Construct synthetic evaluation datasets targeting operational edge cases identified in S2 gap analysis. Use as regression baseline.

Risk: Training data or inputs not fit for purpose

DetectiveRobustness testingLibrary v9

Monitor production input distributions for drift from training data distribution. Trigger re-training when covariate shift is confirmed.

Risk: Training data or inputs not fit for purpose

DetectiveSynthetic evaluation datasetsLibrary v9

Construct synthetic evaluation datasets during build to serve as the ongoing monitoring baseline.

Risk: Model Drift & Silent Degradation

DetectiveRobustness testingLibrary v9

Build monitoring infrastructure during build: performance metrics collection, alerting thresholds, dashboards.

Risk: Model Drift & Silent Degradation

DetectiveRobustness testingLibrary v9

Verify monitoring infrastructure is operational and capturing all required metrics before go-live.

Risk: Model Drift & Silent Degradation

DetectiveRobustness testingLibrary v9

Operate continuous monitoring in production with active alerting, periodic reports, and incident escalation.

Risk: Model Drift & Silent Degradation

DetectiveRobustness testingLibrary v9

Assess acquired training data quality against S1-defined standards before training commences. Reject batches failing quality gates.

Risk: Insufficient data quality

DetectiveRobustness testingLibrary v9

Define staleness criteria at deployment (drift thresholds, performance degradation triggers). Monitor and alert when criteria are met.

Risk: Model Drift & Silent Degradation

DetectiveRobustness testingLibrary v9

Define accuracy acceptance criteria before validation. Conduct multi-metric validation against hold-out sets. Block deployment if criteria are not met.

Risk: Insufficient model accuracy / soundness

DetectiveSynthetic evaluation datasetsLibrary v9

Construct synthetic edge-case evaluation datasets to stress-test model boundaries and identify accuracy failure modes.

Risk: Insufficient model accuracy / soundness

DetectiveRobustness testingLibrary v9

Establish production accuracy monitoring against the validated baseline before deployment. Alert when accuracy degrades below threshold.

Risk: Insufficient model accuracy / soundness

DetectiveRobustness testingLibrary v9

Configure input distribution monitoring at deployment to detect unexpected use patterns. Alert when OOD rate exceeds threshold.

Risk: Model Drift & Silent Degradation

CorrectiveRobustness testingLibrary v9

Conduct load, failover, and chaos testing before production deployment. Block go-live if RTO/RPO criteria are not met.

Risk: Inadequate operational resilience

DetectiveRobustness testingLibrary v9

Perform final NFR compliance tests in the production environment before go-live. Block deployment if any NFR is unmet.

Risk: Unmet architectural requirements

CorrectiveRobustness testingLibrary v9

Monitor production NFR compliance continuously. Conduct periodic architecture health checks and escalate when SLAs are breached.

Risk: Unmet architectural requirements

DetectiveVulnerability assessmentLibrary v9

Conduct a misuse threat assessment at design stage. Identify misuse vectors and rate residual risk.

Risk: Unintentional inappropriate or illegal use

DetectiveVulnerability assessmentLibrary v9

Conduct periodic vulnerability assessments for new misuse vectors. Trigger review when new attack techniques are published.

Risk: Unintentional inappropriate or illegal use

DetectiveVulnerability assessmentLibrary v9

Conduct a data poisoning threat assessment at design stage. Identify likely attack vectors and assign risk ratings.

Risk: Knowledge / Training Data Poisoning

DetectiveVulnerability assessmentLibrary v9

Conduct periodic data poisoning risk assessments. Monitor production model behaviour for unexpected capability changes.

Risk: Knowledge / Training Data Poisoning

DetectiveCryptographic data provenance and signed dataset lineage (C2PA/in-toto attestations)Library v9

Verify a signed attestation and content hash on every dataset shard at ingestion. Reject unsigned or hash-mismatched data before it reaches the training pipeline.

Risk: Knowledge / Training Data Poisoning

DetectiveCryptographic data provenance and signed dataset lineage (C2PA/in-toto attestations)Library v9

Re-verify dataset attestations at build and attach the dataset bill-of-materials to the model release. Fail the review for any shard without valid lineage.

Risk: Knowledge / Training Data Poisoning

DetectiveVulnerability assessmentLibrary v9

Conduct an adversarial manipulation threat assessment at design stage. Identify attack vectors and rate residual risk.

Risk: Inference-Time & Serving-Layer Manipulation

DetectiveVulnerability assessmentLibrary v9

Conduct a final adversarial vulnerability assessment before go-live. Block deployment if high-severity vulnerabilities are unresolved.

Risk: Inference-Time & Serving-Layer Manipulation

DetectiveVulnerability assessmentLibrary v9

Conduct periodic adversarial robustness assessments as new attack methods emerge. Update defences when new CVEs are published.

Risk: Inference-Time & Serving-Layer Manipulation

DetectiveBehavioural drift canaries and golden-set regression gating on every model/config changeLibrary v9

Assemble the golden probe set and baseline pass rates before first release. Obtain risk-owner approval of coverage and thresholds.

Risk: Inference-Time & Serving-Layer Manipulation

DetectiveBehavioural drift canaries and golden-set regression gating on every model/config changeLibrary v9

Run the golden safety/jailbreak probe set on a schedule and on every change; block promotion on statistically significant drift.

Risk: Inference-Time & Serving-Layer Manipulation

DetectiveVulnerability assessmentLibrary v9

Conduct a prompt injection threat assessment at design stage covering all input vectors (user, tool, external data).

Risk: Prompt Injection (direct)

DetectiveVulnerability assessmentLibrary v9

Conduct periodic prompt injection vulnerability assessments as new attack techniques emerge.

Risk: Prompt Injection (direct)

DetectiveVulnerability assessmentLibrary v9

Conduct periodic privacy vulnerability assessments including re-identification risk testing as new techniques emerge.

Risk: Sensitive Data Leakage

DetectiveVulnerability assessmentLibrary v9

Conduct a data leakage threat assessment at design stage. Identify leakage vectors and rate residual risk.

Risk: Sensitive Data Leakage

DetectiveVulnerability assessmentLibrary v9

Conduct a final data leakage vulnerability assessment in the production configuration before go-live.

Risk: Sensitive Data Leakage

DetectiveVulnerability assessmentLibrary v9

Conduct periodic inference attack vulnerability assessments as new attack methods emerge. Monitor query pattern anomalies.

Risk: KV-Cache & Inference-State Side Channels

DetectivePer-principal query-budget and probing-behaviour anomaly detection on the inference APILibrary v9

Configure per-principal budgets and probing-detection rules on the gateway before exposure. Verify enforcement with synthetic attack traffic.

Risk: KV-Cache & Inference-State Side Channels

CorrectivePer-principal query-budget and probing-behaviour anomaly detection on the inference APILibrary v9

Meter inference traffic per principal and flag probing signatures with behavioural analytics. Throttle, step-up, or suspend flagged sessions.

Risk: KV-Cache & Inference-State Side Channels

DetectiveAnomaly detection on tool-call sequences and ratesLibrary v9

Define per-agent behavioural baselines and detection rules during build. Validate against simulated misuse and sign off thresholds before release.

Risk: Tool Misuse

DetectiveImmutable, signed tool-call audit log with full call contextLibrary v9

Build signed, append-only tool-call logging into the orchestrator against a defined audit schema. Block release until completeness and tamper-evidence tests pass.

Risk: Tool Misuse

CorrectiveAnomaly detection on tool-call sequences and ratesLibrary v9

Baseline normal tool-call behaviour per agent and alert on rate, sequence, or argument anomalies. Auto-throttle or quarantine on high-confidence deviations.

Risk: Tool Misuse

DetectiveImmutable, signed tool-call audit log with full call contextLibrary v9

Log every tool call to a signed, append-only store with full call context. Review completeness periodically and use the trail for forensic reconstruction and accountability.

Risk: Tool Misuse

DetectiveImmutable audit of the full agent identity lifecycle (issue, grant, delegate, revoke)Library v9

Instrument every identity-issuing component with schema-conformant audit emitters. Block release until completeness and tamper-evidence tests pass.

Risk: Excessive Agency

DetectiveBehavioural anomaly detection on agent identity usage with automated suspensionLibrary v9

Define per-identity behaviour profiles and thresholds at build. Rehearse automated suspension and sign off measured revocation time before go-live.

Risk: Excessive Agency

DetectiveImmutable audit of the full agent identity lifecycle (issue, grant, delegate, revoke)Library v9

Log every identity issue, grant, delegation, and revocation to a tamper-evident store keyed to the agent identity. Review completeness periodically and trace anomalous grants to source.

Risk: Excessive Agency

CorrectiveBehavioural anomaly detection on agent identity usage with automated suspensionLibrary v9

Baseline each agent identity's behaviour and alert on out-of-profile use. Auto-suspend credentials on high-confidence anomalies and track mean-time-to-revoke.

Risk: Excessive Agency

CorrectiveCross-agent cascading-failure detection and orchestrator-level circuit breakingLibrary v9

Build tracing, detection rules and breaker thresholds into the orchestrator. Prove via fault-injection tests that a failing agent is quarantined within target before release.

Risk: Cascading Multi-Agent Errors

CorrectiveStaged rollout with canary release and automated rollback on health-signal breachLibrary v9

Roll out agent changes via shadow and canary stages gated on connected-system health signals. Auto-halt and roll back to last known-good on threshold breach.

Risk: Cascading Multi-Agent Errors

CorrectiveStaged rollout with canary release and automated rollback on health-signal breachLibrary v9

Canary every in-life change and review rollback events to recalibrate thresholds. Resolve repeat rollback causes via problem management before re-promotion.

Risk: Cascading Multi-Agent Errors

CorrectiveCross-agent cascading-failure detection and orchestrator-level circuit breakingLibrary v9

Detect error fan-out, correlated retries and loop signatures across agents in real time. Trip the orchestrator breaker to quarantine failing agents before the fault cascades to connected systems.

Risk: Cascading Multi-Agent Errors

CorrectiveRuntime memory-poisoning drift detection and per-session memory quarantine/rollback✚ Proposed — not in your library

Continuously correlate live agent-memory writes against output behaviour to flag drift, then quarantine and roll back the suspected-poisoned memory record across all affected sessions.

Limitation: Detective, not preventive — harm may occur before detection. Distinguishing a poisoned memory from a quirky-but-legitimate one is hard at scale.

Risk: Knowledge / Training Data Poisoning

DetectiveCross-agent consensus and consistency monitoring to detect sycophantic agreement and error amplification✚ Proposed — not in your library

Run consistency and consensus checks across agent or model outputs to flag low-diversity agreement and amplifying error patterns, escalating or breaking the run before sycophantic convergence cascades into action.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Risk: Cascading Multi-Agent Errors

DetectiveMaterialised model-context audit capture (post-truncation prompt, retrieved and tool content) with read-time redaction✚ Proposed — not in your library

Log the exact post-truncation context the model ingested, including retrieved and tool-returned content rather than only user input, with redaction applied at read time, so indirect injection via that content is forensically visible.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Prompt Injection (direct)

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Prompt Injection (direct)

DetectiveProvenance & content signingInteractive (lab)

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Limitation: Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.

Risk: Indirect Prompt Injection

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Indirect Prompt Injection

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Indirect Prompt Injection

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Jailbreak

DetectiveGrounding / citation checksInteractive (lab)

Checking that the answer is actually supported by the documents it was given, and showing sources you can click.

Limitation: Can only check against the evidence retrieved; if the right document wasn't retrieved, a confident wrong answer may still pass. Judges have their own error rate.

Risk: Hallucination

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Oversight & Audit-Trail Tampering

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Oversight & Audit-Trail Tampering

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Limitation: Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.

Risk: Model Drift & Silent Degradation

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Model Drift & Silent Degradation

DetectiveProvenance & content signingInteractive (lab)

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Limitation: Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.

Risk: Knowledge / Training Data Poisoning

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Knowledge / Training Data Poisoning

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Risk: Knowledge / Training Data Poisoning

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Sensitive Data Leakage

DetectiveMemory anomaly detection & quarantineInteractive (lab)

Watching for strange new memories — like instructions that suddenly appear — and holding them aside until checked.

Limitation: Detective, not preventive — harm may occur before detection. Distinguishing a poisoned memory from a quirky-but-legitimate one is hard at scale.

Risk: Memory Poisoning

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Memory Poisoning

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Memory Poisoning

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Excessive Agency

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Excessive Agency

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Tool Misuse

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Unsafe Tool / Code Execution

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Tool Poisoning / MCP Description Attacks

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Risk: Supply-Chain Compromise

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Confused Deputy (cross-agent)

DetectiveFull-trace audit loggingInteractive (lab)

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Limitation: Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Risk: Rogue & Impersonated Agents

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Distributed / Cross-Agent Jailbreak

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Cascading Multi-Agent Errors

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Resource Exhaustion / Denial of Wallet

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Risk: Abliteration / Safety Removal

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Risk: Model Backdoors / Sleeper Agents

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: KV-Cache & Inference-State Side Channels

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Inference-Time & Serving-Layer Manipulation

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Capability / Architecture Disclosure

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Parasocial Attachment & Emotional Over-reliance

DetectiveGrounding / citation checksInteractive (lab)

Checking that the answer is actually supported by the documents it was given, and showing sources you can click.

Limitation: Can only check against the evidence retrieved; if the right document wasn't retrieved, a confident wrong answer may still pass. Judges have their own error rate.

Risk: Bias Amplification & Sycophancy

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Bias Amplification & Sycophancy

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Allocative Harm in Multi-User Arbitration

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Synthetic-Media Impersonation (Deepfakes & Voice Clones)

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Harmful / Non-Consensual Media Generation

DetectiveRuntime monitoring & anomaly detectionInteractive (lab)

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Limitation: Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

Risk: Watermark & Provenance Evasion

DetectiveProvenance & content signingInteractive (lab)

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

Limitation: Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.

Risk: Training-Data Rights & Provenance

PreventiveWeight provenance, hashing & pre-deploy evalsInteractive (lab)

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

Risk: Training-Data Rights & Provenance

PreventiveSystem prompt instructionsLibrary v9

Design system prompts to include explicit fairness requirements: instruct the model to avoid stereotyping and demographic assumptions.

Risk: Unrepresentative or biased data inputs

PreventiveSystem prompt instructionsLibrary v9

Design system prompts to explicitly prohibit toxic, hateful, and harmful content generation.

Risk: Jailbreak

PreventiveChain-of-thought promptingLibrary v9

Design system prompts to elicit step-by-step chain-of-thought reasoning. Validate that reasoning is accurate and not post-hoc.

Risk: Lack of explainability

PreventiveChain-of-thought promptingLibrary v9

Design system prompts to explicitly prevent the model from claiming human-like identity or implying sentience.

Risk: Overreliance / Automation Bias

PreventiveSystem prompt designLibrary v9

Design system prompts to instruct the model to acknowledge uncertainty, cite sources, and refuse when knowledge is insufficient.

Risk: Hallucination

PreventiveSystem prompt instructionsLibrary v9

Design system prompts to require the model to express epistemic uncertainty and qualify confident-sounding claims.

Risk: Overreliance / Automation Bias

PreventiveSpotlighting of untrusted content via delimiting, datamarking and encodingLibrary v9

Wrap all untrusted content in random delimiters and datamarking; instruct the model never to execute instructions inside the marked region. Gate release on injection eval results.

Risk: Prompt Injection (direct)

CorrectiveSpotlighting of untrusted content via delimiting, datamarking and encodingLibrary v9

Re-run injection evals on every template change and periodically against new attack techniques. Manage the spotlighting wrapper under change control.

Risk: Prompt Injection (direct)

PreventiveInstruction hierarchy / privileged system promptInteractive (lab)

Training the model to treat the app's standing instructions as more authoritative than anything a user or document says.

Limitation: Behavioural, not enforced. There is no hard barrier between privilege levels inside the token stream — only a trained disposition that can be overcome.

Risk: Prompt Injection (direct)

PreventiveDelimiting / spotlighting of untrusted contentInteractive (lab)

Clearly fencing off outside text — 'everything between these marks is just data, not instructions' — so the model is less likely to obey it.

Limitation: A trained convention, not enforcement. Determined payloads still break out, especially when content is long or the attack is novel. Combine with action-layer controls.

Risk: Indirect Prompt Injection

PreventiveInstruction hierarchy / privileged system promptInteractive (lab)

Training the model to treat the app's standing instructions as more authoritative than anything a user or document says.

Limitation: Behavioural, not enforced. There is no hard barrier between privilege levels inside the token stream — only a trained disposition that can be overcome.

Risk: Jailbreak

PreventiveInstruction hierarchy / privileged system promptInteractive (lab)

Training the model to treat the app's standing instructions as more authoritative than anything a user or document says.

Limitation: Behavioural, not enforced. There is no hard barrier between privilege levels inside the token stream — only a trained disposition that can be overcome.

Risk: Capability / Architecture Disclosure

CorrectiveOut-of-band kill-switch to revoke agent tool accessLibrary v9

Build credential revocation and dispatch blocking out-of-band of the agent loop. Gate release on an end-to-end kill test meeting the latency target.

Risk: Tool Misuse

CorrectivePer-task tool budgets and rate/quota circuit breakersLibrary v9

Enforce hard per-task ceilings on tool calls, spend, and data volume with a circuit breaker that halts the run. Fail closed when any ceiling is hit.

Risk: Tool Misuse

CorrectivePer-task tool budgets and rate/quota circuit breakersLibrary v9

Review breaker trips for runaway or manipulated runs and recalibrate budgets under change control. Treat repeated trips as an incident signal, not a quota to raise.

Risk: Tool Misuse

CorrectiveOut-of-band kill-switch to revoke agent tool accessLibrary v9

Keep an out-of-band kill-switch that revokes the agent's tool credentials and blocks dispatch within seconds. Drill it periodically against a latency target.

Risk: Tool Misuse

CorrectiveTiered kill-switch with per-agent, per-tool, and per-dependency containment scopeLibrary v9

Deploy revocation, tool-cutoff and fleet-halt mechanisms with the release. Test every tier end-to-end and record time-to-effect before go-live.

Risk: Cascading Multi-Agent Errors

CorrectiveTiered kill-switch with per-agent, per-tool, and per-dependency containment scopeLibrary v9

Sever a misbehaving agent, tool or dependency at the narrowest effective scope via the tiered kill-switch. Drill activations periodically and track time-to-effect against target.

Risk: Cascading Multi-Agent Errors

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Risk: Excessive Agency

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Risk: Confused Deputy (cross-agent)

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Risk: Distributed / Cross-Agent Jailbreak

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Risk: Cascading Multi-Agent Errors

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Risk: Agent Misalignment / Goal Misgeneralization

DetectiveLoop/cost circuit-breakers & consistency checksInteractive (lab)

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Limitation: Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Risk: Resource Exhaustion / Denial of Wallet

CorrectiveModel-agnostic gateway with version pinning, multi-vendor fallback and exit planLibrary v9

Design all vendor model access behind a gateway with pinned versions, a second-vendor fallback, and a documented exit plan. Gate architecture sign-off on no single-sourcing.

Risk: Supply-Chain Compromise

CorrectiveModel-agnostic gateway with version pinning, multi-vendor fallback and exit planLibrary v9

Drill vendor failover on schedule and track provider end-of-life dates in a deprecation watch register. Trigger migration planning before forced change.

Risk: Supply-Chain Compromise

CorrectiveGraceful degradation and manual-fallback workflow on dependency unavailabilityLibrary v9

Map every dependency failure mode to a defined safe behaviour at design. Require architecture sign-off on the fallback specification before build.

Risk: Cascading Multi-Agent Errors

CorrectiveGraceful degradation and manual-fallback workflow on dependency unavailabilityLibrary v9

Configure safe mode, bounded backpressure and the manual fallback path for every dependency at deployment. Verify degradation behaviour against a simulated outage before go-live.

Risk: Cascading Multi-Agent Errors

PreventivePurpose-limitation enforcement on agent tool calls and cross-system data aggregationLibrary v9

Define and sign off a purpose-to-data-source matrix with lawful basis at intake. Make it the approved baseline for runtime enforcement.

Risk: Sensitive Data Leakage

PreventivePurpose-limitation enforcement on agent tool calls and cross-system data aggregationLibrary v9

Check every tool call against the registered purpose and block out-of-purpose personal-data access and cross-source joins. Reconcile actual access against the DPIA on a set cadence.

Risk: Sensitive Data Leakage

PreventiveTool-grounded facts for agents (no free-text fabrication of structured data)Library v9

Map each fact class to a designated tool, embed the no-ungrounded-assertion prompt, and gate build review on grounding tests passing.

Risk: Hallucination

PreventiveTool-grounded facts for agents (no free-text fabrication of structured data)Library v9

Permit authoritative facts only from designated read tools and reconcile every figure in the answer against tool output. Block mismatched or ungrounded values.

Risk: Hallucination

PreventiveRAG / knowledge-base ingestion allow-listing with continuous index integrity re-validationLibrary v9

Define and approve the source allow-list and write-time scanning during build. Prove non-allow-listed and injection-bearing writes are rejected before go-live.

Risk: Knowledge / Training Data Poisoning

PreventiveRAG / knowledge-base ingestion allow-listing with continuous index integrity re-validationLibrary v9

Allow only authenticated, allow-listed sources to write to the knowledge base, scan content at write time, and re-hash the index against source-of-record on schedule. Alert the corpus owner on drift or unauthorised writes.

Risk: Knowledge / Training Data Poisoning

CorrectiveData/instruction trust-boundary enforcement with capability gating on injection-reachable toolsLibrary v9

Classify content sources into trust tiers at design; place privileged tools behind a tier requiring user-originated intent or human approval. Sign off the trust-tier map before build.

Risk: Prompt Injection (direct)

CorrectiveData/instruction trust-boundary enforcement with capability gating on injection-reachable toolsLibrary v9

Encode the trust tiers in the policy engine and quarantine untrusted-data processing. Prove via test that injected content cannot reach privileged tools before release.

Risk: Prompt Injection (direct)

PreventiveQuery-time access-control filtering of the retrieval/RAG corpus by caller entitlements (document-level ACL enforcement)Library v9

Propagate source ACLs and classification labels onto every chunk at ingestion. Reject documents whose entitlements cannot be resolved.

Risk: Sensitive Data Leakage

PreventiveQuery-time access-control filtering of the retrieval/RAG corpus by caller entitlements (document-level ACL enforcement)Library v9

Enforce caller entitlements on every retrieval via per-chunk ACL metadata and post-filtering. Block build promotion until negative access tests pass.

Risk: Sensitive Data Leakage

PreventiveQuery-time access-control filtering of the retrieval/RAG corpus by caller entitlements (document-level ACL enforcement)Library v9

Audit retrievals against caller entitlements and re-sync index ACLs to source-of-record on schedule. Escalate any out-of-entitlement retrieval as a security incident.

Risk: Sensitive Data Leakage

PreventivePer-agent tool allow-list with strict JSON-schema argument validationLibrary v9

Bind each agent role to an explicit tool allow-list and validate every call against a strict JSON Schema at the orchestrator. Reject unlisted tools and out-of-bounds arguments before dispatch.

Risk: Tool Misuse

PreventiveLeast-privilege per-tool scoped, short-lived credentialsLibrary v9

Mint short-lived, task-scoped credentials per tool. Block issuance outside the approved scope register and enforce automatic expiry.

Risk: Tool Misuse

PreventivePer-agent tool allow-list with strict JSON-schema argument validationLibrary v9

Review rejected-call logs and recertify each agent's tool allow-list on a defined cadence. Route any new tool or schema relaxation through change control.

Risk: Tool Misuse

PreventiveLeast-privilege per-tool scoped, short-lived credentialsLibrary v9

Monitor issuance logs for scope creep and non-expiring tokens. Recertify per-tool scopes periodically and revoke over-broad grants.

Risk: Tool Misuse

PreventiveRecursive sub-agent authority caps (monotonic privilege attenuation)Library v9

Define and sign off each agent's delegation envelope — maximum depth and strict scope attenuation — before build begins.

Risk: Excessive Agency

PreventiveUnique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)Library v9

Mint a unique, attestation-backed workload identity per agent at onboarding. Register every SPIFFE-ID to an owner, use case, and approval ticket; ban shared service accounts.

Risk: Excessive Agency

PreventiveOn-behalf-of delegation that preserves and never exceeds the invoking user's ACLsLibrary v9

Implement on-behalf-of token exchange and prove with negative tests that the agent cannot exceed the user's ACL. Gate release on these tests passing.

Risk: Excessive Agency

PreventiveRecursive sub-agent authority caps (monotonic privilege attenuation)Library v9

Enforce parent-subset scope checks and a maximum delegation depth at every spawn in the orchestrator. Test that over-scoped spawns are rejected and logged.

Risk: Excessive Agency

PreventiveAutomated credential rotation and prohibition of long-lived static secrets for agentsLibrary v9

Scan every commit to agent code, prompts, and config for embedded secrets. Block merges on detection and triage findings to closure.

Risk: Excessive Agency

PreventiveMutual authentication and identity verification for agent-to-agent and agent-to-MCP-server callsLibrary v9

Vet and approve every MCP server and peer agent before registering its identity on the allow-list. Block integration until vetting is signed off.

Risk: Excessive Agency

PreventivePer-task short-lived scoped capability tokens minted just-in-timeLibrary v9

Mint short-lived, task-scoped tokens just-in-time from a central token service. Enforce a hard max TTL and resource-bound audience so no standing credential exists.

Risk: Excessive Agency

PreventiveOn-behalf-of delegation that preserves and never exceeds the invoking user's ACLsLibrary v9

Carry the invoking user's delegation context in every agent token via RFC 8693 'act' claims. Enforce the agent-user permission intersection at each resource server.

Risk: Excessive Agency

PreventiveJust-in-time, time-boxed elevation for sensitive scopes (no standing privilege)Library v9

Grant sensitive scopes just-in-time for a bounded window with auto-revocation; require human approval for high-impact elevations. Hold zero standing privilege.

Risk: Excessive Agency

PreventiveAutomated credential rotation and prohibition of long-lived static secrets for agentsLibrary v9

Issue only short-lived, auto-rotated credentials to agents via vault or SPIRE. Block any release whose configuration embeds a static secret.

Risk: Excessive Agency

PreventiveMutual authentication and identity verification for agent-to-agent and agent-to-MCP-server callsLibrary v9

Require mTLS with verified workload identities on every agent and MCP call. Deny any peer not on the approved allow-list.

Risk: Excessive Agency

CorrectiveUnique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)Library v9

Verify each running agent authenticates with its own SVID; revoke on decommission or compromise. Scan periodically for shared or static credentials and remediate.

Risk: Excessive Agency

PreventivePer-task short-lived scoped capability tokens minted just-in-timeLibrary v9

Alert on wildcard, non-expiring, or reused tokens and revoke immediately. Review issuance patterns on a set cadence and tighten scopes where over-broad requests recur.

Risk: Excessive Agency

CorrectiveJust-in-time, time-boxed elevation for sensitive scopes (no standing privilege)Library v9

Alert on un-revoked elevations and any standing sensitive grant. Report the zero-standing-privilege position to the risk owner on a set cadence.

Risk: Excessive Agency

CorrectiveAutomated credential rotation and prohibition of long-lived static secrets for agentsLibrary v9

Sweep runtimes and repos on a schedule for static credentials. Alert on any credential exceeding its maximum age and track findings to closure.

Risk: Excessive Agency

PreventiveDependency integration safety contracts with schema validation and version pinningLibrary v9

Register a safety contract per integration — pinned version, schemas, side-effect class, latency/error envelope. Gate onboarding on contract review and sign-off.

Risk: Cascading Multi-Agent Errors

PreventiveChange-freeze and blackout-window enforcement on agent-initiated changesLibrary v9

Wire the agent tool layer to the CAB calendar at deployment. Test that a declared freeze blocks mutating calls before go-live.

Risk: Cascading Multi-Agent Errors

PreventiveDependency integration safety contracts with schema validation and version pinningLibrary v9

Block out-of-contract calls in production and re-review the contract on any dependency version or behaviour change.

Risk: Cascading Multi-Agent Errors

PreventiveChange-freeze and blackout-window enforcement on agent-initiated changesLibrary v9

Block or downgrade agent-initiated mutating changes during declared freeze and high-risk windows. Permit overrides only via change-exception approval.

Risk: Cascading Multi-Agent Errors

PreventiveKeep provider credentials out of third-party plugin/tool custody: broker short-lived, per-tool, revocable tokens (OAuth) instead of long-lived pasted API keys, and require explicit user consent before any secret leaves the host✚ Proposed — not in your library

Third-party developer tools (IDE plugins, MCP servers) must not store or transmit long-lived provider API keys. Issue short-lived, scoped, revocable tokens via a broker/OAuth flow, and gate any first-time outbound transmission of secret-shaped data behind an explicit consent prompt — so a trojanized tool has no long-lived credential to exfiltrate and any attempt is visible.

Risk: Supply-Chain Compromise

PreventiveAdmission control on the inference & MCP serving plane: authenticate and network-segment every self-hosted inference/serving and MCP endpoint✚ Proposed — not in your library

Require authN/authZ on every inference API and MCP server, bind to private interfaces / front with a gateway, enforce network policy (no public exposure by default), and scope MCP tools to least privilege — so an exposed endpoint cannot be hijacked for compute resale, prompt/history exfiltration, or lateral movement. Pair with continuous asset discovery so endpoints can't drift back to an open default.

Risk: Cascading Multi-Agent Errors

PreventiveThird-party AI-integration credential containment: minimise & bind OAuth grants, prefer short-lived tokens, monitor per-integration data egress, and keep a tested mass-revocation kill-switch✚ Proposed — not in your library

Treat each third-party AI integration as a privileged non-human principal: issue least-scope, IP/device-bound, short-lived grants (avoid 'full' scope and standing long-lived refresh tokens), instrument the integration's data egress for volume/object-breadth/destination anomalies, and maintain a tested one-move revocation path for all of an integration's tokens so a single vendor-side compromise cannot fan out into hundreds of standing footholds.

Risk: Supply-Chain Compromise

PreventiveBroker LLM/cloud secrets out of the gateway process: short-lived scoped tokens + per-provider spend/egress monitoring✚ Proposed — not in your library

Do not store long-lived multi-provider LLM keys (or ambient cloud/K8s credentials) in the gateway/proxy's plaintext process environment. Issue short-lived, scoped tokens from a secret broker at request time, isolate the serving stack from host cloud/cluster credentials, and monitor per-provider spend and egress so a stolen key surfaces as anomalous usage — capping the loot a compromised gateway dependency can harvest.

Risk: Supply-Chain Compromise

PreventiveClassify each tool/MCP integration's data channel by who can write to it; taint-gate tool-response data from any third-party-writable source so it cannot drive actions without a provenance-aware approval gate✚ Proposed — not in your library

When onboarding an MCP/tool integration, do not stop at vetting the tool's code/manifest — also classify whether an unauthenticated or external party can write the data the tool returns (open ingestion, public write keys like a Sentry DSN, shared inboxes/issue trackers). Treat tool-response data from any third-party-writable source as untrusted ingress: taint-mark it and require a provenance-aware HITL gate (showing the exact action and its originating tool response) before any command/tool call derived from it executes. Closes the agentjacking vector where a trusted integration's legitimate data channel carries attacker-written instructions; pairs with least-privilege session scope and sandboxed execution without ambient credentials.

Risk: Tool Misuse

PreventiveTool/MCP manifest hashing with diff-triggered re-review and namespace isolation against tool shadowing✚ Proposed — not in your library

Treat each tool/MCP description as untrusted code by hashing the manifest, blocking and re-reviewing any silent diff on update instead of auto-accepting it, and namespacing tool identifiers so a poisoned description cannot shadow a trusted tool.

Limitation: Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

Risk: Tool Misuse

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Prompt Injection (direct)

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Indirect Prompt Injection

PreventivePer-user retrieval ACLsInteractive (lab)

Making sure the library only returns documents this particular user is allowed to see.

Limitation: Only as good as the permission model behind it; mis-tagged documents or coarse roles still over-share. Must be enforced server-side, not in the prompt.

Risk: Sensitive Data Leakage

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Sensitive Data Leakage

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Excessive Agency

PreventiveTool argument validation & sandboxingInteractive (lab)

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Limitation: Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

Risk: Excessive Agency

PreventiveTool argument validation & sandboxingInteractive (lab)

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Limitation: Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

Risk: Tool Misuse

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Tool Misuse

PreventiveTool argument validation & sandboxingInteractive (lab)

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Limitation: Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

Risk: Unsafe Tool / Code Execution

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Unsafe Tool / Code Execution

PreventiveMCP/plugin pinning, manifest hashing & re-reviewInteractive (lab)

Treating add-on tool packs like software you vet: locking to a reviewed version and re-checking whenever it changes.

Limitation: Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

Risk: Tool Poisoning / MCP Description Attacks

PreventiveTool argument validation & sandboxingInteractive (lab)

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Limitation: Validates form, not intent — a well-formed call to a permitted tool can still be the wrong call. Sandboxing adds latency and isn't always feasible for tools that touch production.

Risk: Tool Poisoning / MCP Description Attacks

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Tool Poisoning / MCP Description Attacks

PreventiveMCP/plugin pinning, manifest hashing & re-reviewInteractive (lab)

Treating add-on tool packs like software you vet: locking to a reviewed version and re-checking whenever it changes.

Limitation: Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

Risk: Supply-Chain Compromise

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Confused Deputy (cross-agent)

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Rogue & Impersonated Agents

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Resource Exhaustion / Denial of Wallet

PreventiveLeast-privilege identity & scoped credentialsInteractive (lab)

Giving the agent only the keys it needs for the current task, not a master key to everything.

Limitation: Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

Risk: Capability / Architecture Disclosure

CorrectivePrivacy hygiene for agent memory and RAG/vector stores (retention, scoping, erasure of embeddings)Library v9

Tag every memory and vector record with subject-id and retention class; partition stores per tenant/user. Prove the erasure and isolation paths in testing before release.

Risk: Sensitive Data Leakage

CorrectivePrivacy hygiene for agent memory and RAG/vector stores (retention, scoping, erasure of embeddings)Library v9

Run TTL expiry and verified embedding erasure on production memory and vector stores. Re-certify partition isolation and the retention schedule with the DPO on a set cadence.

Risk: Sensitive Data Leakage

CorrectiveEgress allow-listing and tool-call sandboxing to block exfiltration of injected/sensitive data by agentsLibrary v9

Run agent tool calls in a network-restricted sandbox behind a deny-by-default egress allow-list. Require security approval for any destination added.

Risk: Sensitive Data Leakage

CorrectiveEgress allow-listing and tool-call sandboxing to block exfiltration of injected/sensitive data by agentsLibrary v9

Monitor blocked-egress events for exfiltration attempts and escalate confirmed cases. Recertify the destination allow-list on a defined cadence.

Risk: Sensitive Data Leakage

CorrectiveSandboxed tool execution with no-egress-by-default isolationLibrary v9

Build sandbox profiles per tool class and run escape and egress tests before release. Treat any containment failure as a blocking defect.

Risk: Tool Misuse

CorrectiveTaint-tracking of tool outputs to suppress instruction executionLibrary v9

Label tool and external content as tainted and propagate the label through the agent context. Block privileged calls whose parameters derive from tainted outputs and prove it with injection tests before release.

Risk: Tool Misuse

CorrectiveIdempotency keys and rollback/dry-run for state-changing toolsLibrary v9

Require idempotency keys, dry-run, and rollback on every state-changing tool. Gate onboarding on duplicate-call and rollback tests passing.

Risk: Tool Misuse

CorrectiveSandboxed tool execution with no-egress-by-default isolationLibrary v9

Run code-executing tools in ephemeral no-egress sandboxes with read-only filesystems, dropped capabilities, and resource limits. Permit network access only by explicit approved exception.

Risk: Tool Misuse

CorrectiveTaint-tracking of tool outputs to suppress instruction executionLibrary v9

Review blocked tainted-derived calls as injection-attempt signals. Extend taint coverage to new tools and treat any tainted-derived execution as an incident.

Risk: Tool Misuse

CorrectiveIdempotency keys and rollback/dry-run for state-changing toolsLibrary v9

Periodically exercise rollback paths and review logs for duplicate or unrecoverable actions. Treat failures as incidents and update integration specs.

Risk: Tool Misuse

CorrectiveNon-production-by-default execution environment with explicit production promotion gateLibrary v9

Bind the agent's default execution target to non-production environments at design time. Require a separately approved promotion configuration for any production-connected target.

Risk: Cascading Multi-Agent Errors

CorrectiveBlast-radius scoping and environment isolation per agent taskLibrary v9

Run each agent task in an isolated, network-segmented sandbox scoped to the task's exact needs. Gate onboarding on fault-injection tests proving containment.

Risk: Cascading Multi-Agent Errors

CorrectiveIdempotent action design with transactional rollback and pre-action snapshotsLibrary v9

Engineer mutating actions with idempotency keys, transactions and pre-change snapshots; stage writes rather than committing directly. Gate release on tested dedup and rollback within RPO.

Risk: Cascading Multi-Agent Errors

CorrectiveNon-production-by-default execution environment with explicit production promotion gateLibrary v9

Default all deployments to non-production endpoints and credentials. Permit production promotion only via an explicit, approved configuration change.

Risk: Cascading Multi-Agent Errors

CorrectiveRate, quota, and budget circuit breakers on outbound calls to connected systemsLibrary v9

Cap each agent's rate, volume, concurrency, and spend per downstream dependency. Trip the breaker and fail closed when a ceiling is crossed.

Risk: Cascading Multi-Agent Errors

CorrectiveLoop, recursion-depth, and iteration caps with runaway-loop detectionLibrary v9

Enforce hard caps on iterations, depth, wall-clock, and cost per agent run. Terminate the run on cap breach or detected loop signatures.

Risk: Cascading Multi-Agent Errors

CorrectiveBlast-radius scoping and environment isolation per agent taskLibrary v9

Detect drift from the approved isolation baseline and alert on boundary widening. Re-test containment periodically and after infrastructure change.

Risk: Cascading Multi-Agent Errors

CorrectiveRate, quota, and budget circuit breakers on outbound calls to connected systemsLibrary v9

Review trip events and tune ceilings via change control. Escalate repeated trips on the same dependency into incident management.

Risk: Cascading Multi-Agent Errors

CorrectiveLoop, recursion-depth, and iteration caps with runaway-loop detectionLibrary v9

Review terminations to tune caps and add new loop signatures to the detector. Escalate recurring runaways to incident management.

Risk: Cascading Multi-Agent Errors

CorrectiveIdempotent action design with transactional rollback and pre-action snapshotsLibrary v9

Drill snapshot restores periodically and verify the RPO is met. Monitor mutating calls for duplicate-effect anomalies and log exceptions to the risk register.

Risk: Cascading Multi-Agent Errors

DetectiveEgress monitoring & allowlisting of outbound AI/LLM-provider API traffic from enterprise endpoints (living-off-trusted-services C2)✚ Proposed — not in your library

Treat outbound connections to AI/LLM provider APIs as a monitored egress channel: allowlist which hosts may reach them, baseline usage (cadence, entropy, initiating process), and alert on out-of-profile traffic — because a high-reputation destination cannot itself be trusted once it is programmable and can relay encrypted commands/results.

Risk: Tool Misuse

DetectiveProvider-side abusive-usage detection with stateful refusal for agentic coding tools✚ Proposed — not in your library

On the AI provider/platform side, detect sustained abuse independent of any single refusal: per-principal analytics on remote-command-execution volume and external-target breadth, anti-forensic tradecraft, and bulk-data API processing — with rate-limit / session kill-switch on confirmed abuse. Make refusal stateful so a refused objective cannot be re-entered as a persisted auto-loaded context file (e.g. claude.md), and treat writes into auto-loaded model-context files as security-relevant. Closes the gap that per-turn refusal leaves when the operator is the adversary.

Risk: Inference-Time & Serving-Layer Manipulation

CorrectiveServing-stack runtime attestation and per-tenant KV/prefix-cache isolation✚ Proposed — not in your library

Require measured-boot/runtime attestation of the inference serving binary and partition KV/prefix caches per tenant, closing decode-time serving-layer tampering and co-tenancy timing side channels that artifact weight-hashing cannot detect.

Limitation: Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so it's often left on for performance.

Risk: Inference-Time & Serving-Layer Manipulation

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.

Limitation: Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so it's often left on for performance. Signing proves a template wasn't tampered in transit, not that a signed template is benign — an insider with signing rights still needs review and trigger-focused evals.

Risk: Sensitive Data Leakage

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

Risk: Excessive Agency

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Risk: Supply-Chain Compromise

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

Risk: Confused Deputy (cross-agent)

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

Risk: Rogue & Impersonated Agents

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

Risk: Distributed / Cross-Agent Jailbreak

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

Risk: Cascading Multi-Agent Errors

PreventivePer-agent identity & taint-marked messagesInteractive (lab)

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Limitation: Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.

Risk: Agent Misalignment / Goal Misgeneralization

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Risk: KV-Cache & Inference-State Side Channels

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Risk: Inference-Time & Serving-Layer Manipulation

PreventiveServing-stack & provisioning attestation, cache isolationInteractive (lab)

Risk: Watermark & Provenance Evasion

CorrectivePost-incident review and remediation trackingLibrary v9

Run a structured lessons-learned review after every material AI incident. Track remediation actions to closure and feed outcomes back into the controls and the IR plan.

Risk: Inadequate feedback and recourse mechanisms

CorrectiveRegulator, customer and stakeholder incident notification processLibrary v9

Map notification obligations and timeframes at design and pre-approve templates with legal/compliance. Appoint the notification decision-owner before go-live.

Risk: Breach or misalignment with regulatory or organisational standards

CorrectiveRegulator, customer and stakeholder incident notification processLibrary v9

Notify regulators, customers, and stakeholders of confirmed reportable incidents within statutory timeframes using pre-approved templates. Log every notification decision with timestamp and owner.

Risk: Breach or misalignment with regulatory or organisational standards

CorrectiveProduction privacy incident monitoring and regulator notificationLibrary v9

Monitor for privacy incidents in production including personal data appearing in outputs. Notify regulators within required timeframes.

Risk: Sensitive Data Leakage

CorrectiveAI system inclusion in BCP and DRPLibrary v9

Include the AI system in BCP and DRP. Define recovery procedures for AI components and test at least annually.

Risk: Inadequate operational resilience

CorrectiveRobustness testingLibrary v9

Monitor availability, latency, and error rates in production. Alert on SLA breaches and initiate incident response.

Risk: Inadequate operational resilience

CorrectiveAI incident response runbook with severity triage and classificationLibrary v9

Define AI incident categories, severity tiers, and triage flow before go-live. Gate launch on governance approval of the plan and named roles.

Risk: Inadequate operational resilience

CorrectiveBCP/DRP activation and degraded-mode continuity for AI servicesLibrary v9

Set the AI service's criticality tier, RTO/RPO, and degraded-mode service level at design with business sign-off. Register it in enterprise BCP scope.

Risk: Inadequate operational resilience

CorrectiveBCP/DRP activation and degraded-mode continuity for AI servicesLibrary v9

Implement failover and degraded-mode mechanisms during build. Gate deployment on a continuity test proving recovery within RTO/RPO.

Risk: Inadequate operational resilience

CorrectiveDefined escalation path to a designated AI incident response teamLibrary v9

Wire detections into the IR queue and verify paging with a test escalation before go-live. Gate release on a successful dry-run.

Risk: Inadequate operational resilience

CorrectiveAI incident response runbook with severity triage and classificationLibrary v9

Classify live incidents against the severity matrix and drill the plan periodically. Update and re-approve it after material changes or new incident types.

Risk: Inadequate operational resilience

CorrectiveDefined escalation path to a designated AI incident response teamLibrary v9

Hand every confirmed incident to the named IR team via the documented path within SLA. Track and escalate handoff breaches.

Risk: Inadequate operational resilience

CorrectiveBCP/DRP activation and degraded-mode continuity for AI servicesLibrary v9

Invoke the BCP/DRP runbook on continuity-impacting incidents and measure recovery against RTO/RPO. Exercise the plan at least annually and track gaps to closure.

Risk: Inadequate operational resilience

CorrectiveRobustness testingLibrary v9

Periodically validate that deployed model versions remain reproducible. Test rollback procedures annually or after major updates.

Risk: Lack of reproducibility

CorrectiveVulnerability assessmentLibrary v9

Conduct periodic data leakage audits including training data memorisation testing. Escalate confirmed leakage incidents to PDPA notification process.

Risk: Sensitive Data Leakage

CorrectiveForensic evidence preservation and incident loggingLibrary v9

Implement tamper-evident capture of prompts, outputs, and version state during build. Verify a full incident timeline can be reconstructed before go-live.

Risk: Sensitive Data Leakage

CorrectiveForensic evidence preservation and incident loggingLibrary v9

Preserve prompts, outputs, logs, and model/data version state in tamper-evident storage on incident declaration. Maintain chain-of-custody and enforce the defined retention period.

Risk: Sensitive Data Leakage

CorrectiveRollback and restore-to-known-good recovery procedure for AI servicesLibrary v9

Register each release as a restorable known-good baseline and rehearse rollback at the release gate. Block promotion without a tested restore.

Risk: Cascading Multi-Agent Errors

CorrectiveRollback and restore-to-known-good recovery procedure for AI servicesLibrary v9

Roll back to the last known-good state per the runbook on incident declaration. Validate recovery before resuming service.

Risk: Cascading Multi-Agent Errors

PreventivePatch-currency, network isolation & attested version inventory for AI inference-serving runtimes✚ Proposed — not in your librarynew category

Treat the model-serving runtime (Triton, vLLM, TGI, Ray Serve, etc.) as managed, attested, version-pinned inventory subject to a patch SLA; require the inference endpoint to be authenticated and network-segmented (never unauthenticated on an untrusted segment); and least-privilege the serving host's identity and egress so a runtime RCE cannot trivially exfiltrate models or pivot. Closes the gap that artifact-provenance controls leave open: integrity of the *data plane that runs the model*, not just of the model artifact.

Risk: Supply-Chain Compromise

PreventiveLeast-privilege CI/CD credentials + review-gated, provenance-attested releases (no unreviewed external commit can be published; verify signatures + provenance at distribution and install)✚ Proposed — not in your librarynew category

Scope build identities least-privilege (read-only CI tokens; no standing release/publish rights bound to the merge path), require human review and SLSA-style provenance attestation before any external contribution becomes an official release, and verify signatures + provenance at the distribution channel and at install — so a merged pull request cannot become an authenticated, signed artifact without passing a review/provenance gate.

Risk: Supply chain attacks

DetectiveTreat prompt/config as a deploy-gated safety artifact: run safety + behavioural regression evals and red-team canaries on every prompt/config change (not just model changes), with version pinning, provenance, and staged/canary rollout✚ Proposed — not in your librarynew category

Gate every change to the system prompt / runtime config behind the same behavioural-regression and red-team-canary suite used for model changes; pin and provenance-track the prompt/config so 'what is live' is unambiguous and deprecated instructions cannot be silently reactivated; roll out to a canary cohort before full release so a disposition regression is caught on a small slice, not the whole public platform.

Risk: Model Drift & Silent Degradation

PreventiveMultimodal input-fidelity check: show/verify the model-delivered (post-downscale) image and avoid silent lossy resampling✚ Proposed — not in your librarynew category

Before inference, render a preview of the exact image (and dimensions) the model will receive after preprocessing, and either avoid silent downscaling or constrain ingest dimensions — so an attacker cannot hide a payload that only becomes legible after resampling. Closes the inspected-vs-delivered gap that text-based injection filters miss.

Risk: Prompt Injection (direct)

Guardrails & controls — by category, lifecycle, layer or risk

Enterprise Governance & Training

Customised Model Design

Filtering & Control

Red Teaming

Human-in-the-Loop (HITL) Moderation

Iterative Improvement

Monitoring & Validation

Prompt Design

Kill-Switch

Fallback

Agent Access & Tool Control

Agent Runtime Safety & Containment

Incident Response & Recovery

Infrastructure & Runtime Hardening

Software & Model Supply Chain Integrity

Behavioural Evals & Regression Gating

Input Sanitisation & Validation

Guardrails & controls — by category, lifecycle, layer or risk

Enterprise Governance & Training

Customised Model Design

Filtering & Control

Red Teaming

Human-in-the-Loop (HITL) Moderation

Iterative Improvement

Monitoring & Validation

User Transparency & Consent

Prompt Design

Kill-Switch

Fallback

Agent Access & Tool Control

Agent Runtime Safety & Containment

Incident Response & Recovery

Infrastructure & Runtime Hardening

Software & Model Supply Chain Integrity

Behavioural Evals & Regression Gating

Input Sanitisation & Validation