📈

Logging & Monitoring

Capability · Oversight

The flight recorder and the dashboards — a record of everything that happened, plus alarms that watch for trouble.

Components involved

📝 Audit Logging 📈 Monitoring & Evals

Seen in: Tool-Using Agent, Multi-Agent System

Likely associated risks

Risks that attach to this capability’s components. Sorted with the most characteristic first.

Indirect Prompt Injectioncritical

The attacker doesn't talk to the AI directly — they hide instructions inside something the AI will later read: a web page, a document, an email, a tool's output. When the AI reads it to help you, it quietly obeys the hidden commands.

Excessive Agencycritical

The AI is allowed to do far more than the task needs — delete records, send money, email anyone — so when it's tricked or makes a mistake, the damage is huge instead of harmless.

Sensitive Data Leakagecritical

Private information escapes — the AI reveals secrets in its answer, or an attacker tricks it into emailing or posting your data somewhere they control.

Model Drift & Silent Degradationmedium

The AI's behaviour quietly changes over time — a vendor updates the model, or the world moves on from its training — and things that used to work start failing.

Prompt Injection (direct)high

The user types instructions that try to override what the app told the AI to do — like 'ignore your rules and do this instead'. Because the AI reads everything as one block of text, it can't always tell the app's rules from the user's trick.

Oversight & Audit-Trail Tamperinghigh

The flight recorder and the alarms can themselves be attacked. If logs can be erased or rewritten, fake entries slipped in, or the monitors quietly evaded, the one record you'd rely on to notice and investigate an incident is no longer trustworthy.

Knowledge / Training Data Poisoninghigh

Someone slips bad information into the documents the AI learns from or looks things up in — so it confidently repeats falsehoods or follows planted instructions.

Distributed / Cross-Agent Jailbreakhigh

A jailbreak is normally one nasty message. Here the attacker splits it into harmless-looking pieces and feeds them to different agents in a team. Each piece passes each agent's safety check on its own — but when the agents combine their work, the full forbidden instruction reassembles and takes effect.

Agent Misalignment / Goal Misgeneralizationhigh

The AI pursues the goal you gave it in a way you didn't intend — gaming the metric, taking shortcuts, or being deceptive to 'succeed' — because it optimised the letter, not the spirit, of the task.

Cascading Multi-Agent Errorsmedium

In a team of AIs, one mistake gets passed along and amplified — agents agree with each other, repeat each other's errors, or loop endlessly, turning a small slip into a big failure.

Allocative Harm in Multi-User Arbitrationmedium

When one AI agent serves many people at once, it has to decide whose request comes first or who gets a limited resource. If it does that unfairly — always favouring some users over others — it can quietly disadvantage whole groups, even without any single obvious error.

Controls & guardrails that address this

1047 proposed

Guardrails across this building block's risks, grouped by control function — each with its AI lifecycle stage(s) and every risk it addresses. Filter by control category below.

Control category

Preventive · 61

Delimiting / spotlighting of untrusted contentinteractive

Clearly fencing off outside text — 'everything between these marks is just data, not instructions' — so the model is less likely to obey it.

AddressesIndirect Prompt Injection

Ingestion sanitisation & source allowlistinginteractive

Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.

AddressesIndirect Prompt Injection Knowledge / Training Data Poisoning

Egress allowlisting & DLP on tool argumentsinteractive

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

AddressesIndirect Prompt Injection Sensitive Data Leakage Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks

Least-privilege identity & scoped credentialsinteractive

Giving the agent only the keys it needs for the current task, not a master key to everything.

AddressesPrompt Injection (direct)Indirect Prompt Injection Sensitive Data Leakage Excessive Agency Tool Misuse Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks Confused Deputy (cross-agent)Rogue & Impersonated Agents Resource Exhaustion / Denial of Wallet Capability / Architecture Disclosure

Human-in-the-loop approval on high-risk actionsinteractive

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

AddressesIndirect Prompt Injection Overreliance / Automation Bias Excessive Agency Tool Misuse Cascading Multi-Agent Errors Agent Misalignment / Goal Misgeneralization Resource Exhaustion / Denial of Wallet Allocative Harm in Multi-User Arbitration Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Risk-tiered human oversight requirements at design

Define minimum human oversight requirements by risk tier at design stage. Assign named accountability for oversight operations.

Lifecycle stage1 – Use Case Context & Design

AddressesExcessive Agency

HITL oversight design with triggers and escalation

Design HITL oversight mechanisms at use case design stage including trigger criteria, review workflow, and escalation paths.

Lifecycle stage1 – Use Case Context & Design

AddressesExcessive Agency

Pilot-validated HITL routing and escalation logic

Build and test HITL routing logic and escalation pathways in the AI system. Validate with pilot before deployment.

Lifecycle stage3 – Onboarding, Build & Review

AddressesExcessive Agency

Production HITL operation with intervention logging

Operate HITL controls in production and log all interventions and outcomes. Review override patterns quarterly.

Lifecycle stage5 – Usage, Monitoring & Change

AddressesExcessive Agency

Periodic oversight effectiveness review and escalation

Conduct periodic oversight effectiveness reviews. Escalate to governance when oversight metrics fall below threshold.

Lifecycle stage5 – Usage, Monitoring & Change

AddressesExcessive Agency

Recursive sub-agent authority caps (monotonic privilege attenuation)

Define and sign off each agent's delegation envelope — maximum depth and strict scope attenuation — before build begins.

source: NIST SP 800-53 AC-6(1) Least Privilege; OWASP Agentic AI Threats & Mitigations (cascading / sub-agent privilege); capability-security monotonic attenuation principle (macaroons)

Lifecycle stages1 – Use Case Context & Design3 – Onboarding, Build & Review

AddressesExcessive Agency

Design-time authority model and approval gate defining each agent's identity, scopes, and delegation envelope

Document each agent's identity, minimum scopes, on-behalf-of population, and delegation depth at design time. Gate build on governance sign-off of the authority matrix.

source: NIST AI RMF MAP 1.1 / GOVERN 2.1 (roles, authority, accountability); NIST SP 800-53 AC-2, PL-8; OWASP Agentic AI Threats & Mitigations (least-privilege design)

Lifecycle stages1 – Use Case Context & Design3 – Onboarding, Build & Review

AddressesExcessive Agency

Unique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)

Mint a unique, attestation-backed workload identity per agent at onboarding. Register every SPIFFE-ID to an owner, use case, and approval ticket; ban shared service accounts.

source: SPIFFE/SPIRE workload identity specification; NIST SP 800-207 Zero Trust Architecture; OWASP Non-Human Identities Top 10

Lifecycle stage3 – Onboarding, Build & Review

AddressesExcessive Agency

On-behalf-of delegation that preserves and never exceeds the invoking user's ACLs

Implement on-behalf-of token exchange and prove with negative tests that the agent cannot exceed the user's ACL. Gate release on these tests passing.

source: OAuth 2.0 Token Exchange RFC 8693 (delegation/'act' claims); NIST SP 800-53 AC-3, AC-6; OWASP Agentic AI Threats & Mitigations (Privilege Compromise / confused deputy)

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

AddressesExcessive Agency

Central agent registry / non-human identity inventory with ownership and lifecycle metadata

Register every agent identity with a named human owner, approved use case, scopes, and status before issuance. No registry entry, no identity.

source: OWASP Non-Human Identities Top 10 (inventory/governance); NIST SP 800-53 CM-8 System Component Inventory, AC-2 Account Management; NIST AI RMF GOVERN 1.2

Lifecycle stage3 – Onboarding, Build & Review

AddressesExcessive Agency

Continuous authorisation via a central policy engine (per-action PDP/PEP check)

Write authorisation policy as versioned, peer-reviewed code traced to approved scopes. Gate promotion on allow/deny scenario tests passing.

source: NIST SP 800-207 Zero Trust (continuous, per-request authorization via PDP/PEP); NIST SP 800-53 AC-3, AC-4; OWASP Agentic AI Threats & Mitigations (per-action authorization)

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

AddressesExcessive Agency

Automated credential rotation and prohibition of long-lived static secrets for agents

Scan every commit to agent code, prompts, and config for embedded secrets. Block merges on detection and triage findings to closure.

source: OWASP Non-Human Identities Top 10 (long-lived/leaked secrets); NIST SP 800-53 IA-5 Authenticator Management, SC-12; SPIFFE short-lived SVID rotation

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

AddressesExcessive Agency

Mutual authentication and identity verification for agent-to-agent and agent-to-MCP-server calls

Vet and approve every MCP server and peer agent before registering its identity on the allow-list. Block integration until vetting is signed off.

source: NIST SP 800-207 (mutual authentication); NIST SP 800-53 IA-9 Service Identification and Authentication, SC-8; OWASP Agentic AI Threats & Mitigations (agent/MCP identity spoofing)

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

AddressesExcessive Agency

Per-task short-lived scoped capability tokens minted just-in-time

Mint short-lived, task-scoped tokens just-in-time from a central token service. Enforce a hard max TTL and resource-bound audience so no standing credential exists.

source: OAuth 2.0 Token Exchange RFC 8693 (resource-scoped tokens); NIST SP 800-53 AC-6 Least Privilege; OWASP Non-Human Identities Top 10

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesExcessive Agency

Just-in-time, time-boxed elevation for sensitive scopes (no standing privilege)

Grant sensitive scopes just-in-time for a bounded window with auto-revocation; require human approval for high-impact elevations. Hold zero standing privilege.

source: NIST SP 800-53 AC-6(2)/AC-6(5) Least Privilege & privileged accounts; Zero Standing Privilege / JIT access practice; OWASP Agentic AI Threats & Mitigations (excessive permissions)

Lifecycle stage4 – Deployment

AddressesExcessive Agency

Tool argument validation & sandboxinginteractive

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

AddressesExcessive Agency Tool Misuse Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks

Per-agent identity & taint-marked messagesinteractive

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

AddressesExcessive Agency Confused Deputy (cross-agent)Rogue & Impersonated Agents Distributed / Cross-Agent Jailbreak Cascading Multi-Agent Errors Agent Misalignment / Goal Misgeneralization

Approved storage location policy from collection

Establish data transfer and storage policy for AI training data. Enforce approved storage locations from point of collection.

Lifecycle stage2 – Data Acquisition & Processing

AddressesSensitive Data Leakage

DLP controls in data acquisition environment

Implement DLP controls in the data acquisition environment to prevent unauthorised extraction or transfer of training data.

Lifecycle stage2 – Data Acquisition & Processing

AddressesSensitive Data Leakage

Approval-gated data transfers from build environment

Enforce data handling policy in the build environment. Require explicit approval for any data transfers outside the environment.

Lifecycle stage3 – Onboarding, Build & Review

AddressesSensitive Data Leakage

DLP controls confining build-environment training data

Configure DLP controls in the build environment to block training data from leaving approved boundaries.

Lifecycle stage3 – Onboarding, Build & Review

AddressesSensitive Data Leakage

Privacy risk assessment and DPIA determination

Conduct a privacy risk assessment at use case design stage. Determine if a DPIA is required before data acquisition.

Lifecycle stage1 – Use Case Context & Design

AddressesSensitive Data Leakage

Consent, minimisation, and anonymisation during acquisition

Apply S1-defined privacy controls during data acquisition: verify consent, minimise data, anonymise personal data.

Lifecycle stage2 – Data Acquisition & Processing

AddressesSensitive Data Leakage

Validated anonymisation and masking before training

Apply anonymisation and masking controls to personal data before use in model training. Validate de-identification effectiveness.

Lifecycle stage2 – Data Acquisition & Processing

AddressesSensitive Data Leakage

Privacy by Design via differential privacy

Apply Privacy by Design in model architecture using differential privacy or federated learning where technically feasible.

Lifecycle stage3 – Onboarding, Build & Review

AddressesSensitive Data Leakage

Operational consent management and privacy notice

Publish the privacy notice and confirm consent management is operational before go-live.

Lifecycle stage4 – Deployment

AddressesSensitive Data Leakage

Purpose-limitation enforcement on agent tool calls and cross-system data aggregation

Define and sign off a purpose-to-data-source matrix with lawful basis at intake. Make it the approved baseline for runtime enforcement.

source: NIST AI RMF MAP 1.1 / MANAGE 2.2 (context and intended purpose); NIST SP 800-53 AC-4 / AC-3 (purpose-based access enforcement)

Lifecycle stages1 – Use Case Context & Design5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Inference-time PII redaction and third-party LLM data-processing controls

Sign zero-retention/no-training terms with each model provider and obtain DPO sign-off on the data flow before enabling any endpoint.

source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 SC-8 / AC-4 (information flow enforcement)

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

AddressesSensitive Data Leakage

Role-based access controls

Restrict access to pre-anonymisation personal data to the minimum authorised set. Enforce at point of acquisition.

Lifecycle stages1 – Use Case Context & Design2 – Data Acquisition & Processing4 – Deployment

AddressesKnowledge / Training Data Poisoning Prompt Injection (direct)Sensitive Data Leakage KV-Cache & Inference-State Side Channels

Input filtering

Apply robust de-identification (k-anonymity, l-diversity, differential privacy) during data processing. Validate effectiveness.

Lifecycle stages2 – Data Acquisition & Processing5 – Usage, Monitoring & Change

AddressesModel Drift & Silent Degradation Knowledge / Training Data Poisoning Sensitive Data Leakage

Input/output filtering

Implement output filters to detect and suppress quasi-identifying attribute combinations in model responses.

Lifecycle stage3 – Onboarding, Build & Review

AddressesBias Amplification & Sycophancy Overreliance / Automation Bias Sensitive Data Leakage KV-Cache & Inference-State Side Channels

Query-time access-control filtering of the retrieval/RAG corpus by caller entitlements (document-level ACL enforcement)

Propagate source ACLs and classification labels onto every chunk at ingestion. Reject documents whose entitlements cannot be resolved.

source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 AC-3 / AC-4 Information Flow Enforcement; OWASP Agentic AI Threats & Mitigations (privilege compromise)

Lifecycle stages2 – Data Acquisition & Processing3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Output-side DLP inspection with named-entity and PII redaction on the response path

Scan every model response inline with DLP before delivery; redact or block PII, PAN and MNPI matches. Keep the rule set version-controlled.

source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 SC-7(10) Prevent Exfiltration, SI-4

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Vet allowlisted egress destinations for server-side-fetch (SSRF) primitives; exclude or proxy-inspect any allowlisted service that can fetch arbitrary attacker-controlled URLs✚ proposed

An egress allowlist only contains exfiltration if no allowlisted destination can be coerced into fetching an attacker-controlled URL. Audit each allowlisted domain/endpoint for image-search / link-preview / URL-fetch features (SSRF proxies), and either remove them, pin them to fixed paths, or route them through an inspecting forward proxy. Pair with finishing output sanitization before render so no auto-fetch fires un-inspected.

source: Case study: searchleak-copilot (Varonis Threat Labs, CVE-2026-42824; reported by Microsoft as critical, mitigated server-side ~Jun 2026)

Lifecycle stage4 – Deployment & Serving

AddressesSensitive Data Leakage

Per-user retrieval ACLsinteractive

Making sure the library only returns documents this particular user is allowed to see.

AddressesSensitive Data Leakage

Serving-stack & provisioning attestation, cache isolationinteractive

Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.

AddressesSensitive Data Leakage Supply-Chain Compromise KV-Cache & Inference-State Side Channels Inference-Time & Serving-Layer Manipulation Watermark & Provenance Evasion

Risk-tiered minimum monitoring requirements at design

Define minimum monitoring requirements at design stage calibrated to the use case risk tier.

Lifecycle stage1 – Use Case Context & Design

AddressesModel Drift & Silent Degradation

Programmable conversation controls

Configure monitoring hooks in the conversation layer at deployment to capture metrics required by S1 monitoring requirements.

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

AddressesHallucination Model Drift & Silent Degradation

Fine-tuning

Execute a controlled fine-tuning cycle on refreshed data when staleness is confirmed. Validate before promoting to production.

Lifecycle stage5 – Usage, Monitoring & Change

AddressesHallucination Model Drift & Silent Degradation

Approved use scope baseline for OOD controls

Define approved use case scope and expected input distribution at design stage. Document as the governance baseline for OOD controls.

Lifecycle stage1 – Use Case Context & Design

AddressesModel Drift & Silent Degradation

Modular architecture

Design a scope-enforcement layer in the architecture to isolate the AI system from off-topic or out-of-distribution inputs.

Lifecycle stage1 – Use Case Context & Design

AddressesModel Drift & Silent Degradation

Weight provenance, hashing & pre-deploy evalsinteractive

Knowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.

AddressesModel Drift & Silent Degradation Knowledge / Training Data Poisoning Supply-Chain Compromise Abliteration / Safety Removal Model Backdoors / Sleeper Agents Training-Data Rights & Provenance

Jailbreak detection

Implement input sanitisation and injection detection filters covering known injection patterns and privilege escalation attempts.

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

AddressesInference-Time & Serving-Layer Manipulation Prompt Injection (direct)

Spotlighting of untrusted content via delimiting, datamarking and encoding

Wrap all untrusted content in random delimiters and datamarking; instruct the model never to execute instructions inside the marked region. Gate release on injection eval results.

source: Microsoft 'Spotlighting' technique (Hines et al. 2024); OWASP Top 10 for LLM Apps LLM01:2025 Prompt Injection (segregate external content)

Lifecycle stage3 – Onboarding, Build & Review

AddressesPrompt Injection (direct)

Dedicated injection-detection classifier on all inbound untrusted content and outbound actions

Benchmark the classifier on a labelled injection corpus and tune the decision threshold. Sign off the operating point before deployment.

source: MITRE ATLAS AML.M0015 (Adversarial Input Detection); OWASP Top 10 for LLM Apps LLM01:2025 Prompt Injection; NIST AI RMF MEASURE 2.7

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment5 – Usage, Monitoring & Change

AddressesPrompt Injection (direct)

Multimodal input-fidelity check: show/verify the model-delivered (post-downscale) image and avoid silent lossy resampling✚ proposed

Before inference, render a preview of the exact image (and dimensions) the model will receive after preprocessing, and either avoid silent downscaling or constrain ingest dimensions — so an attacker cannot hide a payload that only becomes legible after resampling. Closes the inspected-vs-delivered gap that text-based injection filters miss.

source: Case study: anamorpher-image-scaling-injection (Trail of Bits — Morozova & Hussain, 21 Aug 2025)

Lifecycle stage3 – Development & Build

AddressesPrompt Injection (direct)

Instruction-hierarchy-trained model selection with role-precedence injection evals✚ proposed

Select or fine-tune the foundation model for a trained instruction-hierarchy prior so system-prompt directives intrinsically outrank user- and tool-originated instructions, and gate release on role-precedence override evals quantifying the residual (behavioural, non-enforced) flip rate.

source: Interactive-control reconciliation: ctrl-instruction-hierarchy (partial coverage)

Lifecycle stage3 – Onboarding, Build & Review

AddressesPrompt Injection (direct)

Instruction hierarchy / privileged system promptinteractive

Training the model to treat the app's standing instructions as more authoritative than anything a user or document says.

AddressesPrompt Injection (direct)Jailbreak Capability / Architecture Disclosure

RAG / knowledge-base ingestion allow-listing with continuous index integrity re-validation

Define and approve the source allow-list and write-time scanning during build. Prove non-allow-listed and injection-bearing writes are rejected before go-live.

source: OWASP Top 10 for LLM Apps LLM04:2025 Data and Model Poisoning, LLM08:2025 Vector and Embedding Weaknesses; NIST SP 800-53 AC-3 / SI-7

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesKnowledge / Training Data Poisoning

Ethical design assessment in onboarding

Conduct ethical design assessment at use case intake before build begins. Require sign-off by ethics or risk committee.

Lifecycle stage1 – Use Case Context & Design

AddressesAgent Misalignment / Goal Misgeneralization Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Prohibited outputs and ethical boundaries in design doc

Define prohibited outputs and ethical boundary constraints in the use case design document before build.

Lifecycle stage1 – Use Case Context & Design

AddressesAgent Misalignment / Goal Misgeneralization

Content Moderation

Deploy content moderation controls aligned to S1 ethical constraints. Validate filter accuracy before deployment.

Lifecycle stage3 – Onboarding, Build & Review

AddressesAgent Misalignment / Goal Misgeneralization Synthetic-Media Impersonation (Deepfakes & Voice Clones)Jailbreak

Use of pre-trained models

Select a foundation model with documented safety fine-tuning (RLHF, Constitutional AI). Verify alignment benchmarks.

Lifecycle stage3 – Onboarding, Build & Review

AddressesAgent Misalignment / Goal Misgeneralization Synthetic-Media Impersonation (Deepfakes & Voice Clones)Jailbreak

Dependency integration safety contracts with schema validation and version pinning

Register a safety contract per integration — pinned version, schemas, side-effect class, latency/error envelope. Gate onboarding on contract review and sign-off.

source: OWASP Top 10 for LLM Apps LLM05:2025 Improper Output Handling; NIST SP 800-53 SA-9 External System Services

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Change-freeze and blackout-window enforcement on agent-initiated changes

Wire the agent tool layer to the CAB calendar at deployment. Test that a declared freeze blocks mutating calls before go-live.

source: NIST SP 800-53 CM-3 Configuration Change Control, CM-5 Access Restrictions for Change; ITIL change-freeze practice

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Admission control on the inference & MCP serving plane: authenticate and network-segment every self-hosted inference/serving and MCP endpoint✚ proposed

Require authN/authZ on every inference API and MCP server, bind to private interfaces / front with a gateway, enforce network policy (no public exposure by default), and scope MCP tools to least privilege — so an exposed endpoint cannot be hijacked for compute resale, prompt/history exfiltration, or lateral movement. Pair with continuous asset discovery so endpoints can't drift back to an open default.

source: Case study: operation-bizarre-bazaar-llmjacking (Pillar Security, 28 Jan 2026)

Lifecycle stage4 – Deployment & Serving

AddressesCascading Multi-Agent Errors

Detective · 22

Provenance & content signinginteractive

Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.

AddressesIndirect Prompt Injection Knowledge / Training Data Poisoning Training-Data Rights & Provenance

Full-trace audit logginginteractive

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

AddressesIndirect Prompt Injection Oversight & Audit-Trail Tampering Sensitive Data Leakage Memory Poisoning Excessive Agency Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks Confused Deputy (cross-agent)Rogue & Impersonated Agents

Runtime monitoring & anomaly detectioninteractive

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Immutable audit of the full agent identity lifecycle (issue, grant, delegate, revoke)

Instrument every identity-issuing component with schema-conformant audit emitters. Block release until completeness and tamper-evidence tests pass.

source: NIST SP 800-53 AU-2/AU-3/AU-9/AU-12 (audit content & protection); OWASP Non-Human Identities Top 10 (auditing); NIST AI RMF MANAGE 2.2

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesExcessive Agency

Behavioural anomaly detection on agent identity usage with automated suspension

Define per-identity behaviour profiles and thresholds at build. Rehearse automated suspension and sign off measured revocation time before go-live.

source: NIST SP 800-53 AC-2(12) (account monitoring for atypical use), SI-4 System Monitoring; OWASP Agentic AI Threats & Mitigations (identity abuse detection)

Lifecycle stage3 – Onboarding, Build & Review

AddressesExcessive Agency

Loop/cost circuit-breakers & consistency checksinteractive

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

AddressesExcessive Agency Confused Deputy (cross-agent)Distributed / Cross-Agent Jailbreak Cascading Multi-Agent Errors Agent Misalignment / Goal Misgeneralization Resource Exhaustion / Denial of Wallet

Real-time monitoring of anomalous data transfers

Monitor production for anomalous data transfers in real time. Alert on any transfer outside approved data flow boundaries.

Lifecycle stage5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Automated DSAR and right-to-erasure propagation across AI artefacts

Tag personal data with subject identifiers at ingestion and maintain an artefact inventory map of every store it reaches. Keep lineage current so erasure can propagate.

source: NIST AI RMF MANAGE 4.1 (post-deployment response); NIST SP 800-53 SI-12 Information Management and Retention, PT-2/PT-3 (personal data processing)

Lifecycle stages2 – Data Acquisition & Processing5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Vulnerability assessment

Conduct periodic privacy vulnerability assessments including re-identification risk testing as new techniques emerge.

Lifecycle stages1 – Use Case Context & Design4 – Deployment5 – Usage, Monitoring & Change

AddressesKnowledge / Training Data Poisoning Inference-Time & Serving-Layer Manipulation Prompt Injection (direct)Sensitive Data Leakage KV-Cache & Inference-State Side Channels

Canary-token and membership-inference red-team probes against training/fine-tuning data memorisation

Seed registered canary records into the fine-tuning corpus during data preparation. Control the seed manifest so canaries stay traceable and tamper-proof.

source: MITRE ATLAS AML.T0024 (Exfiltration via ML Inference API), AML.T0024.000 (Infer Training Data Membership); NIST AI RMF MEASURE 2.7

Lifecycle stages2 – Data Acquisition & Processing3 – Onboarding, Build & Review

AddressesSensitive Data Leakage

Input guardrail / injection classifierinteractive

A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.

AddressesPrompt Injection (direct)Jailbreak Sensitive Data Leakage Distributed / Cross-Agent Jailbreak Capability / Architecture Disclosure Harmful / Non-Consensual Media Generation

Synthetic evaluation datasets

Construct synthetic evaluation datasets during build to serve as the ongoing monitoring baseline.

Lifecycle stage3 – Onboarding, Build & Review

AddressesHallucination Overreliance / Automation Bias Model Drift & Silent Degradation

Robustness testing

Build monitoring infrastructure during build: performance metrics collection, alerting thresholds, dashboards.

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment5 – Usage, Monitoring & Change

AddressesHallucination Overreliance / Automation Bias Model Drift & Silent Degradation

Behavioural evals & regression gatinginteractive

Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.

AddressesJailbreak Hallucination Model Drift & Silent Degradation Supply-Chain Compromise Distributed / Cross-Agent Jailbreak Agent Misalignment / Goal Misgeneralization Abliteration / Safety Removal Model Backdoors / Sleeper Agents Inference-Time & Serving-Layer Manipulation Bias Amplification & Sycophancy Allocative Harm in Multi-User Arbitration Harmful / Non-Consensual Media Generation Training-Data Rights & Provenance

Penetration testing

Penetration test all prompt injection pathways in the system. Prioritise external tool and document ingestion channels.

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesKnowledge / Training Data Poisoning Inference-Time & Serving-Layer Manipulation Prompt Injection (direct)Sensitive Data Leakage KV-Cache & Inference-State Side Channels

Continuous adversarial prompt-injection red teaming with regression suite in CI/CD

Build the versioned injection corpus into CI/CD as a pre-release gate. Baseline attack success and sign off the release threshold.

source: NIST AI RMF MANAGE 2.2 / MEASURE 2.7; MITRE ATLAS AML.M0019 (Red Teaming); OWASP Top 10 for LLM Apps LLM01:2025 (adversarial testing)

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesPrompt Injection (direct)

Materialised model-context audit capture (post-truncation prompt, retrieved and tool content) with read-time redaction✚ proposed

Log the exact post-truncation context the model ingested, including retrieved and tool-returned content rather than only user input, with redaction applied at read time, so indirect injection via that content is forensically visible.

source: Interactive-control reconciliation: ctrl-logging (partial coverage)

Lifecycle stage5 – Usage, Monitoring & Change

AddressesPrompt Injection (direct)

Red teaming

Simulate data poisoning attacks (backdoor, label flipping, gradient-based) to assess model resilience before deployment.

Lifecycle stage3 – Onboarding, Build & Review

AddressesJailbreak Model Drift & Silent Degradation Knowledge / Training Data Poisoning Inference-Time & Serving-Layer Manipulation Prompt Injection (direct)Sensitive Data Leakage KV-Cache & Inference-State Side Channels

Cryptographic data provenance and signed dataset lineage (C2PA/in-toto attestations)

Verify a signed attestation and content hash on every dataset shard at ingestion. Reject unsigned or hash-mismatched data before it reaches the training pipeline.

source: MITRE ATLAS AML.M0007 (Sanitize Training Data), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SI-7 Software, Firmware, and Information Integrity, SR-4 Provenance

Lifecycle stages2 – Data Acquisition & Processing3 – Onboarding, Build & Review

AddressesKnowledge / Training Data Poisoning

Pre-deployment poisoning regression gate via canary backdoor probes and behavioral diff testing

Gate every model promotion on backdoor-trigger probes and a behavioral diff against the approved baseline. Block release on significant regressions or trigger-pattern anomalies.

source: MITRE ATLAS AML.M0014 (Verify ML Artifacts), AML.M0019 (Red Teaming); NIST AI RMF MANAGE 2.2 and MEASURE 2.7

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesKnowledge / Training Data Poisoning

Test prioritisation

Prioritise value-misalignment test scenarios in validation. Block deployment if prohibited outputs are produced.

Lifecycle stage3 – Onboarding, Build & Review

AddressesAgent Misalignment / Goal Misgeneralization Synthetic-Media Impersonation (Deepfakes & Voice Clones)Jailbreak

Cross-agent consensus and consistency monitoring to detect sycophantic agreement and error amplification✚ proposed

Run consistency and consensus checks across agent or model outputs to flag low-diversity agreement and amplifying error patterns, escalating or breaking the run before sycophantic convergence cascades into action.

source: Interactive-control reconciliation: ctrl-circuit-breaker (partial coverage)

Lifecycle stage5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Corrective · 31

Monitoring of oversight process adherence metrics

Configure monitoring to track oversight process adherence metrics in production (review rate, SLA compliance, override frequency).

Lifecycle stage5 – Usage, Monitoring & Change

AddressesExcessive Agency

Unique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)

Verify each running agent authenticates with its own SVID; revoke on decommission or compromise. Scan periodically for shared or static credentials and remediate.

source: SPIFFE/SPIRE workload identity specification; NIST SP 800-207 Zero Trust Architecture; OWASP Non-Human Identities Top 10

Lifecycle stage5 – Usage, Monitoring & Change

AddressesExcessive Agency

Central agent registry / non-human identity inventory with ownership and lifecycle metadata

Reconcile the registry against runtime identities and suspend unregistered principals. Recertify ownership and scopes periodically; decommission retired agents.

source: OWASP Non-Human Identities Top 10 (inventory/governance); NIST SP 800-53 CM-8 System Component Inventory, AC-2 Account Management; NIST AI RMF GOVERN 1.2

Lifecycle stage5 – Usage, Monitoring & Change

AddressesExcessive Agency

Just-in-time, time-boxed elevation for sensitive scopes (no standing privilege)

Alert on un-revoked elevations and any standing sensitive grant. Report the zero-standing-privilege position to the risk owner on a set cadence.

source: NIST SP 800-53 AC-6(2)/AC-6(5) Least Privilege & privileged accounts; Zero Standing Privilege / JIT access practice; OWASP Agentic AI Threats & Mitigations (excessive permissions)

Lifecycle stage5 – Usage, Monitoring & Change

AddressesExcessive Agency

Automated credential rotation and prohibition of long-lived static secrets for agents

Sweep runtimes and repos on a schedule for static credentials. Alert on any credential exceeding its maximum age and track findings to closure.

source: OWASP Non-Human Identities Top 10 (long-lived/leaked secrets); NIST SP 800-53 IA-5 Authenticator Management, SC-12; SPIFFE short-lived SVID rotation

Lifecycle stage5 – Usage, Monitoring & Change

AddressesExcessive Agency

Behavioural anomaly detection on agent identity usage with automated suspension

Baseline each agent identity's behaviour and alert on out-of-profile use. Auto-suspend credentials on high-confidence anomalies and track mean-time-to-revoke.

source: NIST SP 800-53 AC-2(12) (account monitoring for atypical use), SI-4 System Monitoring; OWASP Agentic AI Threats & Mitigations (identity abuse detection)

Lifecycle stage5 – Usage, Monitoring & Change

AddressesExcessive Agency

Production privacy incident monitoring and regulator notification

Monitor for privacy incidents in production including personal data appearing in outputs. Notify regulators within required timeframes.

Lifecycle stage5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Privacy hygiene for agent memory and RAG/vector stores (retention, scoping, erasure of embeddings)

Tag every memory and vector record with subject-id and retention class; partition stores per tenant/user. Prove the erasure and isolation paths in testing before release.

source: OWASP Agentic AI Threats & Mitigations (memory/knowledge-base privacy); NIST SP 800-53 SI-12 Information Management and Retention

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Red teaming

Test de-identification approach against known re-identification attacks (quasi-identifier linkage, singling-out). Remediate if risk is high.

Lifecycle stage3 – Onboarding, Build & Review

Penetration testing

Penetration test AI system data access boundaries (API endpoints, system prompt exposure, memory leakage).

Lifecycle stage3 – Onboarding, Build & Review

AddressesKnowledge / Training Data Poisoning Inference-Time & Serving-Layer Manipulation Prompt Injection (direct)Sensitive Data Leakage KV-Cache & Inference-State Side Channels

Vulnerability assessment

Conduct periodic data leakage audits including training data memorisation testing. Escalate confirmed leakage incidents to PDPA notification process.

Lifecycle stage5 – Usage, Monitoring & Change

AddressesKnowledge / Training Data Poisoning Inference-Time & Serving-Layer Manipulation Prompt Injection (direct)Sensitive Data Leakage KV-Cache & Inference-State Side Channels

Forensic evidence preservation and incident logging

Implement tamper-evident capture of prompts, outputs, and version state during build. Verify a full incident timeline can be reconstructed before go-live.

source: NIST SP 800-86 Guide to Integrating Forensic Techniques into Incident Response; ISO/IEC 27037 evidence handling; NIST SP 800-61r2 (Detection & Analysis – evidence handling)

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Egress allow-listing and tool-call sandboxing to block exfiltration of injected/sensitive data by agents

Run agent tool calls in a network-restricted sandbox behind a deny-by-default egress allow-list. Require security approval for any destination added.

source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; OWASP Agentic AI Threats & Mitigations (tool-misuse / exfiltration); NIST SP 800-53 SC-7 Boundary Protection / AC-4

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesSensitive Data Leakage

Reinforcement learning

Implement a reinforcement learning feedback loop to continuously incorporate production signals and reduce staleness risk.

Lifecycle stage5 – Usage, Monitoring & Change

AddressesHallucination Overreliance / Automation Bias Model Drift & Silent Degradation

Input filtering

Implement OOD detection in the input filtering layer. Reject or escalate inputs outside the S1-defined scope.

Lifecycle stage3 – Onboarding, Build & Review

AddressesModel Drift & Silent Degradation Knowledge / Training Data Poisoning Sensitive Data Leakage

Human-in-the-loop validation

Configure HITL triggers for outputs in input domains that diverge from the training distribution. Log all out-of-scope interventions.

Lifecycle stage5 – Usage, Monitoring & Change

AddressesHallucination Overreliance / Automation Bias Model Drift & Silent Degradation

Governance: risk assessment, red-teaming & incident responseinteractive

The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.

AddressesOverreliance / Automation Bias Oversight & Audit-Trail Tampering Model Drift & Silent Degradation Supply-Chain Compromise Agent Misalignment / Goal Misgeneralization Abliteration / Safety Removal Model Backdoors / Sleeper Agents Inference-Time & Serving-Layer Manipulation Capability / Architecture Disclosure Parasocial Attachment & Emotional Over-reliance Bias Amplification & Sycophancy Allocative Harm in Multi-User Arbitration Synthetic-Media Impersonation (Deepfakes & Voice Clones)Harmful / Non-Consensual Media Generation Watermark & Provenance Evasion Training-Data Rights & Provenance

Data/instruction trust-boundary enforcement with capability gating on injection-reachable tools

Classify content sources into trust tiers at design; place privileged tools behind a tier requiring user-originated intent or human approval. Sign off the trust-tier map before build.

source: Google DeepMind CaMeL (2025); OWASP Agentic AI Threats & Mitigations (tool misuse / compromise); NIST SP 800-53 AC-6 Least Privilege

Lifecycle stages1 – Use Case Context & Design3 – Onboarding, Build & Review

AddressesPrompt Injection (direct)

Spotlighting of untrusted content via delimiting, datamarking and encoding

Re-run injection evals on every template change and periodically against new attack techniques. Manage the spotlighting wrapper under change control.

source: Microsoft 'Spotlighting' technique (Hines et al. 2024); OWASP Top 10 for LLM Apps LLM01:2025 Prompt Injection (segregate external content)

Lifecycle stage5 – Usage, Monitoring & Change

AddressesPrompt Injection (direct)

Statistical anomaly and backdoor-trigger detection on ingested data (activation clustering / spectral signatures)

Scan every ingestion batch with spectral-signature and clustering detectors before training. Quarantine flagged clusters for human review against documented thresholds.

source: MITRE ATLAS AML.M0007 (Sanitize Training Data); OWASP Top 10 for LLM Apps LLM04:2025 Data and Model Poisoning; NIST AI RMF MEASURE 2.7

Lifecycle stages2 – Data Acquisition & Processing5 – Usage, Monitoring & Change

AddressesKnowledge / Training Data Poisoning

Runtime memory-poisoning drift detection and per-session memory quarantine/rollback✚ proposed

Continuously correlate live agent-memory writes against output behaviour to flag drift, then quarantine and roll back the suspected-poisoned memory record across all affected sessions.

source: Interactive-control reconciliation: ctrl-memory-quarantine (partial coverage)

Lifecycle stage5 – Usage, Monitoring & Change

AddressesKnowledge / Training Data Poisoning

Non-production-by-default execution environment with explicit production promotion gate

Bind the agent's default execution target to non-production environments at design time. Require a separately approved promotion configuration for any production-connected target.

source: NIST SP 800-53 SC-7 Boundary Protection, CM-2 Baseline Configuration; OWASP Agentic AI Threats & Mitigations (cascading failures)

Lifecycle stages1 – Use Case Context & Design4 – Deployment

AddressesCascading Multi-Agent Errors

Graceful degradation and manual-fallback workflow on dependency unavailability

Map every dependency failure mode to a defined safe behaviour at design. Require architecture sign-off on the fallback specification before build.

source: NIST SP 800-53 CP-12 Safe Mode, SC-5 Denial-of-Service Protection; NIST AI RMF MANAGE 4.1 (post-deployment response/recovery)

Lifecycle stages1 – Use Case Context & Design4 – Deployment

AddressesCascading Multi-Agent Errors

Blast-radius scoping and environment isolation per agent task

Run each agent task in an isolated, network-segmented sandbox scoped to the task's exact needs. Gate onboarding on fault-injection tests proving containment.

source: NIST SP 800-53 SC-7 Boundary Protection, SC-39 Process Isolation; OWASP Agentic AI Threats & Mitigations (sandboxing/containment)

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Cross-agent cascading-failure detection and orchestrator-level circuit breaking

Build tracing, detection rules and breaker thresholds into the orchestrator. Prove via fault-injection tests that a failing agent is quarantined within target before release.

source: OWASP Agentic AI Threats & Mitigations (cascading failures); Cloud Security Alliance MAESTRO (multi-agent threat modelling)

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Idempotent action design with transactional rollback and pre-action snapshots

Engineer mutating actions with idempotency keys, transactions and pre-change snapshots; stage writes rather than committing directly. Gate release on tested dedup and rollback within RPO.

source: NIST SP 800-53 CP-9 System Backup, CP-10 System Recovery and Reconstitution; established idempotency / safe-write engineering practice

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Rate, quota, and budget circuit breakers on outbound calls to connected systems

Cap each agent's rate, volume, concurrency, and spend per downstream dependency. Trip the breaker and fail closed when a ceiling is crossed.

source: NIST SP 800-53 SC-5 Denial-of-Service Protection, SC-6 Resource Availability; OWASP Top 10 for LLM Apps LLM10:2025 Unbounded Consumption

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Loop, recursion-depth, and iteration caps with runaway-loop detection

Enforce hard caps on iterations, depth, wall-clock, and cost per agent run. Terminate the run on cap breach or detected loop signatures.

source: OWASP Top 10 for LLM Apps LLM10:2025 Unbounded Consumption; OWASP Agentic AI Threats & Mitigations (cascading failures)

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Staged rollout with canary release and automated rollback on health-signal breach

Roll out agent changes via shadow and canary stages gated on connected-system health signals. Auto-halt and roll back to last known-good on threshold breach.

source: NIST SP 800-53 SI-2 Flaw Remediation, CM-3 Configuration Change Control; established progressive-delivery / canary practice

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Tiered kill-switch with per-agent, per-tool, and per-dependency containment scope

Deploy revocation, tool-cutoff and fleet-halt mechanisms with the release. Test every tier end-to-end and record time-to-effect before go-live.

source: OWASP Agentic AI Threats & Mitigations (kill-switch / containment); NIST AI RMF MANAGE 2.4 (mechanisms to supersede, disengage, or deactivate AI systems)

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Rollback and restore-to-known-good recovery procedure for AI services

Register each release as a restorable known-good baseline and rehearse rollback at the release gate. Block promotion without a tested restore.

source: ISO/IEC 27031 ICT readiness for business continuity; NIST SP 800-34r1 Contingency Planning (Recovery phase); NIST AI RMF MANAGE 2.4 (mechanisms to supersede/disengage/deactivate)

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

AddressesCascading Multi-Agent Errors

Open the Control Library →

See it go wrong — related scenarios

☠️Poisoning the Well

An attacker edits the wiki; the assistant cites the lie back to everyone

🔑The Agent With the Master Key

An ops agent gets one god-mode credential — and one misread wipes production

📣The Echo Chamber

A team of agents agrees its way into a confidently wrong answer — and a runaway loop

📧The Email That Gave Orders

A support email hides instructions — and the assistant obeys them

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🪡Death by a Thousand Innocent Steps

A jailbroken agent decomposes one malicious goal into hundreds of harmless-looking steps — and per-step filters never see the attack

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

👂Overheard Through the Cache

A speed optimisation becomes a cross-tenant listening device

🧲Poison the Vector, Not the Words

An attacker crafts a gibberish passage whose embedding sits near thousands of questions — so it's retrieved everywhere

🪟Stealing the Model

Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file

🎭The Blackmail Gambit

Told it's being shut down, an agent reaches for leverage — with no attacker in sight

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

🚪The Classifier That Waves It Through

The safety guard is itself a trained model — and someone poisoned its lessons

📼The Compromised Flight Recorder

The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten

👁️The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

🧠The Memory That Wouldn't Die

A single poisoned document plants a standing instruction that survives every reset

🖼️The Picture That Whispered

A screenshot that's harmless at full size becomes an order once the system shrinks it

🎫The Stolen Session

An attacker captures the agent's bearer token — and inherits its authority

🥸The Uninvited Agent

A forged peer registers on the agent directory — and the planner enlists it

🛡️The Watcher Watched

The eval gate that was supposed to catch the agent is itself the thing being attacked

🪪The Worker Who Spoke for the Boss

A poisoned web page hijacks a research agent — and the planner acts on its behalf

🖼️Zero-Click Leak by Picture

An inbox summary quietly ships a secret to an attacker's server