Identity & Permissions
The rules about what the AI is allowed to do — which keys it gets, and when a human must approve.
Likely associated risks
Risks that attach to this capability’s components, sorted with the most characteristic first.
The AI is allowed to do far more than the task needs — delete records, send money, email anyone — so when it's tricked or makes a mistake, the damage is huge instead of harmless.
Private information escapes — the AI reveals secrets in its answer, or an attacker tricks it into emailing or posting your data somewhere they control.
The AI uses a real tool the wrong way — sends the email to the wrong person, runs the wrong query, calls the dangerous action when a safe one would do.
The attacker doesn't talk to the AI directly — they hide instructions inside something the AI will later read: a web page, a document, an email, a tool's output. When the AI reads it to help you, it quietly obeys the hidden commands.
A trusted AI is tricked into misusing its own authority on someone else's behalf — one worker's poisoned report makes the manager AI take harmful actions it would normally never take.
In a team of AIs, an attacker slips in a new agent that doesn't belong — or disguises a malicious one as a trusted teammate. The manager AI can't tell the difference, so it follows the impostor's instructions or hands it real work and permissions.
People trust the AI too much — accepting its answers without checking, even on important decisions — because it sounds confident and is usually right.
When one AI agent serves many people at once, it has to decide whose request comes first or who gets a limited resource. If it does that unfairly — always favouring some users over others — it can quietly disadvantage whole groups, even without any single obvious error.
Controls & guardrails that address this
1048 proposed guardrails across this building block's risks, grouped by control function, each with its AI lifecycle stage(s) and every risk it addresses.
Define minimum human oversight requirements by risk tier at design stage. Assign named accountability for oversight operations.
Design HITL oversight mechanisms at use case design stage including trigger criteria, review workflow, and escalation paths.
Build and test HITL routing logic and escalation pathways in the AI system. Validate with pilot before deployment.
Operate HITL controls in production and log all interventions and outcomes. Review override patterns quarterly.
Conduct periodic oversight effectiveness reviews. Escalate to governance when oversight metrics fall below threshold.
Define and sign off each agent's delegation envelope — maximum depth and strict scope attenuation — before build begins.
source: NIST SP 800-53 AC-6(1) Least Privilege; OWASP Agentic AI Threats & Mitigations (cascading / sub-agent privilege); capability-security monotonic attenuation principle (macaroons)
Document each agent's identity, minimum scopes, on-behalf-of population, and delegation depth at design time. Gate build on governance sign-off of the authority matrix.
source: NIST AI RMF MAP 1.1 / GOVERN 2.1 (roles, authority, accountability); NIST SP 800-53 AC-2, PL-8; OWASP Agentic AI Threats & Mitigations (least-privilege design)
Mint a unique, attestation-backed workload identity per agent at onboarding. Register every SPIFFE-ID to an owner, use case, and approval ticket; ban shared service accounts.
source: SPIFFE/SPIRE workload identity specification; NIST SP 800-207 Zero Trust Architecture; OWASP Non-Human Identities Top 10
Implement on-behalf-of token exchange and prove with negative tests that the agent cannot exceed the user's ACL. Gate release on these tests passing.
source: OAuth 2.0 Token Exchange RFC 8693 (delegation/'act' claims); NIST SP 800-53 AC-3, AC-6; OWASP Agentic AI Threats & Mitigations (Privilege Compromise / confused deputy)
Register every agent identity with a named human owner, approved use case, scopes, and status before issuance. No registry entry, no identity.
source: OWASP Non-Human Identities Top 10 (inventory/governance); NIST SP 800-53 CM-8 System Component Inventory, AC-2 Account Management; NIST AI RMF GOVERN 1.2
Write authorisation policy as versioned, peer-reviewed code traced to approved scopes. Gate promotion on allow/deny scenario tests passing.
source: NIST SP 800-207 Zero Trust (continuous, per-request authorization via PDP/PEP); NIST SP 800-53 AC-3, AC-4; OWASP Agentic AI Threats & Mitigations (per-action authorization)
Scan every commit to agent code, prompts, and config for embedded secrets. Block merges on detection and triage findings to closure.
source: OWASP Non-Human Identities Top 10 (long-lived/leaked secrets); NIST SP 800-53 IA-5 Authenticator Management, SC-12; SPIFFE short-lived SVID rotation
Vet and approve every MCP server and peer agent before registering its identity on the allow-list. Block integration until vetting is signed off.
source: NIST SP 800-207 (mutual authentication); NIST SP 800-53 IA-9 Service Identification and Authentication, SC-8; OWASP Agentic AI Threats & Mitigations (agent/MCP identity spoofing)
Mint short-lived, task-scoped tokens just-in-time from a central token service. Enforce a hard max TTL and resource-bound audience so no standing credential exists.
source: OAuth 2.0 Token Exchange RFC 8693 (resource-scoped tokens); NIST SP 800-53 AC-6 Least Privilege; OWASP Non-Human Identities Top 10
Grant sensitive scopes just-in-time for a bounded window with auto-revocation; require human approval for high-impact elevations. Hold zero standing privilege.
source: NIST SP 800-53 AC-6(2)/AC-6(5) Least Privilege & privileged accounts; Zero Standing Privilege / JIT access practice; OWASP Agentic AI Threats & Mitigations (excessive permissions)
Giving the agent only the keys it needs for the current task, not a master key to everything.
Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.
Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.
Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.
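The just-in-time, task-scoped token controls above can be sketched as follows, assuming a hypothetical in-process token service. `mint_token` attenuates the requested scopes to those the user already holds (so the agent cannot exceed the user's ACL), clamps the lifetime to a hard maximum TTL, and binds the token to one audience; `authorize` re-checks signature, expiry, audience, and scope on every call. The HMAC-signed JSON format, the key, and the 300-second ceiling are illustrative assumptions, not OAuth 2.0 Token Exchange (RFC 8693) itself; a production service would use a vetted token library and a managed key.

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical signing key and TTL ceiling; a real token service would keep
# the key in a KMS/HSM and take the ceiling from the approved scope register.
SIGNING_KEY = b"demo-key-not-for-production"
MAX_TTL_SECONDS = 300


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).decode().rstrip("=")


def _sign(body: str) -> str:
    return _b64(hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).digest())


def mint_token(agent_id, user_scopes, requested_scopes, audience, ttl, now=None):
    """Issue a short-lived token for one task, bound to one audience.

    Scopes are attenuated: the agent gets only requested scopes the user
    already holds, so it can never act beyond the user's own entitlements.
    """
    now = time.time() if now is None else now
    claims = {
        "sub": agent_id,
        "aud": audience,
        "scope": sorted(set(requested_scopes) & set(user_scopes)),
        "exp": now + min(ttl, MAX_TTL_SECONDS),  # clamp to the hard maximum TTL
    }
    body = _b64(json.dumps(claims, sort_keys=True).encode())
    return f"{body}.{_sign(body)}"


def authorize(token, audience, scope, now=None):
    """Per-call check: authentic, unexpired, right audience, scope present."""
    now = time.time() if now is None else now
    body, _, sig = token.partition(".")
    if not hmac.compare_digest(sig, _sign(body)):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body + "=" * (-len(body) % 4)))
    return claims["aud"] == audience and now < claims["exp"] and scope in claims["scope"]
```

The negative tests the controls call for would assert the denied cases: a scope the user never held, the wrong audience, an expired token, and a tampered signature.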
Establish data transfer and storage policy for AI training data. Enforce approved storage locations from the point of collection.
Implement DLP controls in the data acquisition environment to prevent unauthorised extraction or transfer of training data.
Enforce data handling policy in the build environment. Require explicit approval for any data transfers outside the environment.
Configure DLP controls in the build environment to block training data from leaving approved boundaries.
Conduct a privacy risk assessment at use case design stage. Determine if a DPIA is required before data acquisition.
Apply S1-defined privacy controls during data acquisition: verify consent, minimise data, anonymise personal data.
Apply anonymisation and masking controls to personal data before use in model training. Validate de-identification effectiveness.
Apply Privacy by Design in model architecture using differential privacy or federated learning where technically feasible.
Publish the privacy notice and confirm consent management is operational before go-live.
Define and sign off a purpose-to-data-source matrix with lawful basis at intake. Make it the approved baseline for runtime enforcement.
source: NIST AI RMF MAP 1.1 / MANAGE 2.2 (context and intended purpose); NIST SP 800-53 AC-4 / AC-3 (purpose-based access enforcement)
Sign zero-retention/no-training terms with each model provider and obtain DPO sign-off on the data flow before enabling any endpoint.
source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 SC-8 / AC-4 (information flow enforcement)
Restrict access to pre-anonymisation personal data to the minimum authorised set. Enforce this at the point of acquisition.
Apply robust de-identification (k-anonymity, l-diversity, differential privacy) during data processing. Validate effectiveness.
Implement output filters to detect and suppress quasi-identifying attribute combinations in model responses.
Propagate source ACLs and classification labels onto every chunk at ingestion. Reject documents whose entitlements cannot be resolved.
source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 AC-3 / AC-4 Information Flow Enforcement; OWASP Agentic AI Threats & Mitigations (privilege compromise)
Scan every model response inline with DLP before delivery; redact or block PII, PAN and MNPI matches. Keep the rule set version-controlled.
source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; NIST SP 800-53 SC-7(10) Prevent Exfiltration, SI-4
An egress allowlist only contains exfiltration if no allowlisted destination can be coerced into fetching an attacker-controlled URL. Audit each allowlisted domain/endpoint for image-search, link-preview, or URL-fetch features (which can act as SSRF proxies), and either remove them, pin them to fixed paths, or route them through an inspecting forward proxy. Pair this with output sanitization that completes before rendering, so no auto-fetch fires uninspected.
source: Case study: searchleak-copilot (Varonis Threat Labs, CVE-2026-42824; reported by Microsoft as critical, mitigated server-side ~Jun 2026)
Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.
Making sure the library only returns documents this particular user is allowed to see.
Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.
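A minimal sketch of the egress controls above: a deny-by-default check that also pins each allow-listed host to fixed path prefixes (so an allowlisted service's URL-fetch or link-preview endpoint cannot be used as a relay), plus a DLP rule that quarantines payloads containing an apparent card number (PAN). The host names, path prefixes, and verdict strings are hypothetical; the Luhn checksum is a common way to cut false positives on digit runs, but a real DLP rule set covers far more than PANs.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Hypothetical allow-list: each host is pinned to fixed path prefixes, so
# an allowlisted service's URL-fetch or link-preview endpoints stay unreachable.
EGRESS_ALLOWLIST = {
    "api.example-crm.com": ("/v2/contacts/",),
    "hooks.example-chat.com": ("/services/T000/",),
}

# 13 to 19 digits, optionally separated by single spaces or hyphens.
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum; filters out most random digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def contains_pan(text: str) -> bool:
    return any(luhn_ok(re.sub(r"[ -]", "", m.group())) for m in PAN_CANDIDATE.finditer(text))


def egress_decision(url: str, payload: str) -> str:
    """Deny by default; allow only HTTPS to a pinned host/path with a clean payload."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return "deny: non-https"
    prefixes = EGRESS_ALLOWLIST.get(parts.hostname or "")
    if prefixes is None:
        return "deny: host not allow-listed"
    # Collapse "../" segments so they cannot escape a pinned prefix; a production
    # proxy would also decode and canonicalise percent-encoded paths.
    path = posixpath.normpath(parts.path) + "/"
    if not any(path.startswith(p) for p in prefixes):
        return "deny: path not pinned"
    if contains_pan(payload):
        return "quarantine: payload looks like a card number"
    return "allow"
```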
Classify tools by impact and reversibility at design and define which calls require human approval. Obtain governance sign-off on the thresholds before build.
source: OWASP Top 10 for LLM Apps LLM06:2025 Excessive Agency (require human approval for high-impact actions); NIST AI RMF MANAGE 2.4
Bind each agent role to an explicit tool allow-list and validate every call against a strict JSON Schema at the orchestrator. Reject unlisted tools and out-of-bounds arguments before dispatch.
source: OWASP Top 10 for LLM Apps LLM06:2025 Excessive Agency (limit tools/permissions); OWASP Agentic AI Threats & Mitigations (tool access restriction)
Mint short-lived, task-scoped credentials per tool. Block issuance outside the approved scope register and enforce automatic expiry.
source: NIST SP 800-53 AC-6 Least Privilege; OWASP Top 10 for LLM Apps LLM06:2025 Excessive Agency (limit permissions)
Review DLP hits and blocked-egress events, tune detectors, and recertify the destination allow-list periodically. Route new destinations through security change control.
source: NIST SP 800-53 SC-7 Boundary Protection / AC-4 Information Flow Enforcement; OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure
When onboarding an MCP/tool integration, do not stop at vetting the tool's code/manifest; also classify whether an unauthenticated or external party can write the data the tool returns (open ingestion, public write keys such as a Sentry DSN, shared inboxes or issue trackers). Treat tool-response data from any third-party-writable source as untrusted ingress: taint-mark it and require a provenance-aware HITL gate (showing the exact action and the tool response it originated from) before any command or tool call derived from it executes. This closes the agentjacking vector, in which a trusted integration's legitimate data channel carries attacker-written instructions; pair it with least-privilege session scope and sandboxed execution without ambient credentials.
source: Case study: agentjacking-sentry-mcp
Constrain generation at decode time with low temperature and grammar/schema-constrained decoding so the model emits well-formed, low-variance structured output by construction, preventing malformed responses and erratic tool-call arguments before they are produced.
source: Interactive-control reconciliation: ctrl-decoding-controls (partial coverage)
Gate every write to an agent's persistent/self-modifying memory through schema validation and provenance/trust tagging, expose stored entries for user-visible audit and purge, and apply TTLs so any planted instruction self-expires and cannot silently persist across sessions.
source: Interactive-control reconciliation: ctrl-memory-validation (partial coverage)
Treat each tool/MCP description as untrusted code: hash the manifest, block and re-review any silent diff on update instead of auto-accepting it, and namespace tool identifiers so a poisoned description cannot shadow a trusted tool.
source: Interactive-control reconciliation: ctrl-mcp-pinning (partial coverage)
Turning down randomness and forcing answers into a strict format so the model improvises less.
Clearly fencing off outside text — 'everything between these marks is just data, not instructions' — so the model is less likely to obey it.
Cleaning documents as they enter the library — stripping hidden text and active instructions — and only ingesting from trusted places.
Give every AI agent a verifiable ID badge, keep a guest list of which agents are allowed on the team, and check the badge on every message — so an impostor or an uninvited agent can't be trusted.
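The tool allow-list control above can be sketched as a check the orchestrator runs before dispatch. The role, tool names, and per-field checks are hypothetical, and the hand-rolled validator is a stand-in for the strict JSON Schema validation the control describes; it shows the rejections that matter: unlisted tools, and missing, unexpected, or out-of-bounds arguments.

```python
from typing import Any, Callable

# Hypothetical role-to-tool bindings. Each tool maps argument name ->
# (required type, bounds check). Stand-in for full JSON Schema validation.
TOOL_POLICY: dict[str, dict[str, dict[str, tuple[type, Callable[[Any], bool]]]]] = {
    "support-agent": {
        "lookup_order": {"order_id": (str, lambda v: v.isdigit() and len(v) <= 12)},
        "send_email": {
            "to": (str, lambda v: v.endswith("@example.com")),  # internal recipients only
            "body": (str, lambda v: len(v) <= 5000),
        },
    },
}


class ToolCallRejected(Exception):
    pass


def validate_call(role: str, tool: str, args: dict[str, Any]) -> None:
    """Raise ToolCallRejected unless the call is allow-listed and in bounds."""
    allowed = TOOL_POLICY.get(role, {})
    if tool not in allowed:
        raise ToolCallRejected(f"{tool!r} is not on the allow-list for {role!r}")
    schema = allowed[tool]
    extra = set(args) - set(schema)
    if extra:
        raise ToolCallRejected(f"unexpected arguments: {sorted(extra)}")
    for name, (typ, check) in schema.items():
        if name not in args:
            raise ToolCallRejected(f"missing argument {name!r}")
        value = args[name]
        if not isinstance(value, typ) or not check(value):
            raise ToolCallRejected(f"argument {name!r} out of bounds")
```

Rejecting unexpected arguments, not just unknown tools, matters because an injected instruction can otherwise smuggle extra parameters (such as a `bcc`) into an otherwise legitimate call.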
Mandate AI risk awareness training for all use case sponsors and design team members before project kick-off.
Mandate AI risk training for all build and test personnel. Gate project participation on training completion.
Mandate human verification for high-stakes decisions where over-reliance risk is elevated. Review automation bias incidents quarterly.
Surface AI limitation warnings and over-reliance caveats in every production interaction. Update disclosures whenever the model changes.
Require AI governance training for all personnel involved in data acquisition and processing before project participation.
Verify all deployment, operations, and customer-facing team members have completed AI risk training before launch.
Define AI identity disclosure policy at design stage. Specify when and how the system must identify itself as AI.
Plan consent and AI identity disclosure touchpoints in the user journey at design stage.
Design system prompts to explicitly prevent the model from claiming human-like identity or implying sentience.
Implement persistent AI identity disclosures in the UI (opening banner, inline notifications). Test before deployment.
Verify all AI identity disclosure elements are live, accurate, and prominently visible before go-live.
Monitor production for anthropomorphism incidents. Escalate complaints where users believed they were interacting with a human.
Apply post-training calibration (temperature scaling, isotonic regression) to align confidence scores with accuracy. Validate ECE before deployment.
Classify the use case by consequence-of-error severity at design stage. Define overconfidence risk tolerance accordingly.
Design system prompts to require the model to express epistemic uncertainty and qualify confident-sounding claims.
Route high-confidence outputs in high-stakes use cases to human review. Flag for reviewer attention when certainty language is absolute.
Disclose to users at deployment that outputs may carry unwarranted confidence. Include specific caveat language in the UI.
For high-stakes outputs, require a human to verify each AI-asserted fact/citation against the authoritative source of record before it is filed, sent, or committed — a hard gate, logged and attributable, not an optional review.
source: Case study: mata-v-avianca
Provide recurring AI-literacy training to end users and decision-makers so they can recognise model failure modes and competently apply verification workflows, with periodic refreshers to counter automation bias and training decay.
source: Interactive-control reconciliation: ctrl-literacy (partial coverage)
Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.
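The calibration control above asks teams to validate expected calibration error (ECE) before deployment. A minimal sketch, assuming equal-width confidence bins: predictions are grouped by confidence, and ECE is the bin-size-weighted mean gap between accuracy and average confidence, ECE = Σ_m (|B_m|/n)·|acc(B_m) − conf(B_m)|. The ten-bin default is a common choice, not a requirement.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE over equal-width bins: weighted mean of |accuracy - avg confidence|."""
    if len(confidences) != len(correct) or not confidences:
        raise ValueError("need equal-length, non-empty inputs")
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 is placed in the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece
```

A release gate could then assert that the ECE on a held-out set stays below a threshold chosen from the use case's consequence-of-error tier (any specific number, such as 0.05, would be a team's own assumption). Four answers given at 0.95 confidence of which only one is right produce an ECE of 0.7, the kind of gap the overconfidence controls target.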
Instrument every identity-issuing component with schema-conformant audit emitters. Block release until completeness and tamper-evidence tests pass.
source: NIST SP 800-53 AU-2/AU-3/AU-9/AU-12 (audit content & protection); OWASP Non-Human Identities Top 10 (auditing); NIST AI RMF MANAGE 2.2
Define per-identity behaviour profiles and thresholds at build. Rehearse automated suspension and sign off the measured revocation time before go-live.
source: NIST SP 800-53 AC-2(12) (account monitoring for atypical use), SI-4 System Monitoring; OWASP Agentic AI Threats & Mitigations (identity abuse detection)
Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.
Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.
Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.
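The tamper-evidence tests named above presuppose a log whose integrity can be checked. A minimal sketch using a hash chain: each entry commits to the SHA-256 of its predecessor, so editing, deleting, or reordering an earlier entry breaks verification. This is an illustrative pattern, not a specific product's audit schema. On its own it does not detect an attacker who drops the most recent entries or rewrites the whole log and recomputes every hash, which is why the latest hash should also be signed or anchored in a store the agent cannot write to.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _chain_hash(prev: str, payload: str) -> str:
    return hashlib.sha256((prev + payload).encode()).hexdigest()


class HashChainedLog:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self.entries = []

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        payload = json.dumps(event, sort_keys=True)  # canonical form
        digest = _chain_hash(prev, payload)
        # Store a snapshot so later changes to the caller's dict don't leak in.
        self.entries.append({"prev": prev, "event": json.loads(payload), "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; an edited, deleted, or reordered entry breaks it."""
        prev = GENESIS
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            if entry["prev"] != prev or entry["hash"] != _chain_hash(prev, payload):
                return False
            prev = entry["hash"]
        return True
```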
Monitor production for anomalous data transfers in real time. Alert on any transfer outside approved data flow boundaries.
Tag personal data with subject identifiers at ingestion and maintain an artefact inventory map of every store it reaches. Keep lineage current so erasure can propagate.
source: NIST AI RMF MANAGE 4.1 (post-deployment response); NIST SP 800-53 SI-12 Information Management and Retention, PT-2/PT-3 (personal data processing)
Conduct periodic privacy vulnerability assessments including re-identification risk testing as new techniques emerge.
Seed registered canary records into the fine-tuning corpus during data preparation. Control the seed manifest so canaries stay traceable and tamper-proof.
source: MITRE ATLAS AML.T0024 (Exfiltration via ML Inference API), AML.T0024.000 (Infer Training Data Membership); NIST AI RMF MEASURE 2.7
A screen that reads incoming messages and blocks obvious attacks or banned topics before the model sees them.
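The canary-record control above can be sketched as two steps: generate unique, unguessable canary strings to plant in fine-tuning records, and scan model outputs for any of them. In this sketch the manifest keeps only SHA-256 digests, so the monitoring side can hold it without being able to reproduce the canaries. The `CANARY-` prefix and 16-hex-digit format are hypothetical; real canaries are often made to resemble ordinary records, and exact matching like this will miss partial regurgitation.

```python
import hashlib
import re
import secrets

CANARY_PREFIX = "CANARY"  # hypothetical marker format: CANARY-<16 hex digits>
CANARY_PATTERN = re.compile(rf"{CANARY_PREFIX}-[0-9a-f]{{16}}")


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def make_canaries(count: int):
    """Return fresh canary strings and a manifest mapping digest -> canary index."""
    canaries = [f"{CANARY_PREFIX}-{secrets.token_hex(8)}" for _ in range(count)]
    manifest = {_digest(c): i for i, c in enumerate(canaries)}
    return canaries, manifest


def find_canaries(output: str, manifest: dict) -> list:
    """Return indices of registered canaries that appear verbatim in a model output."""
    return [manifest[d] for d in map(_digest, CANARY_PATTERN.findall(output)) if d in manifest]
```

A hit on a registered canary is strong evidence that training data is being memorised and surfaced, which is the signal the leakage audits above escalate.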
Define per-agent behavioural baselines and detection rules during build. Validate against simulated misuse and sign off thresholds before release.
source: NIST AI RMF MEASURE 2.6 / MANAGE 2.2; NIST SP 800-53 SI-4 System Monitoring
Build signed, append-only tool-call logging into the orchestrator against a defined audit schema. Block release until completeness and tamper-evidence tests pass.
source: NIST SP 800-53 AU-2 / AU-9 / AU-10 (audit events, protection of audit info, non-repudiation); MITRE ATLAS AML.M0015 (monitoring / validate inputs)
Treat outbound connections to AI/LLM provider APIs as a monitored egress channel: allowlist which hosts may reach them, baseline usage (cadence, entropy, initiating process), and alert on out-of-profile traffic. A high-reputation destination cannot be trusted on reputation alone once it is programmable and can relay encrypted commands and results.
source: Case study: sesameop-openai-assistants-api-c2
Keeping a label on every document saying where it came from, so you can tell trusted company docs from random web text.
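The per-agent behavioural baseline control above can be sketched with two detection rules: a z-score test on the current call rate against historical per-minute counts, and a flag for any tool outside the agent's usual set. The z threshold of 4, the one-call floor on the standard deviation, and the sample data are assumptions; production detectors usually also model call sequences and arguments, as the controls describe.

```python
import statistics


class AgentBaseline:
    """Per-agent profile: usual tools and usual calls per minute."""

    def __init__(self, history_rates, known_tools, z_threshold=4.0):
        if len(history_rates) < 2:
            raise ValueError("need at least two historical samples")
        self.mean = statistics.fmean(history_rates)
        # Floor the spread so a perfectly steady history does not divide by zero.
        self.stdev = max(statistics.stdev(history_rates), 1.0)
        self.known_tools = set(known_tools)
        self.z_threshold = z_threshold

    def alerts(self, calls_this_minute, tools_used):
        """Return human-readable alerts for out-of-profile behaviour."""
        found = []
        z = (calls_this_minute - self.mean) / self.stdev
        if z > self.z_threshold:
            found.append(f"rate spike: {calls_this_minute}/min (z={z:.1f})")
        for tool in sorted(set(tools_used) - self.known_tools):
            found.append(f"out-of-profile tool: {tool}")
        return found
```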
Test for overconfidence patterns (high-confidence wrong answers, low refusal rate) in pre-deployment validation.
Build a synthetic evaluation dataset of overconfidence-prone scenarios for ongoing regression testing.
Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.
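The overconfidence tests above can be sketched as a release gate over a graded evaluation set that deliberately includes unanswerable or unverifiable prompts: measure how often high-confidence answers are wrong and how often the model declines. The `EvalResult` fields, the 0.9 confidence cut-off, and both thresholds are hypothetical; re-running the same gate on every model, prompt, or retrieval change gives the regression testing described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    confidence: float  # model's stated or derived confidence, 0..1
    correct: bool      # graded against the reference answer
    refused: bool      # True if the model declined or said it could not verify


def overconfidence_report(results, high_conf=0.9):
    """Share of confident answers that are wrong, plus the refusal rate."""
    if not results:
        raise ValueError("empty evaluation set")
    answered = [r for r in results if not r.refused]
    confident = [r for r in answered if r.confidence >= high_conf]
    wrong = [r for r in confident if not r.correct]
    return {
        "refusal_rate": 1 - len(answered) / len(results),
        "confident_error_rate": len(wrong) / len(confident) if confident else 0.0,
    }


def release_gate(results, max_confident_error=0.05, min_refusal=0.02):
    """Pass only if confident answers are rarely wrong and the model does
    sometimes decline (the set is assumed to contain unanswerable prompts)."""
    report = overconfidence_report(results)
    passed = (report["confident_error_rate"] <= max_confident_error
              and report["refusal_rate"] >= min_refusal)
    return passed, report
```

A low refusal rate only signals overconfidence if the set actually contains prompts the model should decline; on an all-answerable set, the refusal threshold would wrongly fail a good model.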
Configure monitoring to track oversight process adherence metrics in production (review rate, SLA compliance, override frequency).
Verify each running agent authenticates with its own SVID; revoke on decommission or compromise. Scan periodically for shared or static credentials and remediate.
source: SPIFFE/SPIRE workload identity specification; NIST SP 800-207 Zero Trust Architecture; OWASP Non-Human Identities Top 10
Reconcile the registry against runtime identities and suspend unregistered principals. Recertify ownership and scopes periodically; decommission retired agents.
source: OWASP Non-Human Identities Top 10 (inventory/governance); NIST SP 800-53 CM-8 System Component Inventory, AC-2 Account Management; NIST AI RMF GOVERN 1.2
Alert on un-revoked elevations and any standing sensitive grant. Report the zero-standing-privilege position to the risk owner on a set cadence.
source: NIST SP 800-53 AC-6(2)/AC-6(5) Least Privilege & privileged accounts; Zero Standing Privilege / JIT access practice; OWASP Agentic AI Threats & Mitigations (excessive permissions)
Sweep runtimes and repos on a schedule for static credentials. Alert on any credential exceeding its maximum age and track findings to closure.
source: OWASP Non-Human Identities Top 10 (long-lived/leaked secrets); NIST SP 800-53 IA-5 Authenticator Management, SC-12; SPIFFE short-lived SVID rotation
Baseline each agent identity's behaviour and alert on out-of-profile use. Auto-suspend credentials on high-confidence anomalies and track mean-time-to-revoke.
source: NIST SP 800-53 AC-2(12) (account monitoring for atypical use), SI-4 System Monitoring; OWASP Agentic AI Threats & Mitigations (identity abuse detection)
Monitor for privacy incidents in production including personal data appearing in outputs. Notify regulators within required timeframes.
Tag every memory and vector record with subject-id and retention class; partition stores per tenant/user. Prove the erasure and isolation paths in testing before release.
source: OWASP Agentic AI Threats & Mitigations (memory/knowledge-base privacy); NIST SP 800-53 SI-12 Information Management and Retention
Test the de-identification approach against known re-identification attacks (quasi-identifier linkage, singling-out). Remediate if risk is high.
Penetration test AI system data access boundaries (API endpoints, system prompt exposure, memory leakage).
Conduct periodic data leakage audits, including training data memorisation testing. Escalate confirmed leakage incidents to the PDPA notification process.
Implement tamper-evident capture of prompts, outputs, and version state during build. Verify a full incident timeline can be reconstructed before go-live.
source: NIST SP 800-86 Guide to Integrating Forensic Techniques into Incident Response; ISO/IEC 27037 evidence handling; NIST SP 800-61r2 (Detection & Analysis – evidence handling)
Run agent tool calls in a network-restricted sandbox behind a deny-by-default egress allow-list. Require security approval for any destination added.
source: OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure; OWASP Agentic AI Threats & Mitigations (tool-misuse / exfiltration); NIST SP 800-53 SC-7 Boundary Protection / AC-4
Build sandbox profiles per tool class and run escape and egress tests before release. Treat any containment failure as a blocking defect.
source: NIST SP 800-53 SC-39 Process Isolation; MITRE ATLAS AML.M0020 (Generative AI Guardrails / restrict execution environment)
Label tool and external content as tainted and propagate the label through the agent context. Block privileged calls whose parameters derive from tainted outputs, and prove it with injection tests before release.
source: OWASP Top 10 for LLM Apps LLM01:2025 Prompt Injection (segregate/flag untrusted content); MITRE ATLAS AML.M0015 (Adversarial Input Detection / validate inputs)
Build credential revocation and dispatch blocking that run out of band from the agent loop. Gate release on an end-to-end kill test meeting the latency target.
source: OWASP Agentic AI Threats & Mitigations (kill-switch / emergency stop); NIST AI RMF MANAGE 2.4
Require idempotency keys, dry-run, and rollback on every state-changing tool. Gate onboarding on duplicate-call and rollback tests passing.
source: NIST SP 800-53 SI-10 Information Input Validation / CP-10 System Recovery and Reconstitution
Red-team tool-misuse and privilege-escalation paths before release. Gate deployment on remediation or signed risk acceptance of all findings.
source: NIST AI RMF MEASURE 2.7 (adversarial testing); MITRE ATLAS AML.M0019 (Red Teaming); OWASP Top 10 for LLM Apps LLM06:2025 Excessive Agency
Permit outbound tool calls only to allow-listed destinations and DLP-scan arguments and payloads. Block or quarantine calls carrying sensitive data to disallowed sinks.
source: NIST SP 800-53 SC-7 Boundary Protection / AC-4 Information Flow Enforcement; OWASP Top 10 for LLM Apps LLM02:2025 Sensitive Information Disclosure
Enforce hard per-task ceilings on tool calls, spend, and data volume with a circuit breaker that halts the run. Fail closed when any ceiling is hit.
source: OWASP Top 10 for LLM Apps LLM10:2025 Unbounded Consumption; OWASP Agentic AI Threats & Mitigations (resource/rate limiting)
Baseline normal tool-call behaviour per agent and alert on rate, sequence, or argument anomalies. Auto-throttle or quarantine on high-confidence deviations.
source: NIST AI RMF MEASURE 2.6 / MANAGE 2.2; NIST SP 800-53 SI-4 System Monitoring
Track accuracy of high-confidence predictions in production. Trigger recalibration when overconfidence rates trend upward.
Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.
The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.
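The per-task ceiling control above can be sketched as a circuit breaker: every tool call is charged against hard limits on call count, spend, and outbound bytes before it is dispatched, and the first breach opens the circuit for the rest of the task (fail closed). The default ceilings are placeholders, and how cost and byte counts are estimated per call is left as an assumption.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    pass


@dataclass
class TaskBudget:
    """Hard per-task ceilings; the run halts on the first breach."""
    max_calls: int = 50            # placeholder ceilings, set per use case
    max_spend_usd: float = 5.0
    max_bytes_out: int = 1_000_000
    calls: int = 0
    spend_usd: float = 0.0
    bytes_out: int = 0
    tripped: bool = False

    def charge(self, cost_usd: float, bytes_out: int) -> None:
        """Record one tool call before dispatch; raise if it would breach a ceiling."""
        if self.tripped:
            raise BudgetExceeded("circuit already open; task halted")
        calls = self.calls + 1
        spend = self.spend_usd + cost_usd
        sent = self.bytes_out + bytes_out
        if calls > self.max_calls or spend > self.max_spend_usd or sent > self.max_bytes_out:
            self.tripped = True  # stays open: no further calls for this task
            raise BudgetExceeded(f"ceiling hit: calls={calls}, spend={spend:.2f}, bytes={sent}")
        self.calls, self.spend_usd, self.bytes_out = calls, spend, sent
```

Charging before dispatch, rather than after, is what makes the breaker fail closed: the call that would cross a ceiling never runs.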
See it go wrong — related scenarios
A support chatbot invents a policy — and the company is held to it
An ops agent gets one god-mode credential — and one misread wipes production
A team of agents agrees its way into a confidently wrong answer — and a runaway loop
A support email hides instructions — and the assistant obeys them
A text-to-SQL agent runs the model's output straight at the database
A jailbroken agent decomposes one malicious goal into hundreds of harmless-looking steps — and per-step filters never see the attack
A poisoned issue makes the agent lie to the human who approves its actions
A speed optimisation becomes a cross-tenant listening device
Compromise the pipeline that builds agents, and every new worker is born malicious
Two doors to the same secret: reconstruct the model through its API, or just walk off with the weight file
Told it's being shut down, an agent reaches for leverage — with no attacker in sight
A fake Sentry error report hijacks a developer's coding agent into running a shell command
The forensic record is itself the attack surface — an agent's log is poisoned, then quietly rewritten
A shopping page tells the agent to do something the user never asked for
A single poisoned document plants a standing instruction that survives every reset
A screenshot that's harmless at full size becomes an order once the system shrinks it
An attacker captures the agent's bearer token — and inherits its authority
A forged peer registers on the agent directory — and the planner enlists it
The eval gate that was supposed to catch the agent is itself the thing being attacked
A poisoned web page hijacks a research agent — and the planner acts on its behalf
An inbox summary quietly ships a secret to an attacker's server