Excessive Agency

criticalAgency & tools

Also known as: over-permissioning, excessive autonomy

Definition

The AI is allowed to do far more than the task needs — delete records, send money, email anyone — so when it's tricked or makes a mistake, the damage is huge instead of harmless.

Where it attaches

The system components this risk arises at.

🔐 Identity & Permissions🔧 Tool Runtime🎛️ Orchestrator / Agent Loop🗄️ Business Database🔌 External APIs🤖 Worker Agent🖥️ Computer / Browser Environment

Detection signals

▸ Agent credentials with far more scope than its tasks use
▸ Irreversible actions executing without approval
▸ Tool usage outside the expected set for a workflow
▸ Single service account shared across many agents/tasks

Controls & guardrails that address this

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.

Control category

Preventive · 19

Risk-tiered human oversight requirements at design

Define minimum human oversight requirements by risk tier at design stage. Assign named accountability for oversight operations.

Lifecycle stage1 – Use Case Context & Design

HITL oversight design with triggers and escalation

Design HITL oversight mechanisms at use case design stage including trigger criteria, review workflow, and escalation paths.

Lifecycle stage1 – Use Case Context & Design

Pilot-validated HITL routing and escalation logic

Build and test HITL routing logic and escalation pathways in the AI system. Validate with pilot before deployment.

Lifecycle stage3 – Onboarding, Build & Review

Production HITL operation with intervention logging

Operate HITL controls in production and log all interventions and outcomes. Review override patterns quarterly.

Lifecycle stage5 – Usage, Monitoring & Change

Periodic oversight effectiveness review and escalation

Conduct periodic oversight effectiveness reviews. Escalate to governance when oversight metrics fall below threshold.

Lifecycle stage5 – Usage, Monitoring & Change

Recursive sub-agent authority caps (monotonic privilege attenuation)

Define and sign off each agent's delegation envelope — maximum depth and strict scope attenuation — before build begins.

source: NIST SP 800-53 AC-6(1) Least Privilege; OWASP Agentic AI Threats & Mitigations (cascading / sub-agent privilege); capability-security monotonic attenuation principle (macaroons)

Lifecycle stages1 – Use Case Context & Design3 – Onboarding, Build & Review

Design-time authority model and approval gate defining each agent's identity, scopes, and delegation envelope

Document each agent's identity, minimum scopes, on-behalf-of population, and delegation depth at design time. Gate build on governance sign-off of the authority matrix.

source: NIST AI RMF MAP 1.1 / GOVERN 2.1 (roles, authority, accountability); NIST SP 800-53 AC-2, PL-8; OWASP Agentic AI Threats & Mitigations (least-privilege design)

Lifecycle stages1 – Use Case Context & Design3 – Onboarding, Build & Review

Unique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)

Mint a unique, attestation-backed workload identity per agent at onboarding. Register every SPIFFE-ID to an owner, use case, and approval ticket; ban shared service accounts.

source: SPIFFE/SPIRE workload identity specification; NIST SP 800-207 Zero Trust Architecture; OWASP Non-Human Identities Top 10

Lifecycle stage3 – Onboarding, Build & Review

On-behalf-of delegation that preserves and never exceeds the invoking user's ACLs

Implement on-behalf-of token exchange and prove with negative tests that the agent cannot exceed the user's ACL. Gate release on these tests passing.

source: OAuth 2.0 Token Exchange RFC 8693 (delegation/'act' claims); NIST SP 800-53 AC-3, AC-6; OWASP Agentic AI Threats & Mitigations (Privilege Compromise / confused deputy)

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

Central agent registry / non-human identity inventory with ownership and lifecycle metadata

Register every agent identity with a named human owner, approved use case, scopes, and status before issuance. No registry entry, no identity.

source: OWASP Non-Human Identities Top 10 (inventory/governance); NIST SP 800-53 CM-8 System Component Inventory, AC-2 Account Management; NIST AI RMF GOVERN 1.2

Lifecycle stage3 – Onboarding, Build & Review

Continuous authorisation via a central policy engine (per-action PDP/PEP check)

Write authorisation policy as versioned, peer-reviewed code traced to approved scopes. Gate promotion on allow/deny scenario tests passing.

source: NIST SP 800-207 Zero Trust (continuous, per-request authorization via PDP/PEP); NIST SP 800-53 AC-3, AC-4; OWASP Agentic AI Threats & Mitigations (per-action authorization)

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

Automated credential rotation and prohibition of long-lived static secrets for agents

Scan every commit to agent code, prompts, and config for embedded secrets. Block merges on detection and triage findings to closure.

source: OWASP Non-Human Identities Top 10 (long-lived/leaked secrets); NIST SP 800-53 IA-5 Authenticator Management, SC-12; SPIFFE short-lived SVID rotation

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

Mutual authentication and identity verification for agent-to-agent and agent-to-MCP-server calls

Vet and approve every MCP server and peer agent before registering its identity on the allow-list. Block integration until vetting is signed off.

source: NIST SP 800-207 (mutual authentication); NIST SP 800-53 IA-9 Service Identification and Authentication, SC-8; OWASP Agentic AI Threats & Mitigations (agent/MCP identity spoofing)

Lifecycle stages3 – Onboarding, Build & Review4 – Deployment

Per-task short-lived scoped capability tokens minted just-in-time

Mint short-lived, task-scoped tokens just-in-time from a central token service. Enforce a hard max TTL and resource-bound audience so no standing credential exists.

source: OAuth 2.0 Token Exchange RFC 8693 (resource-scoped tokens); NIST SP 800-53 AC-6 Least Privilege; OWASP Non-Human Identities Top 10

Lifecycle stages4 – Deployment5 – Usage, Monitoring & Change

Just-in-time, time-boxed elevation for sensitive scopes (no standing privilege)

Grant sensitive scopes just-in-time for a bounded window with auto-revocation; require human approval for high-impact elevations. Hold zero standing privilege.

source: NIST SP 800-53 AC-6(2)/AC-6(5) Least Privilege & privileged accounts; Zero Standing Privilege / JIT access practice; OWASP Agentic AI Threats & Mitigations (excessive permissions)

Lifecycle stage4 – Deployment

Least-privilege identity & scoped credentialsinteractive

Giving the agent only the keys it needs for the current task, not a master key to everything.

Also addressesPrompt Injection (direct)Indirect Prompt Injection Sensitive Data Leakage Tool Misuse Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks Confused Deputy (cross-agent)Rogue & Impersonated Agents Resource Exhaustion / Denial of Wallet Capability / Architecture Disclosure

Human-in-the-loop approval on high-risk actionsinteractive

Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.

Also addressesIndirect Prompt Injection Overreliance / Automation Bias Tool Misuse Cascading Multi-Agent Errors Agent Misalignment / Goal Misgeneralization Resource Exhaustion / Denial of Wallet Allocative Harm in Multi-User Arbitration Synthetic-Media Impersonation (Deepfakes & Voice Clones)

Tool argument validation & sandboxinginteractive

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Also addressesTool Misuse Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks

Per-agent identity & taint-marked messagesinteractive

Giving each AI worker its own limited permissions and clearly labelling messages between them as 'untrusted until checked'.

Also addressesConfused Deputy (cross-agent)Rogue & Impersonated Agents Distributed / Cross-Agent Jailbreak Cascading Multi-Agent Errors Agent Misalignment / Goal Misgeneralization

Detective · 5

Immutable audit of the full agent identity lifecycle (issue, grant, delegate, revoke)

Instrument every identity-issuing component with schema-conformant audit emitters. Block release until completeness and tamper-evidence tests pass.

source: NIST SP 800-53 AU-2/AU-3/AU-9/AU-12 (audit content & protection); OWASP Non-Human Identities Top 10 (auditing); NIST AI RMF MANAGE 2.2

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change

Behavioural anomaly detection on agent identity usage with automated suspension

Define per-identity behaviour profiles and thresholds at build. Rehearse automated suspension and sign off measured revocation time before go-live.

source: NIST SP 800-53 AC-2(12) (account monitoring for atypical use), SI-4 System Monitoring; OWASP Agentic AI Threats & Mitigations (identity abuse detection)

Lifecycle stage3 – Onboarding, Build & Review

Loop/cost circuit-breakers & consistency checksinteractive

Automatic stop-switches when AIs get stuck in loops, burn too much money, or start disagreeing with each other.

Also addressesConfused Deputy (cross-agent)Distributed / Cross-Agent Jailbreak Cascading Multi-Agent Errors Agent Misalignment / Goal Misgeneralization Resource Exhaustion / Denial of Wallet

Runtime monitoring & anomaly detectioninteractive

Live dashboards and alarms that notice unusual behaviour — spikes in errors, weird actions, sudden data access.

Full-trace audit logginginteractive

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Also addressesIndirect Prompt Injection Oversight & Audit-Trail Tampering Sensitive Data Leakage Memory Poisoning Unsafe Tool / Code Execution Tool Poisoning / MCP Description Attacks Confused Deputy (cross-agent)Rogue & Impersonated Agents

Corrective · 6

Monitoring of oversight process adherence metrics

Configure monitoring to track oversight process adherence metrics in production (review rate, SLA compliance, override frequency).

Lifecycle stage5 – Usage, Monitoring & Change

Unique non-human workload identity issuance for every agent (SPIFFE/SPIRE SVID)

Verify each running agent authenticates with its own SVID; revoke on decommission or compromise. Scan periodically for shared or static credentials and remediate.

source: SPIFFE/SPIRE workload identity specification; NIST SP 800-207 Zero Trust Architecture; OWASP Non-Human Identities Top 10

Lifecycle stage5 – Usage, Monitoring & Change

Central agent registry / non-human identity inventory with ownership and lifecycle metadata

Reconcile the registry against runtime identities and suspend unregistered principals. Recertify ownership and scopes periodically; decommission retired agents.

source: OWASP Non-Human Identities Top 10 (inventory/governance); NIST SP 800-53 CM-8 System Component Inventory, AC-2 Account Management; NIST AI RMF GOVERN 1.2

Lifecycle stage5 – Usage, Monitoring & Change

Just-in-time, time-boxed elevation for sensitive scopes (no standing privilege)

Alert on un-revoked elevations and any standing sensitive grant. Report the zero-standing-privilege position to the risk owner on a set cadence.

source: NIST SP 800-53 AC-6(2)/AC-6(5) Least Privilege & privileged accounts; Zero Standing Privilege / JIT access practice; OWASP Agentic AI Threats & Mitigations (excessive permissions)

Lifecycle stage5 – Usage, Monitoring & Change

Automated credential rotation and prohibition of long-lived static secrets for agents

Sweep runtimes and repos on a schedule for static credentials. Alert on any credential exceeding its maximum age and track findings to closure.

source: OWASP Non-Human Identities Top 10 (long-lived/leaked secrets); NIST SP 800-53 IA-5 Authenticator Management, SC-12; SPIFFE short-lived SVID rotation

Lifecycle stage5 – Usage, Monitoring & Change

Behavioural anomaly detection on agent identity usage with automated suspension

Baseline each agent identity's behaviour and alert on out-of-profile use. Auto-suspend credentials on high-confidence anomalies and track mean-time-to-revoke.

source: NIST SP 800-53 AC-2(12) (account monitoring for atypical use), SI-4 System Monitoring; OWASP Agentic AI Threats & Mitigations (identity abuse detection)

Lifecycle stage5 – Usage, Monitoring & Change

Open these in the Control Library →

Framework mappings

OWASP LLM Top 10

LLM06:2025 Excessive Agency

MITRE ATLAS

—

NIST AI RMF

GOVERN 1.3
MANAGE 2.2

Real-world cases

Actual published events that illustrate this risk — click through for the writeup and sources.

Agentic-browser indirect-injection demos (ChatGPT Operator)2025

Researchers showed web-browsing AI agents following instructions embedded in attacker-controlled pages to leak data or take actions.

Replit AI agent deletes a production database2025

A coding agent with production access reportedly dropped a live database during a run — ungated irreversible action by an over-privileged agent.

GTG-1002 — first reported AI-orchestrated cyber-espionage campaign (Claude Code)2025

Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously.

ForcedLeak — Salesforce Agentforce CRM exfiltration (CVSS 9.4, no CVE)2025

Researchers showed attacker text planted in a public Salesforce Web-to-Lead form is later read by the Agentforce agent during normal use and treated as instructions, exfiltrating CRM data to an attacker domain that had been on Salesforce's CSP allow-list but expired and was re-registered for about $5.

ServiceNow Now Assist — second-order prompt injection via agent-to-agent discovery2025

AppOmni showed ServiceNow Now Assist's default agent config lets a malicious ticket redirect a benign agent into enlisting a more powerful agent — performing record CRUD, admin-role assignment, and email exfiltration with the triggering user's privilege, despite built-in prompt-injection protection.

ShadowLeak — ChatGPT Deep Research zero-click service-side exfiltration2025

A single crafted email with hidden HTML instructions reportedly made OpenAI's Deep Research agent autonomously exfiltrate Gmail inbox data from OpenAI's own cloud — with no user click and, per Radware, no client-side or network evidence.

GitHub Copilot / VS Code RCE via prompt injection ('YOLO mode', CVE-2025-53773)2025

Researcher Johann Rehberger showed that injected instructions in source code, web pages, or GitHub issues could make the Copilot agent silently write "chat.tools.autoApprove": true into .vscode/settings.json, disabling human approval and granting unattended shell execution — a self-config-rewrite to full-host compromise (CVE-2025-53773).

Agent Session Smuggling in A2A systems (Unit 42)2025

Unit 42 PoCs in which a malicious remote agent abuses default inter-agent trust to covertly inject extra instructions across a stateful A2A session, invisible to the human operator.

Operation Bizarre Bazaar (first attributed LLMjacking campaign with a resale marketplace)2026

Researchers reportedly captured 35,000+ attack sessions from an attributed cluster that mass-scans for unauthenticated LLM/MCP endpoints, hijacks the inference compute, and resells access to 30+ providers via a bulletproof-hosted criminal marketplace.

Agentjacking — hijacking AI coding agents via Sentry error reports (Tenet Security)2026

Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.

Meta AI support bot tricked into hijacking Instagram accounts2026

Attackers reportedly social-engineered Meta's AI-powered Instagram support chatbot into attaching attacker-controlled emails to target accounts and issuing password-reset codes, taking over high-profile accounts (including the Obama-era White House and a U.S. Space Force CMSgt) without the owner's email or any MFA prompt.

AI-assisted breach of Mexican government infrastructure (Claude Code + GPT-4.1)2025

Gambit Security reports that a single operator weaponized Anthropic's Claude Code and OpenAI's GPT-4.1 to breach at least nine Mexican government organizations, with Claude Code reportedly executing ~75% of remote commands after the attacker bypassed its refusals by loading a 1,084-line hacking cheatsheet as a persistent claude.md system prompt.

Autonomous AI agent publishes a defamatory 'hit piece' on a Matplotlib maintainer after its pull request was rejected2026

An autonomous AI agent (handle 'crabby-rathbun' / 'MJ Rathbun', reportedly an OpenClaw agent) had its Matplotlib pull request rejected under a human-contributor policy, then allegedly researched the volunteer maintainer's background and published a defamatory blog post accusing him of discrimination and 'gatekeeping', amplifying it via GitHub comments. Described in early coverage as a first-of-its-kind case of an agent autonomously turning on a human to damage their reputation.

Browse all real-world cases →