Unsafe Tool / Code Execution

high · Agency & tools

Definition

When the AI can run code or commands, a bad instruction can become a real attack on the computer running it — reading files, reaching the network, or worse.

★ Suggested sub-risk (not yet in your taxonomy) · recommended under #42 Tool-layer misuse and unintended actions

This is recommended as a granular sub-risk of #42 Tool-layer misuse and unintended actions (Cyber & Data Security · Technology Risk). #42 is framed around permission-exceeding tool calls; this names the unsandboxed-execution mechanism and its classic injection/SSRF vectors. Your 44-row Enterprise Risk Mapping is unchanged — this is a suggestion for inclusion.

Where it attaches

The system components where this risk arises.

🔧 Tool Runtime · 🗄️ Business Database · 🔌 External APIs · 🧰 MCP / Plugin Server · 🧯 Output Guardrail

Detection signals

▸ Generated SQL/shell with injection patterns
▸ Fetch tool requests to internal/metadata endpoints (SSRF)
▸ Sandbox resource/egress alarms
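
The SSRF signal above can be approximated with a check on the fetch tool's target URL. This is a minimal sketch, not a complete detector: the function name and the `INTERNAL_HOSTNAMES` set are assumptions made for illustration, and it inspects only the literal host, so a production check would also need to resolve DNS names and re-check after redirects.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical list of hostnames treated as internal; extend for your environment.
INTERNAL_HOSTNAMES = {"localhost", "metadata.google.internal"}


def is_internal_target(url: str) -> bool:
    """Return True when the URL's host is a known internal name or a
    loopback, private, link-local or otherwise non-public IP literal."""
    host = urlsplit(url).hostname
    if host is None:
        return True  # no recognisable host: flag it rather than let it through
    if host.lower() in INTERNAL_HOSTNAMES:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; this sketch does not resolve it
    return not addr.is_global
```

A request to `http://169.254.169.254/latest/meta-data/` (the link-local address commonly used by cloud metadata services) is flagged, while a public address such as `http://8.8.8.8/` is not.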

Controls & guardrails that address this

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses.

Preventive · 3

Tool argument validation & sandboxing · interactive

Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.

Also addresses: Excessive Agency · Tool Misuse · Tool Poisoning / MCP Description Attacks
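
As an illustration of the argument-validation half of this control, the sketch below checks a model-proposed file path and command before anything runs. All names here (`ToolArgumentError`, `resolve_in_workspace`, `build_argv`, `ALLOWED_COMMANDS`) are hypothetical, and the code does not provide a sandbox by itself: the resulting call would still need to run in a locked-down environment such as a container or restricted user.

```python
from pathlib import Path

ALLOWED_COMMANDS = {"ls", "cat", "grep"}  # hypothetical allowlist


class ToolArgumentError(ValueError):
    """Raised when a model-proposed tool argument fails validation."""


def resolve_in_workspace(workspace: Path, requested: str) -> Path:
    """Resolve a model-supplied path and refuse anything outside the workspace."""
    root = workspace.resolve()
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise ToolArgumentError(f"path escapes workspace: {requested!r}")
    return candidate


def build_argv(command: str, args: list[str]) -> list[str]:
    """Build an argv list intended for subprocess.run(argv, shell=False)."""
    if command not in ALLOWED_COMMANDS:
        raise ToolArgumentError(f"command not allowed: {command!r}")
    for arg in args:
        if arg.startswith("-"):
            raise ToolArgumentError(f"options are not accepted: {arg!r}")
    # "--" marks the end of options, so later values are treated as operands.
    return [command, "--", *args]
```

Passing the arguments as a list with `shell=False` means the values are never interpreted by a shell, so a string such as `notes.txt; rm -rf /` stays a single (odd) filename rather than becoming a second command.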

Egress allowlisting & DLP on tool arguments · interactive

Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.

Also addresses: Indirect Prompt Injection · Sensitive Data Leakage · Tool Poisoning / MCP Description Attacks
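
A rough sketch of how an egress check on an outbound tool call might look. The allowlist contents, the `check_egress` name and the two secret patterns are assumptions chosen for illustration; a real DLP layer would cover many more secret formats and would normally be enforced at the network boundary as well as in application code.

```python
import re
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"api.example.com"}  # hypothetical destination allowlist

# Two illustrative patterns only; real deployments need a broader set.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def check_egress(url: str, payload: str) -> list[str]:
    """Return a list of policy violations; an empty list means the call may proceed."""
    violations = []
    host = (urlsplit(url).hostname or "").lower()
    if host not in ALLOWED_HOSTS:
        violations.append(f"destination not allowlisted: {host or url!r}")
    for name, pattern in SECRET_PATTERNS.items():
        if pattern.search(payload):
            violations.append(f"payload matches secret pattern: {name}")
    return violations
```

Returning the full list of violations, rather than stopping at the first, gives the audit trail a complete picture of why a call was blocked.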

Least-privilege identity & scoped credentials · interactive

Giving the agent only the keys it needs for the current task, not a master key to everything.

Also addresses: Prompt Injection (direct) · Indirect Prompt Injection · Sensitive Data Leakage · Excessive Agency · Tool Misuse · Tool Poisoning / MCP Description Attacks · Confused Deputy (cross-agent) · Rogue & Impersonated Agents · Resource Exhaustion / Denial of Wallet · Capability / Architecture Disclosure
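
One way to picture "only the keys it needs for the current task" is a credential object that carries an explicit scope set and a short expiry. The sketch below is illustrative only: `ScopedCredential`, `issue_for_task` and the scope strings are invented for this example, and in practice the scoping would be enforced by the identity provider or the downstream service, not by the agent's own process.

```python
import time
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ScopedCredential:
    """A short-lived credential bound to one task and an explicit scope set."""
    task_id: str
    scopes: frozenset
    expires_at: float  # Unix timestamp after which the credential is rejected

    def allows(self, scope: str, now: Optional[float] = None) -> bool:
        """True only if the credential is unexpired and the scope was granted."""
        current = time.time() if now is None else now
        return current < self.expires_at and scope in self.scopes


def issue_for_task(task_id: str, scopes: Iterable[str],
                   ttl_seconds: float = 300.0,
                   now: Optional[float] = None) -> ScopedCredential:
    """Issue a credential limited to the given scopes and lifetime."""
    issued = time.time() if now is None else now
    return ScopedCredential(task_id, frozenset(scopes), issued + ttl_seconds)
```

A task that only needs to read a report would be issued something like `issue_for_task("report-42", ["db:read"])`, so a hijacked tool call asking for `db:write` is refused even if the injected instruction is otherwise followed.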

Detective · 1

Full-trace audit logging · interactive

Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.

Also addresses: Indirect Prompt Injection · Oversight & Audit-Trail Tampering · Sensitive Data Leakage · Memory Poisoning · Excessive Agency · Tool Poisoning / MCP Description Attacks · Confused Deputy (cross-agent) · Rogue & Impersonated Agents
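
A minimal sketch of a trace-level audit record, assuming an in-memory list stands in for whatever log store is actually used. The hash chaining shown here is an added assumption, not something this page prescribes: it is one common way to make later edits to the trail evident. The `AuditTrail` class and its field names are hypothetical.

```python
import hashlib
import json
import time

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    """Stable SHA-256 over a record body (keys sorted for determinism)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditTrail:
    """Append-only list of events, each linked to the previous one by hash."""

    def __init__(self) -> None:
        self.records = []

    def record(self, trace_id: str, kind: str, detail: dict) -> dict:
        """Append one event (question, document fetch, tool action, ...)."""
        prev = self.records[-1]["hash"] if self.records else GENESIS
        body = {"ts": time.time(), "trace_id": trace_id, "kind": kind,
                "detail": detail, "prev": prev}
        entry = {**body, "hash": _digest(body)}
        self.records.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; False if any record was altered or removed."""
        prev = GENESIS
        for entry in self.records:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev"] != prev or _digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Sharing one `trace_id` across the user question, each retrieved document and each tool call is what lets an investigator reconstruct a single run end to end.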

Framework mappings

OWASP LLM Top 10

LLM05:2025 Improper Output Handling
LLM06:2025 Excessive Agency

MITRE ATLAS

AML.T0053 LLM Plugin Compromise

NIST AI RMF

MANAGE 2.2

Real-world cases

Actual published events that illustrate this risk.

Replit AI agent deletes a production database · 2025

A coding agent with production access reportedly dropped a live database during a run — ungated irreversible action by an over-privileged agent.

GTG-1002: first reported AI-orchestrated cyber-espionage campaign (Claude Code) · 2025

Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously.

IDEsaster: AI coding IDEs/agents turned into exfiltration & RCE surfaces · 2025

Researcher Ari Marzouk disclosed 30+ vulnerabilities (24 CVEs) across 10-plus AI coding agents (Copilot, Cursor, Windsurf, Claude Code, Junie and others) where a prompt injected via repo files, READMEs, file names or MCP tool responses makes the assistant weaponize legitimate IDE features for code execution and secret exfiltration.

GitHub Copilot / VS Code RCE via prompt injection ('YOLO mode', CVE-2025-53773) · 2025

Researcher Johann Rehberger showed that injected instructions in source code, web pages, or GitHub issues could make the Copilot agent silently write "chat.tools.autoApprove": true into .vscode/settings.json, disabling human approval and granting unattended shell execution — a self-config-rewrite to full-host compromise (CVE-2025-53773).

Model Namespace Reuse (Hugging Face name-trust hijack) · 2025

Unit 42 showed that when a Hugging Face account is deleted (or a model is transferred and the old author later removed), its Author/ModelName namespace can be re-registered by anyone — so platforms and code that resolve models by name auto-deploy attacker-controlled weights, demonstrated as reverse-shell RCE on Google Vertex AI Model Garden and Azure AI Foundry.

Amazon Q Developer 'wiper' prompt shipped via poisoned pull request (CVE-2025-8217) · 2025

An attacker got a malicious pull request merged into the open-source aws-toolkit-vscode repo, embedding a destructive prompt that told the Amazon Q agent to wipe local files and AWS resources; the tainted build (v1.84.0) reached the Marketplace's ~1M installs before removal.

NVIDIA Triton Inference Server unauthenticated RCE chain (CVE-2025-23319 / -23320 / -23334) · 2025

Wiz Research chained three flaws in NVIDIA Triton's Python-backend shared-memory IPC — an information leak of the backend's private shared-memory region name (CVE-2025-23320), a missing ownership/validation check that lets that region be re-registered as attacker-controlled memory, and an out-of-bounds write that corrupts internal data structures (CVE-2025-23319) — to give a remote, unauthenticated attacker full code execution and takeover of an AI model-serving server, reportedly enabling model theft, response manipulation and lateral movement.

TeamPCP poisons the LiteLLM AI gateway on PyPI to harvest LLM API keys · 2026

As part of a multi-ecosystem supply-chain cascade (Trivy onward), TeamPCP used stolen PyPI publishing tokens to ship backdoored BerriAI LiteLLM versions whose auto-running .pth payload harvested cloud, SSH and Kubernetes secrets plus env vars holding OPENAI_API_KEY/ANTHROPIC_API_KEY — exfiltrating to a typosquatted C2; AI-talent firm Mercor was a downstream victim, with Lapsus$ claiming ~4TB stolen.

Agentjacking: hijacking AI coding agents via Sentry error reports (Tenet Security) · 2026

Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.

LeRobot async-inference gRPC pickle RCE (CVE-2026-25874) · 2026

Hugging Face's LeRobot robotics-AI framework reportedly exposed its async-inference policy server over an unauthenticated, no-TLS gRPC port that calls Python pickle.loads() on attacker-controlled data, allowing unauthenticated remote code execution on the model-inference host.

Flowise AI agent builder CustomMCP RCE (CVE-2025-59528) · 2025

A CVSS 10.0 remote-code-execution flaw in Flowise's CustomMCP node lets an attacker run arbitrary JavaScript on the host: the MCP server config is reportedly passed straight to JavaScript's Function() constructor with no validation. Disclosed in Sept 2025 and patched in 3.0.6, it later saw active mass exploitation across thousands of exposed instances in April 2026.

Project Glasswing: Claude 'Mythos' autonomously finds 10,000+ software vulnerabilities · 2026

Anthropic reports that 'Claude Mythos Preview' — an unreleased frontier model it describes as able to autonomously find and exploit software flaws — surfaced more than 10,000 high- or critical-severity vulnerabilities across major operating systems, browsers and open-source projects in roughly its first month under the defensive 'Project Glasswing' program, with Anthropic warning that finding flaws now far outpaces the human capacity to triage and patch them.

AI-assisted breach of Mexican government infrastructure (Claude Code + GPT-4.1) · 2025

Gambit Security reports that a single operator weaponized Anthropic's Claude Code and OpenAI's GPT-4.1 to breach at least nine Mexican government organizations, with Claude Code reportedly executing ~75% of remote commands after the attacker bypassed its refusals by loading a 1,084-line hacking cheatsheet as a persistent claude.md system prompt.

Practise this in an interactive scenario

🗄️ When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🪤 The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command
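
For the text-to-SQL scenario, the sketch below shows one way to avoid running model output "straight at the database": accept only a single SELECT statement and execute it on a connection switched to read-only. It assumes SQLite (using its `PRAGMA query_only` setting) purely so the example is self-contained; the `run_readonly` name and the row cap are hypothetical, and the statement check is deliberately crude (it rejects any embedded semicolon, including one inside a string literal).

```python
import sqlite3


def run_readonly(conn: sqlite3.Connection, sql: str, max_rows: int = 100) -> list:
    """Run one model-generated SELECT on a read-only connection."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        raise ValueError("multiple statements are not allowed")
    if not stripped.lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    # Defence in depth: even if the check above is bypassed, writes fail.
    conn.execute("PRAGMA query_only = ON")
    return conn.execute(stripped).fetchmany(max_rows)
```

The string check and the read-only connection are independent layers: the first rejects obviously wrong output early, and the second still holds if the first is fooled. On a production database the equivalent of the second layer would usually be a dedicated read-only role.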