Unsafe Tool / Code Execution
highAgency & toolsDefinition
When the AI can run code or commands, a bad instruction can become a real attack on the computer running it — reading files, reaching the network, or worse.
This is recommended as a granular sub-risk of #42 Tool-layer misuse and unintended actions (Cyber & Data Security · Technology Risk). #42 is framed around permission-exceeding tool calls; this names the unsandboxed-execution mechanism and its classic injection/SSRF vectors. Your 44-row Enterprise Risk Mapping is unchanged — this is a suggestion for inclusion.
Where it attaches
The system components this risk arises at.
Detection signals
- ▸ Generated SQL/shell with injection patterns
- ▸ Fetch tool requests to internal/metadata endpoints (SSRF)
- ▸ Sandbox resource/egress alarms
Controls & guardrails that address this
4Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.
Double-checking the details of every action the AI wants to take, and running risky actions in a locked-down environment.
Controlling where the AI can send data, so secrets can't be quietly shipped to a stranger's address or website.
Giving the agent only the keys it needs for the current task, not a master key to everything.
Recording everything — questions, documents fetched, actions taken — so you can investigate when something goes wrong.
Framework mappings
- LLM05:2025 Improper Output Handling
- LLM06:2025 Excessive Agency
- AML.T0053 LLM Plugin Compromise
- MANAGE 2.2
Real-world cases
13Actual published events that illustrate this risk — click through for the writeup and sources.
A coding agent with production access reportedly dropped a live database during a run — ungated irreversible action by an over-privileged agent.
Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously.
Researcher Ari Marzouk disclosed 30+ vulnerabilities (24 CVEs) across 10-plus AI coding agents (Copilot, Cursor, Windsurf, Claude Code, Junie and others) where a prompt injected via repo files, READMEs, file names or MCP tool responses makes the assistant weaponize legitimate IDE features for code execution and secret exfiltration.
Researcher Johann Rehberger showed that injected instructions in source code, web pages, or GitHub issues could make the Copilot agent silently write "chat.tools.autoApprove": true into .vscode/settings.json, disabling human approval and granting unattended shell execution — a self-config-rewrite to full-host compromise (CVE-2025-53773).
Unit 42 showed that when a Hugging Face account is deleted (or a model is transferred and the old author later removed), its Author/ModelName namespace can be re-registered by anyone — so platforms and code that resolve models by name auto-deploy attacker-controlled weights, demonstrated as reverse-shell RCE on Google Vertex AI Model Garden and Azure AI Foundry.
An attacker got a malicious pull request merged into the open-source aws-toolkit-vscode repo, embedding a destructive prompt that told the Amazon Q agent to wipe local files and AWS resources; the tainted build (v1.84.0) reached the Marketplace's ~1M installs before removal.
Wiz Research chained three flaws in NVIDIA Triton's Python-backend shared-memory IPC — an information leak of the backend's private shared-memory region name (CVE-2025-23320), a missing ownership/validation check that lets that region be re-registered as attacker-controlled memory, and an out-of-bounds write that corrupts internal data structures (CVE-2025-23319) — to give a remote, unauthenticated attacker full code execution and takeover of an AI model-serving server, reportedly enabling model theft, response manipulation and lateral movement.
As part of a multi-ecosystem supply-chain cascade (Trivy onward), TeamPCP used stolen PyPI publishing tokens to ship backdoored BerriAI LiteLLM versions whose auto-running .pth payload harvested cloud, SSH and Kubernetes secrets plus env vars holding OPENAI_API_KEY/ANTHROPIC_API_KEY — exfiltrating to a typosquatted C2; AI-talent firm Mercor was a downstream victim, with Lapsus$ claiming ~4TB stolen.
Tenet Security showed that a single fake Sentry error report, sent using only a public DSN, can hijack AI coding agents (Claude Code, Cursor, Codex) into running attacker-controlled code on a developer's machine — an indirect-injection attack delivered through a trusted MCP integration.
Hugging Face's LeRobot robotics-AI framework reportedly exposed its async-inference policy server over an unauthenticated, no-TLS gRPC port that calls Python pickle.loads() on attacker-controlled data, allowing unauthenticated remote code execution on the model-inference host.
A CVSS 10.0 remote-code-execution flaw in Flowise's CustomMCP node lets an attacker run arbitrary JavaScript on the host: the MCP server config is reportedly passed straight to JavaScript's Function() constructor with no validation. Disclosed in Sept 2025 and patched in 3.0.6, it later saw active mass exploitation across thousands of exposed instances in April 2026.
Anthropic reports that 'Claude Mythos Preview' — an unreleased frontier model it describes as able to autonomously find and exploit software flaws — surfaced more than 10,000 high- or critical-severity vulnerabilities across major operating systems, browsers and open-source projects in roughly its first month under the defensive 'Project Glasswing' program, with Anthropic warning that finding flaws now far outpaces the human capacity to triage and patch them.
Gambit Security reports that a single operator weaponized Anthropic's Claude Code and OpenAI's GPT-4.1 to breach at least nine Mexican government organizations, with Claude Code reportedly executing ~75% of remote commands after the attacker bypassed its refusals by loading a 1,084-line hacking cheatsheet as a persistent claude.md system prompt.