Case study

NVIDIA Triton Inference Server unauthenticated RCE chain (CVE-2025-23319 / -23320 / -23334)

Disclosed vulnerability · 04 Aug 2025 · Model / Package Supply Chain

Wiz Research chained three flaws in the shared-memory IPC of NVIDIA Triton's Python backend: an information leak of the backend's private shared-memory region name (CVE-2025-23320); a missing ownership/validation check that lets that region be re-registered as attacker-controlled memory; and an out-of-bounds write that corrupts internal data structures (CVE-2025-23319). Together they give a remote, unauthenticated attacker full code execution and takeover of an AI model-serving server, reportedly enabling model theft, response manipulation and lateral movement.

Root cause — why it happened

Triton is software a company runs to serve its AI models to users over the network. Researchers found that if you send it a deliberately oversized request, an error message leaks a secret internal name that should never be visible. Knowing that name, an attacker could trick Triton into handing them control of a private chunk of its own memory, because the server never checked whether the memory really belonged to them. From there a further bug let them scribble outside the lines and run their own code — taking over the whole model-serving machine. The model was never the target; the plumbing that runs it was.
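The missing ownership check can be illustrated with a toy registry. This is a minimal sketch and not Triton's actual code: the class `SharedMemoryRegistry`, its methods, and the caller labels are all hypothetical names invented for this example. It assumes a registry that tracks who owns each named region and refuses to let a different caller claim a name that is already owned.

```python
import secrets


class SharedMemoryRegistry:
    """Toy model of a region registry (hypothetical, not Triton's implementation)."""

    def __init__(self):
        self._owners = {}  # region name -> owner label

    def create_internal(self, purpose: str) -> str:
        # Internal regions get a random suffix and are owned by the server.
        name = f"{purpose}_{secrets.token_hex(8)}"
        self._owners[name] = "server"
        return name

    def register(self, caller: str, name: str) -> None:
        owner = self._owners.get(name)
        if owner is not None and owner != caller:
            # The kind of check the case study describes as missing.
            # The message is generic and does not echo the region name.
            raise PermissionError("region not available")
        self._owners[name] = caller

    def owner_of(self, name: str):
        return self._owners.get(name)


registry = SharedMemoryRegistry()
internal = registry.create_internal("ipc")

try:
    # Even with the internal name in hand, a client cannot take it over.
    registry.register("client-1", internal)
except PermissionError as err:
    print("rejected:", err)

registry.register("client-1", "client_region")  # a fresh name is fine
```

With a check like this, a leaked name alone would not be enough to take over the region, because knowing a name is not treated as proof of ownership.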

Risks this case illustrates

Supply-Chain Compromise · Unsafe Tool / Code Execution · Sensitive Data Leakage

Named in the standard (OWASP/ATLAS/NIST) lens.

How it unfolded

Setup · Step 1 / 6

An AI model-serving stack is exposed on the network

A company runs NVIDIA Triton to serve its AI models. Like any web service, it listens for requests over the network. The version it's running has bugs nobody knows about yet, and — as is common — its endpoint is reachable by more of the network than it strictly needs to be.

Controls & guardrails — what would have stopped it

The clean fix is the vendor's patch — so keeping the serving software up to date is the single biggest defence. Beyond that: don't leave the model-serving machine open to the whole network (only let trusted callers reach it), and limit what that machine can touch, so even a captured server can't easily steal models or wander deeper into the network.

Preventive

Serving-stack & provisioning attestation, cache isolation
Addresses: Supply-Chain Compromise · Sensitive Data Leakage
Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so shared caching is often left on for performance. Signing proves a template wasn't tampered with in transit, not that a signed template is benign: anything signed by an insider with signing rights still needs review and trigger-focused evals.
Least-privilege identity & scoped credentials
Addresses: Unsafe Tool / Code Execution · Sensitive Data Leakage
Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.
Egress allowlisting & DLP on tool arguments
Addresses: Unsafe Tool / Code Execution · Sensitive Data Leakage
Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.
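As a rough illustration of egress allowlisting for a serving host, the sketch below checks an outbound URL against an exact-match host list. The host names and the function `egress_allowed` are hypothetical, and in practice this policy would more likely be enforced at a firewall or egress proxy than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical destinations a serving host legitimately needs to reach.
ALLOWED_EGRESS_HOSTS = frozenset({
    "models.internal.example",
    "metrics.internal.example",
})


def egress_allowed(url: str, allowed_hosts=ALLOWED_EGRESS_HOSTS) -> bool:
    """Return True only for HTTPS URLs whose host exactly matches the allowlist."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    # Exact match, so look-alike suffixes such as
    # "models.internal.example.attacker.test" are rejected.
    return host in allowed_hosts


print(egress_allowed("https://models.internal.example/v1/weights"))
print(egress_allowed("https://models.internal.example.attacker.test/upload"))
```

Exact matching is deliberately strict. It illustrates the trade-off noted above: the tighter the list, the more likely a legitimate but unanticipated destination is blocked.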

Detective

Runtime monitoring & anomaly detection
Addresses: Sensitive Data Leakage
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
Full-trace audit logging
Addresses: Unsafe Tool / Code Execution · Sensitive Data Leakage
Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.
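For this case, a detective rule could look for the two signals the chain depends on: an unusually large request and a shared-memory API call from an unexpected caller. The sketch below is a simple rule-based filter over request-log events. The event schema, the size threshold, the caller names, and the path substring are all assumptions made for illustration, not values taken from Triton or from the sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestEvent:
    caller: str      # authenticated identity or source label (hypothetical)
    path: str        # request path as logged
    body_bytes: int  # size of the request body


# Assumed values; tune to the deployment's real traffic.
MAX_EXPECTED_BODY_BYTES = 8 * 1024 * 1024
TRUSTED_SHM_CALLERS = frozenset({"batch-worker"})


def suspicious_reasons(event: RequestEvent) -> list:
    """Return the reasons an event looks suspicious (empty list if none)."""
    reasons = []
    if event.body_bytes > MAX_EXPECTED_BODY_BYTES:
        reasons.append("oversized request body")
    if "sharedmemory" in event.path.lower() and event.caller not in TRUSTED_SHM_CALLERS:
        reasons.append("shared-memory API call from unexpected caller")
    return reasons


event = RequestEvent(caller="unknown", path="/v2/systemsharedmemory/status", body_bytes=512)
print(suspicious_reasons(event))
```

A rule this narrow only catches the known pattern, which is the limitation described above: it would not notice a quieter variant that stays under the threshold.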

Corrective

Governance: risk assessment, red-teaming & incident response
Addresses: Supply-Chain Compromise
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

▸ The AI data plane is attack surface: the server that runs your models can be taken over with no touch on the model, prompt, or training data — it's an application-security target like any other network service.
▸ Verbose error messages that leak internal identifiers (CWE-209) are not cosmetic — here a leaked shared-memory region name was the foothold for the entire chain.
▸ Internal IPC is a trust boundary: a registration/handle API that doesn't verify ownership lets a known internal name be re-mapped by an attacker.
▸ Patch currency on inference servers is a first-class AI control; pair it with network isolation of serving endpoints and least-privilege on the serving host to bound model theft and lateral movement.
▸ Vendor severity figures (and the differing CVSS scores across sources) are claims to attribute and reconcile, not independently confirmed facts.
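The CWE-209 lesson above can be sketched as an error handler that keeps internal detail in server-side logs and returns only a generic message plus a correlation ID. The function name `client_safe_error`, the logger name, and the example region name are hypothetical; this is one common pattern, not the vendor's fix.

```python
import logging
import uuid

logger = logging.getLogger("serving.errors")


def client_safe_error(exc: Exception) -> dict:
    """Log full detail internally; give the client a generic message and an ID."""
    incident_id = uuid.uuid4().hex
    # Full exception detail, including any internal identifiers, stays server-side.
    logger.error("incident %s: %r", incident_id, exc)
    return {
        "error": "request could not be processed",
        "incident_id": incident_id,
    }


# Hypothetical internal failure whose message contains an internal name.
failure = RuntimeError("failed to grow region 'internal_region_abc123'")
response = client_safe_error(failure)
print(response["error"])
```

The correlation ID lets operators find the detailed log entry without exposing internal names to the caller.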

Proposals & gaps this case surfaced

Non-destructive suggestions for the library — proposed, not adopted.

✚ Proposed guardrail · Patch-currency, network isolation & attested version inventory for AI inference-serving runtimes · Infrastructure & Runtime Hardening

Treat the model-serving runtime (Triton, vLLM, TGI, Ray Serve, etc.) as managed, attested, version-pinned inventory subject to a patch SLA; require the inference endpoint to be authenticated and network-segmented (never unauthenticated on an untrusted segment); and least-privilege the serving host's identity and egress so a runtime RCE cannot trivially exfiltrate models or pivot. Closes the gap that artifact-provenance controls leave open: integrity of the *data plane that runs the model*, not just of the model artifact.
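A minimal sketch of what such an inventory check might look like follows. It assumes each serving endpoint is recorded with a release string in `YY.MM` form, and it uses 25.07 as the minimum patched release, as stated in the vendor advisory cited under Sources. The record fields, segment names, and function names are hypothetical.

```python
from dataclasses import dataclass

# Fix release per the vendor advisory cited in Sources.
MIN_PATCHED_RELEASE = (25, 7)


@dataclass(frozen=True)
class ServingEndpoint:
    name: str
    release: str          # e.g. "25.06"; assumed YY.MM format
    requires_auth: bool
    network_segment: str  # hypothetical segment label


def parse_release(release: str) -> tuple:
    """Turn 'YY.MM' into a comparable (year, month) tuple."""
    year, month = release.split(".")[:2]
    return int(year), int(month)


def findings(ep: ServingEndpoint, trusted_segments=frozenset({"internal-serving"})) -> list:
    """List the policy violations for one inventory record."""
    issues = []
    if parse_release(ep.release) < MIN_PATCHED_RELEASE:
        issues.append("release older than patched version")
    if not ep.requires_auth:
        issues.append("endpoint accepts unauthenticated requests")
    if ep.network_segment not in trusted_segments:
        issues.append("endpoint exposed on untrusted segment")
    return issues


for ep in (
    ServingEndpoint("triton-a", "25.06", False, "dmz"),
    ServingEndpoint("triton-b", "25.07", True, "internal-serving"),
):
    print(ep.name, findings(ep))
```

A check like this is only as good as the inventory behind it, which is why the proposal pairs version pinning with attestation rather than relying on self-reported versions.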

Coverage gap · Supply-Chain Compromise

This case shows a blind spot: most AI-risk lists focus on the model, the prompt, or poisoned downloads — but the server software that actually runs models (the 'inference server') can be hacked directly, like any web service. We should treat the security of AI serving infrastructure as its own risk class, not fold it into general supply chain.

These surface as proposals across the Control Library and Risk Taxonomy; adopt them by hand when ready.

Sources

Breaking NVIDIA Triton: CVE-2025-23319 - A Vulnerability Chain Leading to AI Server Takeover (Wiz Research). Original research; the three-stage shared-memory IPC chain.
Security Bulletin: NVIDIA Triton Inference Server - August 2025 (NVIDIA). Vendor advisory; affected versions and fix in 25.07; 'not aware of exploitation in the wild'.
NVD - CVE-2025-23320 Detail (NIST National Vulnerability Database). Info-leak (CWE-209) primitive; CVSS ~7.5.
NVIDIA Triton Bugs Let Unauthenticated Attackers Execute Code and Hijack AI Servers (The Hacker News).
