NVIDIA Triton Inference Server unauthenticated RCE chain (CVE-2025-23319 / -23320 / -23334)
Disclosed vulnerability · 04 Aug 2025 · Model / Package Supply Chain
Wiz Research chained three flaws in the shared-memory IPC of NVIDIA Triton's Python backend to give a remote, unauthenticated attacker full code execution and takeover of an AI model-serving server. The flaws are an information leak of the backend's private shared-memory region name (CVE-2025-23320), a missing ownership/validation check that lets a client register that private region as if it were its own, and an out-of-bounds write that corrupts internal data structures (CVE-2025-23319). The chain reportedly enables model theft, response manipulation and lateral movement.
Root cause — why it happened
Triton is software a company runs to serve its AI models to users over the network. Researchers found that a deliberately oversized request makes Triton return an error message that leaks a secret internal name, one that should never be visible. Knowing that name, an attacker could trick Triton into handing them control of a private chunk of its own memory, because the server never checked whether that memory really belonged to the caller. From there, a further bug let them write outside the bounds of that memory and run their own code, taking over the whole model-serving machine. The model was never the target; the plumbing that runs it was.
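Two of the flaws in this chain are logic errors: an internal name leaks, and a registration call never asks who owns the region. A toy model can illustrate both. The sketch below is hypothetical. `ShmRegistry`, its methods and the region names are invented for illustration and are not Triton's code or API, and real shared memory is replaced by a plain dictionary. It contrasts a registration path that trusts any caller who knows a name with one that checks ownership and returns the same error for "unknown" and "not yours".

```python
# Toy model only: ShmRegistry and all region names are hypothetical, not Triton code.
from dataclasses import dataclass, field


class RegistrationError(Exception):
    """Raised when a client may not register a shared-memory region."""


@dataclass
class ShmRegistry:
    # region name -> owner ("internal" marks the backend's private regions)
    owners: dict = field(default_factory=dict)
    # (client, region name) -> handle returned to that client
    handles: dict = field(default_factory=dict)

    def create(self, name: str, owner: str) -> None:
        self.owners[name] = owner

    def _issue(self, client: str, name: str) -> str:
        handle = f"{client}:{name}"
        self.handles[(client, name)] = handle
        return handle

    def register_unchecked(self, client: str, name: str) -> str:
        # Flawed: knowing the name is enough, even for an internal region.
        if name not in self.owners:
            raise RegistrationError("unknown region")
        return self._issue(client, name)

    def register_checked(self, client: str, name: str) -> str:
        # Hardened: only the owner may register, and the error is identical
        # for unknown and foreign regions, so it confirms nothing to a prober.
        if self.owners.get(name) != client:
            raise RegistrationError("region not available")
        return self._issue(client, name)
```

In this toy registry, `register_unchecked("mallory", "backend_private_0")` succeeds against a region owned by `"internal"`, which models the missing check. `register_checked` refuses the same call.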
Risks this case illustrates
Risks are named using the standard OWASP, MITRE ATLAS and NIST lenses.
How it unfolded
An AI model-serving stack is exposed on the network
A company runs NVIDIA Triton to serve its AI models. Like any web service, it listens for requests over the network. The version it's running has bugs nobody knows about yet, and — as is common — its endpoint is reachable by more of the network than it strictly needs to be.
Controls & guardrails — what would have stopped it
The clean fix is the vendor's patch — so keeping the serving software up to date is the single biggest defence. Beyond that: don't leave the model-serving machine open to the whole network (only let trusted callers reach it), and limit what that machine can touch, so even a captured server can't easily steal models or wander deeper into the network.
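The advice to "only let trusted callers reach it" comes down to an allowlist decision. The sketch below assumes the serving endpoint sees the caller's IP address directly, with no proxy rewriting it. The CIDR ranges and `is_trusted_caller` are hypothetical placeholders, not a recommendation.

```python
# Sketch only: the trusted networks below are hypothetical placeholders.
import ipaddress

# Hypothetical trusted caller networks (for example, the application tier's subnet).
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.20.0.0/16"),
    ipaddress.ip_network("192.168.50.0/24"),
]


def is_trusted_caller(peer: str) -> bool:
    """Return True if the peer IP lies inside one of the trusted networks."""
    try:
        addr = ipaddress.ip_address(peer)
    except ValueError:
        # Unparseable peer addresses are rejected rather than guessed at.
        return False
    # Membership is False (not an error) when IPv4 and IPv6 are mixed.
    return any(addr in net for net in TRUSTED_NETWORKS)
```

In practice this rule belongs in a firewall, security group or network policy in front of Triton rather than in application code. It also complements authentication and does not replace it.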
- Serving-stack & provisioning attestation, cache isolation
Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so shared caching is often left on for performance. Signing proves a template wasn't tampered with in transit, not that a signed template is benign: an insider with signing rights still needs review and trigger-focused evals.
- Least-privilege identity & scoped credentials
Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.
- Egress allowlisting & DLP on tool arguments
Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.
- Runtime monitoring & anomaly detection (addresses: Sensitive Data Leakage)
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
- Full-trace audit logging
Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.
- Governance: risk assessment, red-teaming & incident response (addresses: Supply-Chain Compromise)
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.
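The egress-allowlisting control can also be applied to the serving host itself, so that a captured server cannot freely send models elsewhere. The decision rule can be sketched as follows. This assumes outbound traffic is forced through a point, such as a proxy, that can see the destination URL. `ALLOWED_EGRESS_HOSTS` and `egress_allowed` are hypothetical. The sketch matches hosts exactly rather than by substring or loose suffix. That way, lookalikes such as `models.internal.example.attacker.net` or `evilmodels.internal.example` are refused.

```python
# Sketch only: the allowlisted hostnames are hypothetical placeholders.
from urllib.parse import urlsplit

# Hypothetical destinations the serving host legitimately needs to reach.
ALLOWED_EGRESS_HOSTS = {"models.internal.example", "metrics.internal.example"}


def egress_allowed(url: str) -> bool:
    """Return True only for https URLs whose exact host is allowlisted."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname is lowercased and excludes any port or userinfo.
    host = parts.hostname
    return host is not None and host in ALLOWED_EGRESS_HOSTS
```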
Lessons
- The AI data plane is attack surface: the server that runs your models can be taken over without touching the model, prompt, or training data. It is an application-security target like any other network service.
- Verbose error messages that leak internal identifiers (CWE-209) are not cosmetic: here a leaked shared-memory region name was the foothold for the entire chain.
- Internal IPC is a trust boundary: a registration/handle API that doesn't verify ownership lets an attacker who knows an internal name map that resource as their own.
- Patch currency on inference servers is a first-class AI control; pair it with network isolation of serving endpoints and least-privilege on the serving host to bound model theft and lateral movement.
- Vendor severity figures (and the differing CVSS scores across sources) are claims to attribute and reconcile, not independently confirmed facts.
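The CWE-209 lesson has a standard remedy: keep internal detail in server-side logs and return only a generic message to the client. The sketch below assumes a JSON-style error response. `client_error`, the logger name and the response fields are hypothetical. An opaque random ID lets operators match a client's report to the log entry without exposing anything internal.

```python
# Sketch only: function, logger and field names are hypothetical.
import logging
import uuid

log = logging.getLogger("inference.errors")


def client_error(internal_detail: str) -> dict:
    """Log the full detail server-side; return a response that reveals none of it."""
    error_id = uuid.uuid4().hex  # 32 hex characters, random, carries no internal state
    # Operators can join the client-visible ID to this log line when debugging.
    log.error("error_id=%s detail=%s", error_id, internal_detail)
    return {"error": "request could not be processed", "error_id": error_id}
```

For example, `client_error("shm region backend_private_0 exceeded limit")` logs the (invented) region name. The client receives only the generic message and the ID.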
Proposals & gaps this case surfaced
Non-destructive suggestions for the library — proposed, not adopted.
Treat the model-serving runtime (Triton, vLLM, TGI, Ray Serve, etc.) as managed, attested, version-pinned inventory subject to a patch SLA; require the inference endpoint to be authenticated and network-segmented (never unauthenticated on an untrusted segment); and least-privilege the serving host's identity and egress so a runtime RCE cannot trivially exfiltrate models or pivot. Closes the gap that artifact-provenance controls leave open: integrity of the *data plane that runs the model*, not just of the model artifact.
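The proposal's inventory rules are version pinning, a patch SLA, and an authenticated, segmented endpoint. They can be expressed as a policy check over inventory records. This is a sketch under stated assumptions. The record fields, the `YY.MM` tag format with an optional `-suffix`, and the 30-day SLA are hypothetical. The only case-specific inputs are the 25.07 fixed release cited from NVIDIA's bulletin and the 04 Aug 2025 disclosure date, which is treated here as the start of the SLA clock.

```python
"""Sketch of a serving-runtime inventory policy check (hypothetical fields and SLA)."""
from __future__ import annotations

from dataclasses import dataclass
from datetime import date

# runtime name -> (minimum fixed release as (YY, MM), disclosure date starting the SLA clock)
MIN_FIXED = {"triton": ((25, 7), date(2025, 8, 4))}
PATCH_SLA_DAYS = 30  # hypothetical SLA


@dataclass
class ServingRuntime:
    name: str                # e.g. "triton"
    tag: str                 # release tag, e.g. "25.07" (never a floating tag like "latest")
    authenticated: bool      # does the inference endpoint require authentication?
    untrusted_segment: bool  # reachable from a network segment you don't control?


def parse_tag(tag: str) -> tuple[int, int] | None:
    """Parse a 'YY.MM' tag, ignoring any '-suffix'; return None for floating tags."""
    parts = tag.split("-", 1)[0].split(".")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return None
    return int(parts[0]), int(parts[1])


def findings(rt: ServingRuntime, today: date) -> list[str]:
    """Return policy violations for one runtime (an empty list means compliant)."""
    out = []
    version = parse_tag(rt.tag)
    if version is None:
        out.append("version not pinned")
    elif rt.name in MIN_FIXED:
        fixed, disclosed = MIN_FIXED[rt.name]
        if version < fixed:
            overdue = (today - disclosed).days > PATCH_SLA_DAYS
            out.append("patch SLA breached" if overdue else "below fixed release")
    if not rt.authenticated:
        out.append("endpoint unauthenticated")
    if rt.untrusted_segment:
        out.append("exposed on untrusted segment")
    return out
```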
This case shows a blind spot: most AI-risk lists focus on the model, the prompt, or poisoned downloads — but the server software that actually runs models (the 'inference server') can be hacked directly, like any web service. We should treat the security of AI serving infrastructure as its own risk class, not fold it into general supply chain.
These surface as proposals across the Control Library and Risk Taxonomy; adopt them by hand when ready.
Sources
- Breaking NVIDIA Triton: CVE-2025-23319 - A Vulnerability Chain Leading to AI Server Takeover (Wiz Research). Original research; the three-stage shared-memory IPC chain.
- Security Bulletin: NVIDIA Triton Inference Server - August 2025 (NVIDIA). Vendor advisory; affected versions and fix in 25.07; "not aware of exploitation in the wild".
- CVE-2025-23320 Detail (NIST National Vulnerability Database, NVD). Info-leak (CWE-209) primitive; CVSS ~7.5.
- NVIDIA Triton Bugs Let Unauthenticated Attackers Execute Code and Hijack AI Servers (The Hacker News).