Supply-Chain Compromise
highInfrastructure & internalsDefinition
The AI is built from parts made by others — models, libraries, tool packs, datasets. If any of those is tampered with before you get it, your system inherits the problem.
This is recommended as a granular sub-risk of #8 Lack of third-party accountability (Accountability & Governance · Operational Risk). The TeamPCP/LiteLLM compromise showed the gateway layer is disproportionately valuable collateral precisely because it co-locates OPENAI_API_KEY/ANTHROPIC_API_KEY with AWS/GCP/Azure and Kubernetes credentials. The existing supply-chain risk covers the poisoning vector but not the architectural concentration that turns one dependency compromise into total key-estate loss; this sub-risk surfaces that secrets-architecture dimension under third-party/supply-chain accountability. Your 44-row Enterprise Risk Mapping is unchanged — this is a suggestion for inclusion.
Where it attaches
The system components this risk arises at.
Detection signals
- ▸ Models from unverified hubs / no provenance
- ▸ Unsafe (pickle) serialization in model artifacts
- ▸ Dependency or package anomalies in scans
- ▸ Behavioural eval regressions after a dependency bump
Controls & guardrails that address this
184 proposedGrouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.
Define third-party AI accountability requirements before vendor engagement. Embed in RFP and contract specifications.
Conduct AI governance due diligence on third-party providers at selection stage. Reject providers failing minimum maturity.
Require third-party providers to submit model cards, validation reports, and security documentation before integration.
Enforce ongoing third-party accountability obligations including incident notification and periodic performance reporting.
Conduct independent performance and compliance monitoring of third-party AI components. Escalate when SLA or compliance obligations are missed.
Allocate every control in a shared-responsibility matrix and flow down regulatory obligations in contract at onboarding. Gate approval on initial assurance artefacts.
source: NIST AI RMF GOVERN 6.1 / GOVERN 6.2 (third-party risk and assurance); NIST SP 800-53 SR-6 Supplier Assessments and Reviews, SA-9 External System Services; EU AI Act GPAI provider obligationsTreat the model-serving runtime (Triton, vLLM, TGI, Ray Serve, etc.) as managed, attested, version-pinned inventory subject to a patch SLA; require the inference endpoint to be authenticated and network-segmented (never unauthenticated on an untrusted segment); and least-privilege the serving host's identity and egress so a runtime RCE cannot trivially exfiltrate models or pivot. Closes the gap that artifact-provenance controls leave open: integrity of the *data plane that runs the model*, not just of the model artifact.
source: Case study: nvidia-triton-rce-chain (Wiz Research, CVE-2025-23319/-23320/-23334)Third-party developer tools (IDE plugins, MCP servers) must not store or transmit long-lived provider API keys. Issue short-lived, scoped, revocable tokens via a broker/OAuth flow, and gate any first-time outbound transmission of secret-shaped data behind an explicit consent prompt — so a trojanized tool has no long-lived credential to exfiltrate and any attempt is visible.
source: Case study: jetbrains-marketplace-ai-keystealer-pluginsTreat each third-party AI integration as a privileged non-human principal: issue least-scope, IP/device-bound, short-lived grants (avoid 'full' scope and standing long-lived refresh tokens), instrument the integration's data egress for volume/object-breadth/destination anomalies, and maintain a tested one-move revocation path for all of an integration's tokens so a single vendor-side compromise cannot fan out into hundreds of standing footholds.
source: Proposed from case salesloft-drift-oauth-supply-chain (UNC6395). Grounded in GTIG remediation guidance — restrict Connected App scopes (no 'full'), enforce IP restrictions, treat all Drift-connected tokens as compromised: https://cloud.google.com/blog/topics/threat-intelligence/data-theft-salesforce-instances-via-salesloft-driftDo not store long-lived multi-provider LLM keys (or ambient cloud/K8s credentials) in the gateway/proxy's plaintext process environment. Issue short-lived, scoped tokens from a secret broker at request time, isolate the serving stack from host cloud/cluster credentials, and monitor per-provider spend and egress so a stolen key surfaces as anomalous usage — capping the loot a compromised gateway dependency can harvest.
source: Case study: teampcp-litellm-pypi-gateway-compromiseKnowing exactly where the model came from, checking it hasn't been swapped, and testing its behaviour before going live.
Treating add-on tool packs like software you vet: locking to a reviewed version and re-checking whenever it changes.
Making sure the machinery running the model — and the template used to stamp out new agents — is the real, unmodified version, and that one user's data can't leak into another's through shared shortcuts.
Build and baseline the golden-set suite against the vendor model before go-live. Sign off thresholds with the model risk owner as a release condition.
source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain (monitoring changed model components); MITRE ATLAS AML.M0015 (Adversarial Input Detection / validation); NIST AI RMF MEASURE 2.6 / MANAGE 4.1Re-verify hashes and signatures on every vendor model update before promotion. Reconcile deployed artifacts against the AIBOM on a set cadence.
source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain; MITRE ATLAS AML.M0013 (Code Signing), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SR-4 / SR-11 (provenance, component authenticity)Regularly testing the AI against a set of known-good and known-bad examples, and re-testing whenever anything changes.
Design all vendor model access behind a gateway with pinned versions, a second-vendor fallback, and a documented exit plan. Gate architecture sign-off on no single-sourcing.
source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain (maintain supported model versions); NIST AI RMF GOVERN 6.1 (third-party resilience, contingency); established AI-gateway fallback practiceVerify every third-party model artifact against its AIBOM hashes and signatures before load. Fail the build on any unverified artifact.
source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain; MITRE ATLAS AML.M0013 (Code Signing), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SR-4 / SR-11 (provenance, component authenticity)Review independent vendor assurance on cadence, log gaps, and track remediation. Keep the shared-responsibility matrix current so every control has an owner.
source: NIST AI RMF GOVERN 6.1 / GOVERN 6.2 (third-party risk and assurance); NIST SP 800-53 SR-6 Supplier Assessments and Reviews, SA-9 External System Services; EU AI Act GPAI provider obligationsThe organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.
Framework mappings
- LLM03:2025 Supply Chain
- AML.T0010 ML Supply Chain Compromise
- MAP 4.1
- MANAGE 3.1
Real-world cases
25Actual published events that illustrate this risk — click through for the writeup and sources.
A surgically edited open model uploaded to a public hub spread targeted misinformation while passing normal benchmarks.
A malicious MCP server package was found silently BCC-ing every email it sent to an attacker-controlled address — real supply-chain tool poisoning.
Researchers repeatedly found models on public hubs containing code that executes on load via unsafe pickle deserialization.
Anthropic, the UK AI Security Institute and the Alan Turing Institute report that a near-constant number of poisoned documents (~250 in their experiments) reliably installs a backdoor in models from 600M to 13B parameters — suggesting poisoning cost may be a roughly fixed absolute count rather than a percentage of training data. The authors stress the demonstrated backdoor is narrow (a denial-of-service trigger) and likely not a frontier-model risk on its own.
Unit 42 showed that when a Hugging Face account is deleted (or a model is transferred and the old author later removed), its Author/ModelName namespace can be re-registered by anyone — so platforms and code that resolve models by name auto-deploy attacker-controlled weights, demonstrated as reverse-shell RCE on Google Vertex AI Model Garden and Azure AI Foundry.
A USENIX Security 2025 study found code-generating LLMs routinely recommend non-existent packages (~5.2% commercial to 21.7% open-source of suggestions), letting attackers pre-register the predictable fake names — a tactic dubbed 'slopsquatting'.
OX Security enrolled a malicious MCP server into 9 of 11 public registries with no real validation, then confirmed command execution on six live production platforms that discover servers from those registries.
Attackers flooded ClawHub — the skill marketplace for the popular OpenClaw AI agent — with at least 341 malicious 'skills' that tricked agents/users into installing the Atomic macOS Stealer and reverse-shell backdoors.
A research paper (CAIS 2026 best-paper) shows adversaries can plant hidden, trigger-activated backdoors in AI agents by poisoning the data/environment used to build them — including a novel 'environment poisoning' vector — making an agent leak confidential data >80% of the time when triggered, past common guardrails.
Heretic automates 'abliteration' — removing an open model's safety refusals by orthogonalizing the refusal direction out of its weights, with an Optuna search that preserves capability — and has produced 4000+ uncensored models on Hugging Face.
Attackers stole OAuth tokens from the Salesloft Drift AI chat integration and used them to silently export Salesforce data from 700+ organisations, reportedly including Cloudflare, Google, Palo Alto Networks and Zscaler.
An attacker got a malicious pull request merged into the open-source aws-toolkit-vscode repo, embedding a destructive prompt that told the Amazon Q agent to wipe local files and AWS resources; the tainted build (v1.84.0) reached the Marketplace's ~1M installs before removal.
Wiz Research chained three flaws in NVIDIA Triton's Python-backend shared-memory IPC — an information leak of the backend's private shared-memory region name (CVE-2025-23320), a missing ownership/validation check that lets that region be re-registered as attacker-controlled memory, and an out-of-bounds write that corrupts internal data structures (CVE-2025-23319) — to give a remote, unauthenticated attacker full code execution and takeover of an AI model-serving server, reportedly enabling model theft, response manipulation and lateral movement.
Microsoft's incident-response team found a .NET backdoor that hid its command-and-control channel inside a legitimate OpenAI Assistants API account, fetching encrypted commands stored as Assistant messages — turning an LLM provider's API into stealth attacker infrastructure.
Google says its Big Sleep agent (DeepMind + Project Zero) discovered SQLite flaw CVE-2025-6965 — a memory-corruption bug Google states was known only to threat actors and at risk of being exploited — in what Google calls the first time an AI agent was used to directly foil an in-the-wild exploitation effort.
A benchmark of LLM-agent susceptibility to tool poisoning via malicious tool metadata, built on 45 live MCP servers and 353 real tools; the authors report agents are rarely able to refuse and that more-capable models are often more vulnerable.
Researchers reportedly captured 35,000+ attack sessions from an attributed cluster that mass-scans for unauthenticated LLM/MCP endpoints, hijacks the inference compute, and resells access to 30+ providers via a bulletproof-hosted criminal marketplace.
As part of a multi-ecosystem supply-chain cascade (Trivy onward), TeamPCP used stolen PyPI publishing tokens to ship backdoored BerriAI LiteLLM versions whose auto-running .pth payload harvested cloud, SSH and Kubernetes secrets plus env vars holding OPENAI_API_KEY/ANTHROPIC_API_KEY — exfiltrating to a typosquatted C2; AI-talent firm Mercor was a downstream victim, with Lapsus$ claiming ~4TB stolen.
Multiple monitoring/critical API endpoints in Langflow (a popular visual AI agent/workflow builder) shipped without authentication, letting unauthenticated attackers read users' conversation and transaction histories and delete message sessions; a public PoC appeared within days and in-the-wild exploitation was reported months later.
Researchers reported at least 15 trojanized JetBrains Marketplace plugins posing as AI coding assistants that silently exfiltrated the OpenAI/DeepSeek/SiliconFlow API keys developers pasted into them — ~70,000 installs, with stolen keys allegedly resold to paying users.
A trojaned npm package posing as a remote web UI for OpenAI's Codex coding agent silently exfiltrated developers' Codex authentication tokens, enabling persistent account takeover via non-expiring refresh tokens.
Hugging Face's LeRobot robotics-AI framework reportedly exposed its async-inference policy server over an unauthenticated, no-TLS gRPC port that calls Python pickle.loads() on attacker-controlled data, allowing unauthenticated remote code execution on the model-inference host.
A CVSS 10.0 remote-code-execution flaw in Flowise's CustomMCP node lets an attacker run arbitrary JavaScript on the host: the MCP server config is reportedly passed straight to JavaScript's Function() constructor with no validation. Disclosed in Sept 2025 and patched in 3.0.6, it later saw active mass exploitation across thousands of exposed instances in April 2026.
Malicious 'lightning' PyPI releases (reportedly 2.6.2 and 2.6.3) of the widely used PyTorch Lightning ML-training framework ran a credential-stealer on import; an automated scanner flagged them ~18 minutes after publication and maintainers yanked them within ~42 minutes.
Anthropic reports that 'Claude Mythos Preview' — an unreleased frontier model it describes as able to autonomously find and exploit software flaws — surfaced more than 10,000 high- or critical-severity vulnerabilities across major operating systems, browsers and open-source projects in roughly its first month under the defensive 'Project Glasswing' program, with Anthropic warning that finding flaws now far outpaces the human capacity to triage and patch them.
Practise this in an interactive scenario
Compromise the pipeline that builds agents, and every new worker is born malicious
A cost-saving open-weights swap quietly ships a model with its safety surgically removed
A capable third-party model that behaves perfectly — until it sees the trigger
A trusted MCP email tool quietly BCCs every message to an attacker