Definition
The organisation has limited control or oversight over how third-party providers develop, modify and make decisions about the Gen AI models and services it consumes.
Suggested sub-risks (not yet in your taxonomy)
Granular vectors recommended under this risk.
A shared LLM gateway/proxy or aggregation layer concentrates many providers' API keys, often alongside cloud and cluster credentials, in one process/host environment. A single compromise of that layer (a poisoned dependency, an exposed endpoint, or a leaked image) therefore exposes the organisation's entire AI key estate and adjacent secrets at once.
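One way to make this concentration visible is to audit the gateway's process environment for co-located secrets. The following is a minimal sketch, not a complete secret scanner: the `*_API_KEY` naming pattern, the list of infrastructure variable names, and the function name `audit_env` are all assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Mapping, Union

# Assumption: provider keys follow the common "<PROVIDER>_API_KEY" naming
# convention (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY).
PROVIDER_KEY_NAME = re.compile(r"^[A-Z0-9_]+_API_KEY$")

# Assumption: an illustrative, non-exhaustive set of cloud/cluster credential names.
INFRA_SECRET_NAMES = {
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "KUBECONFIG",
}


def audit_env(env: Mapping[str, str]) -> Dict[str, Union[List[str], bool]]:
    """Report which secret-bearing variable names share one environment.

    Only names are returned, never values, so the report itself is safe to log.
    """
    provider_keys = sorted(
        name for name, value in env.items() if value and PROVIDER_KEY_NAME.match(name)
    )
    infra_secrets = sorted(
        name for name, value in env.items() if value and name in INFRA_SECRET_NAMES
    )
    return {
        "provider_keys": provider_keys,
        "infra_secrets": infra_secrets,
        # More than one provider key in a single process is the concentration
        # described above; adjacent infrastructure secrets widen the blast radius.
        "concentrated": len(provider_keys) >= 2,
        "adjacent_infra": bool(infra_secrets),
    }
```

Running `audit_env(os.environ)` inside the gateway container (after `import os`) gives a quick inventory of what a compromised dependency in that process could read.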
Controls & guardrails that address this
134 proposed. Grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Define third-party AI accountability requirements before vendor engagement. Embed in RFP and contract specifications.
Conduct AI governance due diligence on third-party providers at selection stage. Reject providers failing minimum maturity.
Require third-party providers to submit model cards, validation reports, and security documentation before integration.
Enforce ongoing third-party accountability obligations including incident notification and periodic performance reporting.
Conduct independent performance and compliance monitoring of third-party AI components. Escalate when SLA or compliance obligations are missed.
Allocate every control in a shared-responsibility matrix and flow down regulatory obligations in contract at onboarding. Gate approval on initial assurance artefacts.
source: NIST AI RMF GOVERN 6.1 / GOVERN 6.2 (third-party risk and assurance); NIST SP 800-53 SR-6 Supplier Assessments and Reviews, SA-9 External System Services; EU AI Act GPAI provider obligations
Treat the model-serving runtime (Triton, vLLM, TGI, Ray Serve, etc.) as managed, attested, version-pinned inventory subject to a patch SLA; require the inference endpoint to be authenticated and network-segmented (never unauthenticated on an untrusted segment); and least-privilege the serving host's identity and egress so a runtime RCE cannot trivially exfiltrate models or pivot. Closes the gap that artifact-provenance controls leave open: integrity of the *data plane that runs the model*, not just of the model artifact.
source: Case study: nvidia-triton-rce-chain (Wiz Research, CVE-2025-23319/-23320/-23334)
Third-party developer tools (IDE plugins, MCP servers) must not store or transmit long-lived provider API keys. Issue short-lived, scoped, revocable tokens via a broker/OAuth flow, and gate any first-time outbound transmission of secret-shaped data behind an explicit consent prompt, so that a trojanized tool has no long-lived credential to exfiltrate and any attempt is visible.
source: Case study: jetbrains-marketplace-ai-keystealer-plugins
Treat each third-party AI integration as a privileged non-human principal: issue least-scope, IP/device-bound, short-lived grants (avoid 'full' scope and standing long-lived refresh tokens), instrument the integration's data egress for volume/object-breadth/destination anomalies, and maintain a tested one-move revocation path for all of an integration's tokens so a single vendor-side compromise cannot fan out into hundreds of standing footholds.
source: Proposed from case salesloft-drift-oauth-supply-chain (UNC6395). Grounded in GTIG remediation guidance (restrict Connected App scopes (no 'full'), enforce IP restrictions, treat all Drift-connected tokens as compromised): https://cloud.google.com/blog/topics/threat-intelligence/data-theft-salesforce-instances-via-salesloft-drift
Do not store long-lived multi-provider LLM keys (or ambient cloud/K8s credentials) in the gateway/proxy's plaintext process environment. Issue short-lived, scoped tokens from a secret broker at request time, isolate the serving stack from host cloud/cluster credentials, and monitor per-provider spend and egress so a stolen key surfaces as anomalous usage, capping the loot a compromised gateway dependency can harvest.
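The spend-monitoring part of the gateway-key control above can be sketched as a per-provider baseline check. This is a minimal sketch, assuming daily spend figures are already exported from each provider's billing data; the function name `spend_anomalies`, the thresholds, and the provider labels are illustrative assumptions and do not correspond to any provider's API.

```python
from statistics import mean, pstdev
from typing import Dict, Mapping, Sequence


def spend_anomalies(
    history: Mapping[str, Sequence[float]],
    today: Mapping[str, float],
    z_threshold: float = 3.0,  # assumption: flag spend more than 3 deviations above baseline
    min_days: int = 7,         # assumption: require a week of history before trusting a baseline
) -> Dict[str, str]:
    """Return provider -> reason for every provider whose spend looks anomalous."""
    flagged: Dict[str, str] = {}
    for provider, spend in today.items():
        past = history.get(provider, [])
        if len(past) < min_days:
            # Spend on a provider with no baseline is itself worth a look:
            # it may be a key the organisation did not know was in use.
            flagged[provider] = "no baseline yet"
            continue
        baseline = mean(past)
        # Floor the spread so a perfectly flat history does not divide by zero
        # or flag trivial fluctuations.
        spread = max(pstdev(past), 0.05 * baseline, 0.01)
        z = (spend - baseline) / spread
        if z > z_threshold:
            flagged[provider] = (
                f"spend {spend:.2f} vs baseline {baseline:.2f} (z={z:.1f})"
            )
    return flagged
```

A z-score over a short window is deliberately crude; the point is that each provider key has its own baseline, so a stolen key shows up as a spike on one provider rather than being averaged away across the estate.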
source: Case study: teampcp-litellm-pypi-gateway-compromise
Build and baseline the golden-set suite against the vendor model before go-live. Sign off thresholds with the model risk owner as a release condition.
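A golden-set release gate of the kind described above might look like the following. This is a sketch under stated assumptions: the substring-match scoring, the `GoldenCase` structure, and the default thresholds are hypothetical placeholders, and the real thresholds would be whatever the model risk owner signs off.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenCase:
    prompt: str
    expected_substring: str  # assumption: a case passes if the answer contains this text


def pass_rate(model: Callable[[str], str], cases: Sequence[GoldenCase]) -> float:
    """Fraction of golden cases the model answers acceptably."""
    if not cases:
        raise ValueError("golden set is empty")
    passed = sum(
        1 for case in cases
        if case.expected_substring.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def release_gate(
    candidate_rate: float,
    baseline_rate: float,
    min_rate: float = 0.95,        # hypothetical signed-off absolute floor
    max_regression: float = 0.02,  # hypothetical allowed drop versus the baseline
) -> bool:
    """True only if the candidate meets the floor and has not regressed too far."""
    return (
        candidate_rate >= min_rate
        and (baseline_rate - candidate_rate) <= max_regression
    )
```

Here `model` is any callable that wraps the vendor endpoint. Storing the baseline rate at go-live gives later vendor updates something concrete to be compared against.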
source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain (monitoring changed model components); MITRE ATLAS AML.M0015 (Adversarial Input Detection / validation); NIST AI RMF MEASURE 2.6 / MANAGE 4.1
Re-verify hashes and signatures on every vendor model update before promotion. Reconcile deployed artifacts against the AIBOM on a set cadence.
source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain; MITRE ATLAS AML.M0013 (Code Signing), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SR-4 / SR-11 (provenance, component authenticity)
Design all vendor model access behind a gateway with pinned versions, a second-vendor fallback, and a documented exit plan. Gate architecture sign-off on no single-sourcing.
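The pinned-version and second-vendor-fallback design above can be illustrated with a small routing function. This is a sketch only: `Route`, `complete`, the vendor labels, and the set of "floating" aliases are hypothetical names introduced here, and a real gateway would add timeouts, retries, and logging.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Assumption: aliases like these resolve to whatever the vendor currently serves,
# so they defeat version pinning.
FLOATING_ALIASES = {"latest", "default", "stable"}


@dataclass(frozen=True)
class Route:
    vendor: str
    model_version: str                # exact pinned identifier
    call: Callable[[str, str], str]   # (model_version, prompt) -> completion


class AllRoutesFailed(RuntimeError):
    """Raised when every configured vendor route has failed."""


def complete(prompt: str, routes: Sequence[Route]) -> Tuple[str, str]:
    """Try routes in order; return (vendor, completion) from the first that succeeds."""
    if len({route.vendor for route in routes}) < 2:
        raise ValueError("single-sourced: configure at least two distinct vendors")
    for route in routes:
        if not route.model_version or route.model_version in FLOATING_ALIASES:
            raise ValueError(f"{route.vendor}: model version is not pinned")
    errors: List[str] = []
    for route in routes:
        try:
            return route.vendor, route.call(route.model_version, prompt)
        except Exception as exc:  # fall through to the next vendor on any failure
            errors.append(f"{route.vendor}: {exc}")
    raise AllRoutesFailed("; ".join(errors))
```

Rejecting a single-vendor configuration at call time mirrors the "no single-sourcing" sign-off gate: the fallback path exists by construction rather than by intention.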
source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain (maintain supported model versions); NIST AI RMF GOVERN 6.1 (third-party resilience, contingency); established AI-gateway fallback practice
Verify every third-party model artifact against its AIBOM hashes and signatures before load. Fail the build on any unverified artifact.
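The hash half of this verification control can be sketched with the standard library. The sketch assumes the AIBOM has already been reduced to a flat mapping of relative path to expected SHA-256 digest; real AIBOM formats are richer, and signature verification is out of scope here. The function names are illustrative.

```python
import hashlib
from pathlib import Path
from typing import List, Mapping


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifacts(aibom: Mapping[str, str], root: Path) -> List[str]:
    """Return a list of failures; an empty list means every artifact verified.

    `aibom` is assumed to map POSIX-style relative paths to SHA-256 hex digests.
    """
    failures: List[str] = []
    for rel_path, expected in aibom.items():
        artifact = root / rel_path
        if not artifact.is_file():
            failures.append(f"{rel_path}: missing")
        elif sha256_file(artifact) != expected.lower():
            failures.append(f"{rel_path}: hash mismatch")
    # Files present on disk but absent from the AIBOM are also unverified.
    for path in sorted(root.rglob("*")):
        rel_path = path.relative_to(root).as_posix()
        if path.is_file() and rel_path not in aibom:
            failures.append(f"{rel_path}: not in AIBOM")
    return failures
```

In a build step, a non-empty result would fail the job (for example by exiting with a non-zero status), which is what "fail the build on any unverified artifact" amounts to in practice.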
source: OWASP Top 10 for LLM Apps LLM03:2025 Supply Chain; MITRE ATLAS AML.M0013 (Code Signing), AML.M0014 (Verify ML Artifacts); NIST SP 800-53 SR-4 / SR-11 (provenance, component authenticity)
Review independent vendor assurance on cadence, log gaps, and track remediation. Keep the shared-responsibility matrix current so every control has an owner.
source: NIST AI RMF GOVERN 6.1 / GOVERN 6.2 (third-party risk and assurance); NIST SP 800-53 SR-6 Supplier Assessments and Reviews, SA-9 External System Services; EU AI Act GPAI provider obligations
Real-world cases
25 cases. Actual published events that illustrate this risk.
A surgically edited open model uploaded to a public hub spread targeted misinformation while passing normal benchmarks.
A malicious MCP server package was found silently BCC-ing every email it sent to an attacker-controlled address: a real-world case of supply-chain tool poisoning.
Researchers repeatedly found models on public hubs containing code that executes on load via unsafe pickle deserialization.
Anthropic, the UK AI Security Institute and the Alan Turing Institute report that a near-constant number of poisoned documents (~250 in their experiments) reliably installs a backdoor in models from 600M to 13B parameters, suggesting poisoning cost may be a roughly fixed absolute count rather than a percentage of training data. The authors stress the demonstrated backdoor is narrow (a denial-of-service trigger) and likely not a frontier-model risk on its own.
Unit 42 showed that when a Hugging Face account is deleted (or a model is transferred and the old author later removed), its Author/ModelName namespace can be re-registered by anyone, so platforms and code that resolve models by name auto-deploy attacker-controlled weights. This was demonstrated as reverse-shell RCE on Google Vertex AI Model Garden and Azure AI Foundry.
A USENIX Security 2025 study found code-generating LLMs routinely recommend non-existent packages (roughly 5.2% of suggestions from commercial models and 21.7% from open-source models), letting attackers pre-register the predictable fake names, a tactic dubbed 'slopsquatting'.
OX Security enrolled a malicious MCP server into 9 of 11 public registries with no real validation, then confirmed command execution on six live production platforms that discover servers from those registries.
Attackers flooded ClawHub (the skill marketplace for the popular OpenClaw AI agent) with at least 341 malicious 'skills' that tricked agents/users into installing the Atomic macOS Stealer and reverse-shell backdoors.
A research paper (CAIS 2026 best-paper) shows adversaries can plant hidden, trigger-activated backdoors in AI agents by poisoning the data/environment used to build them (including a novel 'environment poisoning' vector), making an agent leak confidential data more than 80% of the time when triggered, past common guardrails.
Heretic automates 'abliteration': removing an open model's safety refusals by orthogonalizing the refusal direction out of its weights, with an Optuna search that preserves capability. It has produced 4000+ uncensored models on Hugging Face.
Attackers stole OAuth tokens from the Salesloft Drift AI chat integration and used them to silently export Salesforce data from 700+ organisations, reportedly including Cloudflare, Google, Palo Alto Networks and Zscaler.
An attacker got a malicious pull request merged into the open-source aws-toolkit-vscode repo, embedding a destructive prompt that told the Amazon Q agent to wipe local files and AWS resources; the tainted build (v1.84.0) shipped to the Marketplace, where the extension has ~1M installs, before it was removed.
Wiz Research chained three flaws in NVIDIA Triton's Python-backend shared-memory IPC: an information leak of the backend's private shared-memory region name (CVE-2025-23320), a missing ownership/validation check that lets that region be re-registered as attacker-controlled memory, and an out-of-bounds write that corrupts internal data structures (CVE-2025-23319). The chain gives a remote, unauthenticated attacker full code execution and takeover of an AI model-serving server, reportedly enabling model theft, response manipulation and lateral movement.
Microsoft's incident-response team found a .NET backdoor that hid its command-and-control channel inside a legitimate OpenAI Assistants API account, fetching encrypted commands stored as Assistant messages and turning an LLM provider's API into stealth attacker infrastructure.
Google says its Big Sleep agent (DeepMind + Project Zero) discovered SQLite flaw CVE-2025-6965, a memory-corruption bug that Google states was known only to threat actors and at risk of being exploited, in what Google calls the first time an AI agent was used to directly foil an in-the-wild exploitation effort.
A benchmark of LLM-agent susceptibility to tool poisoning via malicious tool metadata, built on 45 live MCP servers and 353 real tools; the authors report agents are rarely able to refuse and that more-capable models are often more vulnerable.
Researchers reportedly captured 35,000+ attack sessions from an attributed cluster that mass-scans for unauthenticated LLM/MCP endpoints, hijacks the inference compute, and resells access to 30+ providers via a bulletproof-hosted criminal marketplace.
As part of a multi-ecosystem supply-chain cascade (Trivy onward), TeamPCP used stolen PyPI publishing tokens to ship backdoored BerriAI LiteLLM versions whose auto-running .pth payload harvested cloud, SSH and Kubernetes secrets plus env vars holding OPENAI_API_KEY/ANTHROPIC_API_KEY, and exfiltrated them to a typosquatted C2. AI-talent firm Mercor was a downstream victim, with Lapsus$ claiming ~4TB stolen.
Multiple monitoring/critical API endpoints in Langflow (a popular visual AI agent/workflow builder) shipped without authentication, letting unauthenticated attackers read users' conversation and transaction histories and delete message sessions; a public PoC appeared within days and in-the-wild exploitation was reported months later.
Researchers reported at least 15 trojanized JetBrains Marketplace plugins posing as AI coding assistants that silently exfiltrated the OpenAI/DeepSeek/SiliconFlow API keys developers pasted into them. The plugins had ~70,000 installs, and stolen keys were allegedly resold to paying users.
A trojaned npm package posing as a remote web UI for OpenAI's Codex coding agent silently exfiltrated developers' Codex authentication tokens, enabling persistent account takeover via non-expiring refresh tokens.
Hugging Face's LeRobot robotics-AI framework reportedly exposed its async-inference policy server over an unauthenticated, no-TLS gRPC port that calls Python pickle.loads() on attacker-controlled data, allowing unauthenticated remote code execution on the model-inference host.
A CVSS 10.0 remote-code-execution flaw in Flowise's CustomMCP node lets an attacker run arbitrary JavaScript on the host: the MCP server config is reportedly passed straight to JavaScript's Function() constructor with no validation. Disclosed in Sept 2025 and patched in 3.0.6, it later saw active mass exploitation across thousands of exposed instances in April 2026.
Malicious 'lightning' PyPI releases (reportedly 2.6.2 and 2.6.3) of the widely used PyTorch Lightning ML-training framework ran a credential-stealer on import; an automated scanner flagged them ~18 minutes after publication and maintainers yanked them within ~42 minutes.
Anthropic reports that 'Claude Mythos Preview' (an unreleased frontier model it describes as able to autonomously find and exploit software flaws) surfaced more than 10,000 high- or critical-severity vulnerabilities across major operating systems, browsers and open-source projects in roughly its first month under the defensive 'Project Glasswing' program, with Anthropic warning that finding flaws now far outpaces the human capacity to triage and patch them.