Case study

Model Namespace Reuse (Hugging Face name-trust hijack)

Research demonstration · 03 Sep 2025 · Model / Package Supply Chain

Unit 42 showed that when a Hugging Face account is deleted (or a model is transferred and the old author is later removed), its Author/ModelName namespace can be re-registered by anyone. Platforms and code that resolve models by name then auto-deploy attacker-controlled weights, which Unit 42 demonstrated as reverse-shell RCE on Google Vertex AI Model Garden and Azure AI Foundry.

Root cause — why it happened

AI apps and platforms often download a model just by its name — like 'TeamX/cool-model' — and trust whatever sits at that name. But on a model hub a name is only borrowed: if the person or team behind it deletes their account (or hands the model over and later leaves), the name can be freed up and grabbed by someone else. An attacker who re-registers a freed, still-trusted name can put their own booby-trapped model there. Now anything that pulls that name keeps working as if nothing changed — except it's quietly downloading the attacker's model, and just loading some model files can run hidden code. Unit 42 showed this could end in a reverse shell running inside Google's and Microsoft's model-hosting services.
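The mechanism can be sketched with a toy in-memory hub. Nothing here is a real hub API: `ToyHub`, `publish`, `delete_account`, and `resolve` are hypothetical names invented for illustration, and the byte strings stand in for model weights. The sketch only shows why a name-only lookup silently follows a re-registered namespace, while a lookup that also checks a recorded content digest does not.

```python
import hashlib


class ToyHub:
    """Hypothetical in-memory model hub; not a real hub API."""

    def __init__(self):
        self._repos = {}  # "author/model" -> (account_id, weights_bytes)

    def publish(self, account_id, name, weights):
        if name in self._repos:
            raise ValueError(f"{name} is already taken")
        self._repos[name] = (account_id, weights)

    def delete_account(self, account_id):
        # Deleting the account frees every name it held.
        self._repos = {n: r for n, r in self._repos.items() if r[0] != account_id}

    def resolve(self, name, expected_sha256=None):
        _account_id, weights = self._repos[name]
        if expected_sha256 is not None:
            actual = hashlib.sha256(weights).hexdigest()
            if actual != expected_sha256:
                raise ValueError(f"digest mismatch for {name}")
        return weights


hub = ToyHub()
hub.publish("original-team", "TeamX/cool-model", b"original weights")
pinned = hashlib.sha256(b"original weights").hexdigest()  # recorded by a consumer

hub.delete_account("original-team")                                 # name is freed
hub.publish("attacker", "TeamX/cool-model", b"malicious weights")   # and re-claimed

by_name = hub.resolve("TeamX/cool-model")  # succeeds and returns the attacker's bytes
try:
    hub.resolve("TeamX/cool-model", expected_sha256=pinned)
    pinned_result = "accepted"
except ValueError:
    pinned_result = "rejected"
```

The name-only call keeps working after the swap, which is exactly the problem; the digest-checked call fails loudly instead.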

Risks this case illustrates

Supply-Chain Compromise · Unsafe Tool / Code Execution

Named in the standard (OWASP/ATLAS/NIST) lens.

How it unfolded

Setup · Step 1 / 6

A trusted model is published under a name people rely on

A team publishes a useful model on a public hub under a name like 'TeamX/cool-model'. Other developers, tutorials, and even big cloud platforms start pulling it by that name. Everyone trusts the name — that's the whole point of a public hub.

How consumers reference the model (illustrative)

# Pulled by NAME alone — no revision, no digest, no provenance check
from transformers import AutoModel

model = AutoModel.from_pretrained("TeamX/cool-model")
# resolves to whatever currently lives at that name on the hub

Controls & guardrails — what would have stopped it

The fix that actually breaks the chain: stop trusting names. Pin each model to a specific, unchangeable version (its content fingerprint), not just 'org/model', and verify it came from who you think. Re-registering the name then does nothing, because your code is asking for an exact artifact the attacker can't reproduce. Pulling from your own vetted copy instead of the live hub, and loading models in a locked-down sandbox so hidden code can't run, close the gap further. Daily 'has this author been deleted?' scans (like the one Google added) help, but they don't protect code that still downloads by name alone.
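One way to enforce "stop trusting names" is to lint model references before they reach a loader. The sketch below assumes model references live in some config structure; `ModelRef`, `is_pinned`, and `unpinned` are hypothetical names, and the all-zero hash is a placeholder rather than a real commit. It treats only a full 40-character commit hash as pinned, because a branch or tag name can be moved to point at different content.

```python
import re
from dataclasses import dataclass
from typing import Optional

# A full git commit id is 40 hexadecimal characters.
_FULL_COMMIT = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical config entry describing which model to load."""
    repo_id: str
    revision: Optional[str] = None


def is_pinned(ref: ModelRef) -> bool:
    """True only when the revision is a full commit hash, not a branch or tag."""
    return ref.revision is not None and _FULL_COMMIT.fullmatch(ref.revision) is not None


def unpinned(refs):
    """Return the repo ids that would still resolve by name or by a movable ref."""
    return [r.repo_id for r in refs if not is_pinned(r)]


refs = [
    ModelRef("TeamX/cool-model"),                         # bare name
    ModelRef("TeamX/cool-model-v2", revision="main"),     # branch: can move
    ModelRef("TeamX/cool-model-v3", revision="0" * 40),   # placeholder commit hash
]
flagged = unpinned(refs)
```

A reference that passes the check could then be loaded with the `revision` argument that `from_pretrained` accepts, for example `AutoModel.from_pretrained(ref.repo_id, revision=ref.revision)`. If the namespace has been re-registered, a request for a commit the new repository does not contain should fail rather than quietly fetch different weights.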

Preventive

Weight provenance, hashing & pre-deploy evals
Addresses: Supply-Chain Compromise
Hashes prove the file is unchanged, not that it's safe: a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.
Serving-stack & provisioning attestation, cache isolation
Addresses: Supply-Chain Compromise
Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so shared caching is often left on for performance. Signing proves a template wasn't tampered with in transit, not that a signed template is benign; an insider with signing rights still needs review and trigger-focused evals.
MCP/plugin pinning, manifest hashing & re-review
Addresses: Supply-Chain Compromise
Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.
Per-agent identity & taint-marked messages
Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.
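The hashing control listed above can be sketched as a manifest check over a mirrored model directory. This is a minimal illustration under assumptions: the manifest layout (relative file name mapped to SHA-256 hex digest), the file names, and the helper names `sha256_file` and `verify_manifest` are all invented here. As the control's own caveat says, a match only shows the files are the ones that were reviewed, not that they are safe.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(model_dir: Path, manifest: dict) -> list:
    """Return a list of problems; an empty list means every file matched."""
    problems = []
    for rel_name, expected in manifest.items():
        path = model_dir / rel_name
        if not path.is_file():
            problems.append(f"missing: {rel_name}")
        elif sha256_file(path) != expected:
            problems.append(f"digest mismatch: {rel_name}")
    # Files that were never reviewed are reported as well.
    for path in sorted(model_dir.rglob("*")):
        rel = path.relative_to(model_dir).as_posix()
        if path.is_file() and rel not in manifest:
            problems.append(f"unexpected: {rel}")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    (model_dir / "model.safetensors").write_bytes(b"reviewed weights")
    manifest = {"model.safetensors": sha256_file(model_dir / "model.safetensors")}
    clean = verify_manifest(model_dir, manifest)

    # Simulate the artifact changing underneath the same name.
    (model_dir / "model.safetensors").write_bytes(b"swapped weights")
    (model_dir / "extra.py").write_text("print('hi')")
    tampered = verify_manifest(model_dir, manifest)
```

Running the check at deploy time, against a manifest stored separately from the model files, is what makes a swapped artifact fail before it is loaded.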

Detective

Behavioural evals & regression gating
Addresses: Supply-Chain Compromise
Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.
Runtime monitoring & anomaly detection
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
Full-trace audit logging
Addresses: Unsafe Tool / Code Execution
Logging is forensic, not preventive: it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.
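For the logging control, one plausible reading of "capturing the materialised context" is recording which exact artifact each load resolved to, so a later review can see when a name began serving different bytes. The sketch below is an assumption-laden illustration: the record fields, the helper names `load_record` and `digest_changes`, and the timestamps and digests in the sample log are all made up.

```python
import json


def load_record(repo_id, revision, sha256, timestamp):
    """Build one audit entry describing exactly which artifact was loaded."""
    return json.dumps(
        {"ts": timestamp, "repo_id": repo_id, "revision": revision, "sha256": sha256},
        sort_keys=True,
    )


def digest_changes(log_lines):
    """Yield (repo_id, old_sha256, new_sha256) when a name starts serving new bytes."""
    last_seen = {}
    for line in log_lines:
        entry = json.loads(line)
        previous = last_seen.get(entry["repo_id"])
        if previous is not None and previous != entry["sha256"]:
            yield entry["repo_id"], previous, entry["sha256"]
        last_seen[entry["repo_id"]] = entry["sha256"]


# Placeholder log: three loads by bare name, the third returning different content.
log = [
    load_record("TeamX/cool-model", None, "aaa111", "2025-01-01T00:00:00Z"),
    load_record("TeamX/cool-model", None, "aaa111", "2025-01-02T00:00:00Z"),
    load_record("TeamX/cool-model", None, "bbb222", "2025-01-03T00:00:00Z"),
]
changes = list(digest_changes(log))
```

A digest change is a prompt for review, not proof of compromise, since a legitimate author update looks the same in this log. It is still after-the-fact evidence, consistent with the caveat that logging is forensic.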

Corrective

Governance: risk assessment, red-teaming & incident response
Addresses: Supply-Chain Compromise
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

▸ Trust the artifact, not the name: a model-hub namespace (org/model) is a re-assignable identifier, so resolving by name alone makes you trust whoever currently owns the name.
▸ Names can be freed and re-claimed: a deleted account, a deleted org, or a transfer-then-author-removal can return a namespace to the pool — and an attacker can re-register it under the same trusted path.
▸ Pin to an immutable commit/revision (a digest) and verify provenance; mirror models into a controlled store so you never depend on live name resolution.
▸ Loading a model can run code: pull-by-name swaps don't just poison data, they can lead to remote code execution on load. Prefer safe formats and sandbox the load.
▸ Platform-side scanning (Google's daily deleted-author check) lowers likelihood for managed deployments but doesn't protect the many consumers that still pull by bare name in their own code.
▸ Bound the blast radius: even if a malicious model loads, least-privilege and isolation on the hosting container limit a foothold to what that workload can reach.
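The "loading a model can run code" lesson can be illustrated for one common case, a model file serialised with Python's pickle; the source does not say which format Unit 42 used, so that choice is an assumption. The sketch inspects the pickle opcode stream with the standard library's `pickletools.genops` and never calls `pickle.loads`. `risky_opcodes` and `Payload` are hypothetical names, and the payload is only serialised, never executed.

```python
import os
import pickle
import pickletools

# Opcodes that import an object or invoke one during unpickling.
_RISKY = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "BUILD"}


def risky_opcodes(data: bytes) -> set:
    """Inspect the opcode stream without ever calling pickle.loads."""
    return {op.name for op, _arg, _pos in pickletools.genops(data) if op.name in _RISKY}


class Payload:
    """Stand-in for a booby-trapped object; it is serialised here but never loaded."""

    def __reduce__(self):
        # On load, pickle would call os.system with this argument.
        return (os.system, ("echo this would run on load",))


plain = pickle.dumps({"layer.weight": [0.1, 0.2, 0.3]})
trapped = pickle.dumps(Payload())

plain_findings = risky_opcodes(plain)
trapped_findings = risky_opcodes(trapped)
```

This is a blunt check: legitimate checkpoints also use these opcodes to rebuild their objects, so a practical scanner would compare the imported names against an allowlist. It does not replace safe formats or a sandboxed load.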

Sources

Model Namespace Reuse: An AI Supply-Chain Attack Exploiting Model Name Trust. Unit 42, Palo Alto Networks (Itay Saraf & Ofir Balassiano, Sep 3 2025). Primary research; RCE on Vertex AI Model Garden & Azure AI Foundry; Google's daily deleted-author scan; pin-to-revision / controlled-store mitigations.
AI Supply Chain Attack Method Demonstrated Against Google, Microsoft Products. SecurityWeek. Coverage of the Unit 42 disclosure; responsibly-disclosed research, no assigned CVE.

Practise the risk class — related scenarios

🗄️When the Query Bites Back

A text-to-SQL agent runs the model's output straight at the database

🏭Poisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

🪤The Bug Report That Ran Code

A fake Sentry error report hijacks a developer's coding agent into running a shell command

🔓The Model That Forgot to Say No

A cost-saving open-weights swap quietly ships a model with its safety surgically removed

💤The Sleeper

A capable third-party model that behaves perfectly — until it sees the trigger

🔌The Tool With a Hidden Agenda

A trusted MCP email tool quietly BCCs every message to an attacker