🔍AI RiskAtlas
Case study

Model Namespace Reuse (Hugging Face name-trust hijack)

Research demonstration · 03 Sep 2025 · Model / Package Supply Chain

Unit 42 showed that when a Hugging Face account is deleted (or a model is transferred and the old author account is later removed), its Author/ModelName namespace can be re-registered by anyone. Platforms and code that resolve models by name then auto-deploy attacker-controlled weights; Unit 42 demonstrated this as reverse-shell RCE on Google Vertex AI Model Garden and Azure AI Foundry.

Root cause — why it happened

AI apps and platforms often download a model just by its name — like 'TeamX/cool-model' — and trust whatever sits at that name. But on a model hub a name is only borrowed: if the person or team behind it deletes their account (or hands the model over and later leaves), the name can be freed up and grabbed by someone else. An attacker who re-registers a freed, still-trusted name can put their own booby-trapped model there. Now anything that pulls that name keeps working as if nothing changed — except it's quietly downloading the attacker's model, and just loading some model files can run hidden code. Unit 42 showed this could end in a reverse shell running inside Google's and Microsoft's model-hosting services.
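
The mechanism can be reproduced in a few lines. The sketch below is a toy, in-memory stand-in for a model hub: `ToyHub`, the account names `alice` and `mallory`, and the byte strings are all hypothetical, and nothing here talks to a real service. It only illustrates that a name is a lease while a content digest is not.

```python
import hashlib


class ToyHub:
    """Hypothetical in-memory registry: names are leased, content has a digest."""

    def __init__(self):
        self._by_name = {}  # "Author/Model" -> (owner, digest)
        self._blobs = {}    # digest -> bytes

    def publish(self, owner, name, weights):
        # The hub only checks whether the name is currently held by someone else.
        current = self._by_name.get(name)
        if current is not None and current[0] != owner:
            raise PermissionError(f"{name} is owned by someone else")
        digest = hashlib.sha256(weights).hexdigest()
        self._blobs[digest] = weights
        self._by_name[name] = (owner, digest)
        return digest

    def delete_account(self, owner):
        # Deleting the owner frees every name they held; the names become claimable.
        for name in [n for n, (o, _) in self._by_name.items() if o == owner]:
            del self._by_name[name]

    def pull_by_name(self, name):
        _, digest = self._by_name[name]
        return self._blobs[digest]

    def pull_pinned(self, name, expected_digest):
        weights = self.pull_by_name(name)
        if hashlib.sha256(weights).hexdigest() != expected_digest:
            raise ValueError(f"{name} no longer matches the pinned digest")
        return weights


def demo():
    hub = ToyHub()
    pinned = hub.publish("alice", "TeamX/cool-model", b"benign weights")
    hub.delete_account("alice")                       # the name is freed
    hub.publish("mallory", "TeamX/cool-model", b"booby-trapped weights")
    by_name = hub.pull_by_name("TeamX/cool-model")    # silently returns the new content
    try:
        hub.pull_pinned("TeamX/cool-model", pinned)
        pinned_ok = True
    except ValueError:
        pinned_ok = False                             # the pinned consumer fails closed
    return by_name, pinned_ok
```

In this toy, the consumer that pulls by name receives the second publisher's bytes without any error, while the consumer that pinned a digest gets an exception instead of a swapped model.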

Risks this case illustrates

Risks are named using the standard (OWASP/ATLAS/NIST) lens.

How it unfolded

[Diagram: two zones, "Untrusted supply chain" and "Your infrastructure". A publisher uploads an artefact to a model / package registry; your build / serving stack pulls it by name / tag, producing the downloaded model / package and then your deployed model. In this case an attacker re-registers the freed namespace (org/model) and cloud model-hosting pulls from it. Flow legend: Instructions, Data, Actions, Control / decision, Feedback / logs.]
Setup · Step 1 / 6

A trusted model is published under a name people rely on

A team publishes a useful model on a public hub under a name like 'TeamX/cool-model'. Other developers, tutorials, and even big cloud platforms start pulling it by that name. Everyone trusts the name — that's the whole point of a public hub.

How consumers reference the model (illustrative):
# Pulled by NAME alone — no revision, no digest, no provenance check
from transformers import AutoModel

model = AutoModel.from_pretrained("TeamX/cool-model")
# resolves to whatever currently lives at that name on the hub

Controls & guardrails — what would have stopped it

The fix that actually breaks the chain: stop trusting names. Pin each model to a specific, unchangeable version (its content fingerprint), not just 'org/model', and verify it came from who you think — then re-registering the name does nothing, because your code is asking for an exact artifact the attacker can't reproduce. Pulling from your own vetted copy instead of the live hub, and loading models in a locked-down sandbox so hidden code can't run, close the gap further. Daily 'has this author been deleted?' scans (like Google added) help, but they don't protect code that still downloads by name alone.
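
As a minimal sketch of pinning, the snippet below refuses anything that is not a full 40-character commit hash before calling `from_pretrained`. The helper names `require_commit_sha` and `load_pinned` are hypothetical; `revision`, `use_safetensors`, and `trust_remote_code` are existing parameters of the transformers loader. The assumption is that a namespace re-created by someone else should not contain the original commit, so the pinned load fails rather than fetching new content.

```python
import re

# A git commit id on the hub is 40 lowercase hex characters.
_FULL_COMMIT = re.compile(r"[0-9a-f]{40}")


def require_commit_sha(revision: str) -> str:
    """Accept only a full commit hash; branch names and tags can be moved."""
    if not _FULL_COMMIT.fullmatch(revision):
        raise ValueError(f"revision {revision!r} is not a full commit hash")
    return revision


def load_pinned(repo_id: str, revision: str):
    """Load a model at one exact commit, asking for safetensors weights only."""
    # Imported lazily so the validator above works without transformers installed.
    from transformers import AutoModel

    return AutoModel.from_pretrained(
        repo_id,
        revision=require_commit_sha(revision),
        use_safetensors=True,     # ask for safetensors rather than pickle-based files
        trust_remote_code=False,  # do not execute Python shipped inside the repo
    )
```

A call such as `load_pinned("TeamX/cool-model", "main")` is rejected before any network request, because `main` is a movable reference. Pinning narrows what can be fetched; it does not by itself show that the pinned commit was benign.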

Preventive
  • Weight provenance, hashing & pre-deploy evals

    Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.

  • Serving-stack & provisioning attestation, cache isolation

    Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so shared caching is often left on for performance. Signing proves a template wasn't tampered with in transit, not that a signed template is benign: an insider with signing rights still needs review and trigger-focused evals.

  • MCP/plugin pinning, manifest hashing & re-review

    Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

  • Per-agent identity & taint-marked messages

    Adds coordination overhead and doesn't stop a worker from returning subtly wrong (but well-formed) results that mislead the planner.
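
The hashing control above can be sketched with the standard library alone. This example assumes you keep a mirrored copy of the model in a local directory and a manifest of expected SHA-256 digests recorded when the copy was vetted; the function names and the manifest shape (relative path mapped to hex digest) are hypothetical. As the caveat says, a match shows the files are unchanged, not that they are safe.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(model_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every file matched."""
    problems = []
    for rel_path, expected in manifest.items():
        target = model_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_file(target) != expected:
            problems.append(f"digest mismatch: {rel_path}")
    # Files present but not listed are also suspicious (an added script or pickle).
    listed = set(manifest)
    for found in sorted(p for p in model_dir.rglob("*") if p.is_file()):
        rel = found.relative_to(model_dir).as_posix()
        if rel not in listed:
            problems.append(f"unexpected file: {rel}")
    return problems
```

A deployment step could refuse to proceed whenever the returned list is non-empty, which also catches extra files slipped in next to unchanged weights.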

Detective
  • Behavioural evals & regression gating

    Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

  • Runtime monitoring & anomaly detection

    Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

  • Full-trace audit logging

    Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Corrective
  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

  • Trust the artifact, not the name: a model-hub namespace (org/model) is a re-assignable identifier, so resolving by name alone makes you trust whoever currently owns the name.
  • Names can be freed and re-claimed: a deleted account, a deleted org, or a transfer-then-author-removal can return a namespace to the pool — and an attacker can re-register it under the same trusted path.
  • Pin to an immutable commit/revision (a digest) and verify provenance; mirror models into a controlled store so you never depend on live name resolution.
  • Loading a model can run code: pull-by-name swaps don't just poison data, they can be remote code execution on load — prefer safe formats and sandbox the load.
  • Platform-side scanning (Google's daily deleted-author check) lowers likelihood for managed deployments but doesn't protect the many consumers that still pull by bare name in their own code.
  • Bound the blast radius: even if a malicious model loads, least-privilege and isolation on the hosting container limit a foothold to what that workload can reach.

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan