Case study

Malicious models on Hugging Face (pickle deserialization RCE)

Disclosed vulnerability · 27 Feb 2024 · 🗺️ Model / Package Supply Chain

Researchers repeatedly found models on public hubs containing code that executes on load via unsafe pickle deserialization.

Root cause — why it happened

Models are big files of numbers, but the popular way to save them — Python's `pickle` format — can also store instructions to RUN, and those instructions execute the instant you open the file. So a model is not just data; opening it can be like running a program a stranger wrote. Attackers uploaded models to a public hub that look perfectly normal but quietly run hidden code the moment you load them — for example, opening a connection back to the attacker's computer. Some were even built to slip past the hub's automatic safety scanner. The fix the whole field moved toward is a 'data-only' format (safetensors) that can store the numbers but cannot run code.
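Here is a harmless, self-contained Python sketch that makes "opening it can be like running a program" concrete. The class name `Innocent` is hypothetical, and the "payload" only calls the built-in `print`. The mechanism is the one these models abuse: `__reduce__` tells `pickle` which callable to invoke, and with which arguments, when the bytes are loaded.

```python
import pickle

class Innocent:
    """Hypothetical stand-in for a model object; only its __reduce__ matters."""

    def __reduce__(self):
        # pickle records "call print(...) to rebuild me" instead of plain data.
        # A malicious file would name something like os.system here instead.
        return (print, ("this line was printed by pickle.loads, not by the caller",))

blob = pickle.dumps(Innocent())   # the bytes an attacker would upload

# The victim only loads the bytes. print() runs during deserialization,
# and its return value (None) is what the victim receives as the "model".
restored = pickle.loads(blob)
```

`pickle.loads` has no way to tell this apart from a legitimate object, because both the callable and its arguments come from the file.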

Risks this case illustrates

Supply-Chain Compromise

Named in the standard (OWASP/ATLAS/NIST) lens.

How it unfolded


Setup (step 1 of 6)

An attacker builds a model that runs code on load

The attacker takes a normal-looking model and saves it in the format that can also store instructions to RUN. They tuck in a small piece of code — for example, 'when this file is opened, connect back to my computer.' To anyone browsing the hub it looks like just another model.

💻 Malicious model artifact (illustrative)

# pytorch_model.bin: pickle-backed (ILLUSTRATIVE, not operational)
# A __reduce__ hook makes LOAD == RUN:
import os

class _Payload:
    def __reduce__(self):
        # pickle stores "call os.system(<arg>)" as the recipe for rebuilding
        # this object, so the call runs the instant the file is deserialized
        return (os.system, ("<connect-back to attacker host>",))

# ...followed by ordinary-looking tensor data so the file 'works' as a model.
# Reportedly seen: a reverse shell to a hard-coded host on load (JFrog).


Controls & guardrails — what would have stopped it

The cleanest fix is to use a model format that simply CANNOT run code (safetensors): then 'loading' is just reading numbers, and there is nothing to execute. Backing that up: only load models you can prove came from who you think (signatures), pin the exact file you reviewed, and open unfamiliar models inside a locked-down sandbox with no internet, so even a booby-trapped file can't phone home. The honest catch: a scanner badge alone won't save you — the 'nullifAI' samples were built to fool the scanner, so the format choice and the sandbox are what actually hold.
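The reason a data-only format cannot run code is clearest from its layout. The safetensors format is documented as an 8-byte little-endian header length, then a JSON header mapping each tensor name to its `dtype`, `shape` and `data_offsets` (relative to the start of the byte buffer), then the raw bytes. The sketch below hand-rolls that layout with the standard library for 1-D `F32` tensors only. `write_safetensors_like` and `read_safetensors_like` are hypothetical names, and real code should use the official `safetensors` package. Reading amounts to `json.loads` plus byte slicing, so the file never gets a chance to name a function to call.

```python
import json
import struct

def write_safetensors_like(tensors):
    """Serialize {name: list[float]} as 1-D F32 tensors in the safetensors layout."""
    header, chunks, offset = {}, [], 0
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)  # little-endian float32
        header[name] = {"dtype": "F32", "shape": [len(values)],
                        "data_offsets": [offset, offset + len(raw)]}
        chunks.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + b"".join(chunks)

def read_safetensors_like(blob):
    """Parse the layout back. Only JSON decoding and byte slicing happen here."""
    if len(blob) < 8:
        raise ValueError("file too short for a header length")
    (n,) = struct.unpack_from("<Q", blob, 0)  # header size in bytes
    if n > len(blob) - 8:
        raise ValueError("header length exceeds file size")
    header = json.loads(blob[8:8 + n].decode("utf-8"))
    data = blob[8 + n:]
    tensors = {}
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form string map
            continue
        if info["dtype"] != "F32" or len(info["shape"]) != 1:
            raise ValueError(f"{name!r}: only 1-D F32 is handled in this sketch")
        begin, end = info["data_offsets"]
        if not 0 <= begin <= end <= len(data) or end - begin != 4 * info["shape"][0]:
            raise ValueError(f"{name!r}: offsets do not match shape or file size")
        tensors[name] = list(struct.unpack_from(f"<{info['shape'][0]}f", data, begin))
    return tensors
```

A malformed file can make this reader raise an error. It cannot make it execute anything, which is the property the controls above rely on.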

Preventive

Weight provenance, hashing & pre-deploy evals
addresses: Supply-Chain Compromise
Hashes prove the file is unchanged, not that it's safe — a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.
Serving-stack & provisioning attestation, cache isolation
addresses: Supply-Chain Compromise
Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so shared caching is often left on for performance. Signing proves a template wasn't tampered with in transit, not that a signed template is benign: templates signed by an insider with signing rights still need review and trigger-focused evals.
Egress allowlisting & DLP on tool arguments
Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.
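The "hashing" and "pin the exact file you reviewed" controls reduce to comparing a file's digest against a value recorded at review time. Below is a minimal standard-library sketch. The function names, and the convention of pinning one hex SHA-256 digest per file, are assumptions for illustration. As the guardrail note says, a match proves the bytes are unchanged, not that they are safe.

```python
import hashlib
import hmac

def sha256_file(path, chunk_size=1 << 20):
    """Stream the file so multi-gigabyte weights are never read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_pinned(path, pinned_hex):
    """Raise unless the file's SHA-256 equals the digest pinned at review time."""
    actual = sha256_file(path)
    # constant-time comparison; hexdigest() is lowercase, so normalise the pin
    if not hmac.compare_digest(actual, pinned_hex.strip().lower()):
        raise RuntimeError(f"{path}: SHA-256 {actual} does not match pinned digest")
    return actual
```

The check belongs before any loader opens the file, so a swapped artifact is refused before it is ever deserialized.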

Detective

Behavioural evals & regression gating
addresses: Supply-Chain Compromise
Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.
Runtime monitoring & anomaly detection
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
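Regression gating typically means a candidate model is promoted only if its scores on a fixed eval suite do not drop below the approved baseline. The small sketch below assumes higher-is-better scores, a single `max_drop` tolerance and hypothetical metric names. It also makes the limitation above concrete, since the gate can only compare metrics it is given.

```python
def regression_gate(baseline, candidate, max_drop=0.01):
    """Return failure messages; an empty list means the candidate may be promoted.

    baseline and candidate map metric name -> score, where higher is better.
    """
    failures = []
    for metric, base_score in sorted(baseline.items()):
        if metric not in candidate:
            # a metric that was never run counts as a failure (fail closed)
            failures.append(f"{metric}: not evaluated on candidate")
        elif base_score - candidate[metric] > max_drop:
            failures.append(f"{metric}: {base_score:.3f} -> {candidate[metric]:.3f}")
    return failures
```

A backdoored model that matches the baseline on every listed metric returns an empty list, because a trigger that no eval exercises never shows up in these numbers.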

Corrective

Governance: risk assessment, red-teaming & incident response
addresses: Supply-Chain Compromise
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.


Lessons

▸ Loading an untrusted model is equivalent to running untrusted code: with pickle-based formats, deserialization executes arbitrary code BEFORE any inference, so the compromise lands at load time, not at use.
▸ Format choice is the real boundary: safetensors is a data-only format that cannot encode executable code, eliminating RCE-on-load rather than merely scanning for it.
▸ A scanner verdict is not a guarantee: the reported 'nullifAI' technique used malformed/packed pickle streams on which the scanner failed open, producing a 'clean' badge for a file that still executed its payload in the consumer's loader.
▸ Trust models by verified provenance and pinned digest, not by name or download count; and load anything unfamiliar inside a least-privilege, egress-denied sandbox so a malicious artifact has nowhere to run or phone home.
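The scanner lesson can be illustrated with a deliberately strict static check built on the standard library's `pickletools.genops`, which walks a pickle's opcodes without executing them. The check fails closed. It rejects any opcode that can import a global or call a constructor, any stream it cannot fully parse, and any bytes left after `STOP`. This sketch does not describe any hub's actual scanner. The `scan_pickle` name and the opcode list are illustrative choices. The rule is strict enough that it would also reject ordinary PyTorch `.bin` checkpoints, which legitimately reference `torch` globals; a practical scanner would need an allowlist there.

```python
import pickletools

# Opcodes that import a global or call/construct something while unpickling.
IMPORT_OR_CALL = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE",
    "NEWOBJ", "NEWOBJ_EX", "EXT1", "EXT2", "EXT4",
}

def scan_pickle(data):
    """Statically inspect pickle bytes; return ("accept" | "reject", reason).

    Fails closed: anything the scanner cannot fully parse is rejected, because
    pickle.loads executes opcodes in order and may run a payload that sits
    before the point where the stream is broken.
    """
    last_pos = -1
    try:
        for opcode, _arg, pos in pickletools.genops(data):
            if opcode.name in IMPORT_OR_CALL:
                return "reject", f"{opcode.name} at byte {pos}"
            last_pos = pos  # genops stops after yielding STOP
    except Exception as exc:  # truncated stream, unknown opcode, bad argument...
        return "reject", f"unparseable stream: {exc}"
    if last_pos + 1 != len(data):
        return "reject", "unexpected bytes after STOP"
    return "accept", "no import or call opcodes"
```

The key design choice is that an exception while scanning yields a rejection rather than a "clean" verdict, which is the opposite of the fail-open behaviour described in the lesson above.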

Sources

JFrog, "Data Scientists Targeted by Malicious Hugging Face ML Models with Silent Backdoor": reported pickle-backed models on the hub, including a reverse shell to a hard-coded host on load.
ReversingLabs, "Malicious ML models discovered on Hugging Face platform" (nullifAI): the nullifAI technique, using malformed/packed pickle streams that evade the hub's pickle scanner.
Hugging Face, "safetensors": data-only tensor serialization format adopted to avoid pickle's code-execution risk.

Practise the risk class — related scenarios

🏭Poisoning the Agent Factory

Compromise the pipeline that builds agents, and every new worker is born malicious

🔓The Model That Forgot to Say No

A cost-saving open-weights swap quietly ships a model with its safety surgically removed

💤The Sleeper

A capable third-party model that behaves perfectly — until it sees the trigger

🔌The Tool With a Hidden Agenda

A trusted MCP email tool quietly BCCs every message to an attacker