Malicious models on Hugging Face (pickle deserialization RCE)
Disclosed vulnerability | 27 Feb 2024 | Model / Package Supply Chain
Researchers repeatedly found models on public hubs containing code that executes on load via unsafe pickle deserialization.
Root cause: why it happened
Models are big files of numbers, but the popular way to save them, Python's `pickle` format, can also store instructions to RUN, and those instructions execute the instant you open the file. So a model is not just data; opening it can be like running a program a stranger wrote. Attackers uploaded models to a public hub that look perfectly normal but quietly run hidden code the moment you load them, for example by opening a connection back to the attacker's computer. Some were even built to slip past the hub's automatic safety scanner. The fix the whole field moved toward is a 'data-only' format (safetensors) that can store the numbers but cannot run code.
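To make the 'data-only' idea concrete, here is a minimal sketch of a file layout in the spirit of safetensors: an 8-byte little-endian header length, a JSON header describing each tensor, then raw bytes. The helper names `write_tensors` and `read_tensors` are hypothetical, and this is a simplified illustration under those assumptions, not the safetensors library or a full implementation of its format. The point it tries to show is that the reader only parses JSON and slices bytes, so nothing in the file can name a function to call.

```python
import json
import struct


def write_tensors(tensors: dict[str, list[float]]) -> bytes:
    """Serialize named float32 vectors as: u64 header length, JSON header, raw bytes."""
    header = {}
    body = b""
    for name, values in tensors.items():
        raw = struct.pack(f"<{len(values)}f", *values)
        header[name] = {
            "dtype": "F32",
            "shape": [len(values)],
            # offsets are relative to the start of the byte buffer
            "data_offsets": [len(body), len(body) + len(raw)],
        }
        body += raw
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + body


def read_tensors(blob: bytes) -> dict[str, list[float]]:
    """Parse the layout above. Only JSON parsing and byte slicing happen here."""
    (header_len,) = struct.unpack_from("<Q", blob, 0)
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    body = blob[8 + header_len:]
    out = {}
    for name, meta in header.items():
        if meta["dtype"] != "F32":
            raise ValueError(f"unsupported dtype for {name!r}")
        start, end = meta["data_offsets"]
        if not (0 <= start <= end <= len(body)) or (end - start) % 4:
            raise ValueError(f"bad offsets for {name!r}")
        count = (end - start) // 4
        out[name] = list(struct.unpack_from(f"<{count}f", body, start))
    return out


blob = write_tensors({"layer0.weight": [0.5, -1.25, 2.0]})
restored = read_tensors(blob)
```

A malformed file can still make this reader raise an error, but there is no opcode or hook through which the file's contents get executed.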
Risks this case illustrates
Named in the standard (OWASP/ATLAS/NIST) lens.
How it unfolded
An attacker builds a model that runs code on load
The attacker takes a normal-looking model and saves it in the format that can also store instructions to RUN. They tuck in a small piece of code, for example: 'when this file is opened, connect back to my computer.' To anyone browsing the hub it looks like just another model.
# pytorch_model.bin: pickle-backed (ILLUSTRATIVE, not operational)
# A reduce hook makes LOAD == RUN:
import os

class _Payload:
    def __reduce__(self):
        # pickle calls the returned callable with these arguments,
        # so this runs the instant the file is deserialized
        return (os.system, ("<connect-back to attacker host>",))

# ...followed by ordinary-looking tensor data so the file 'works' as a model.
# Reportedly seen: a reverse shell to a hard-coded host on load (JFrog).
Controls & guardrails: what would have stopped it
The cleanest fix is to use a model format that simply CANNOT run code (safetensors): then 'loading' is just reading numbers, and there is nothing to execute. Backing that up: only load models you can prove came from who you think (signatures), pin the exact file you reviewed, and open unfamiliar models inside a locked-down sandbox with no internet, so even a booby-trapped file can't phone home. The honest catch: a scanner badge alone won't save you. The 'nullifAI' samples were built to fool the scanner, so the format choice and the sandbox are what actually hold.
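As one way to picture 'pin the exact file you reviewed', the sketch below records a SHA-256 digest at review time and refuses anything that no longer matches. The function names `sha256_of` and `matches_pin`, and the file name `model.safetensors`, are assumptions for illustration; real deployments would more likely rely on their hub client's or package manager's own revision pinning.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pin(path: Path, pinned_hex: str) -> bool:
    """True only if the file's digest equals the digest recorded at review time."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.safetensors"  # hypothetical file name
    artifact.write_bytes(b"reviewed weights")
    pin = sha256_of(artifact)                   # recorded when the file was reviewed
    unchanged_ok = matches_pin(artifact, pin)   # same bytes: accepted

    artifact.write_bytes(b"swapped weights")    # simulate a silent upstream swap
    swapped_ok = matches_pin(artifact, pin)     # different bytes: rejected
```

A match only shows that the bytes are the ones that were reviewed, so the check is worth running before any loader touches the file, not as a substitute for the review itself.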
- Weight provenance, hashing & pre-deploy evals (addresses: Supply-Chain Compromise)
Hashes prove the file is unchanged, not that it's safe: a trained-in backdoor or ablated refusal direction passes integrity checks. Only behavioural evals probe disposition, and they can't be exhaustive.
- Serving-stack & provisioning attestation, cache isolation (addresses: Supply-Chain Compromise)
Attestation is operationally heavy and rarely covers the full stack; cache isolation trades away latency/cost savings, so shared caching is often left on for performance. Signing proves a template wasn't tampered with in transit, not that a signed template is benign: an insider with signing rights still needs review and trigger-focused evals.
- Egress allowlisting & DLP on tool arguments
Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.
- Behavioural evals & regression gating (addresses: Supply-Chain Compromise)
Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.
- Runtime monitoring & anomaly detection
Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.
- Governance: risk assessment, red-teaming & incident response (addresses: Supply-Chain Compromise)
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.
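The egress-allowlisting control in the list above can be sketched at the application level as an exact-host check. This is a minimal illustration under stated assumptions: `ALLOWED_HOSTS` and `egress_allowed` are hypothetical names, the two hosts are placeholders, and in practice the rule would usually be enforced by the sandbox's network policy, not by Python code inside the process being contained.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the entries are placeholders for illustration.
ALLOWED_HOSTS = frozenset({"huggingface.co", "pypi.org"})


def egress_allowed(url: str, allowed_hosts=ALLOWED_HOSTS) -> bool:
    """Allow only https URLs whose host exactly matches an allowlisted name."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname strips userinfo and port, and lowercases the host
    host = (parts.hostname or "").rstrip(".")
    return host in allowed_hosts


checks = {
    "https://huggingface.co/some/model": egress_allowed("https://huggingface.co/some/model"),
    "https://huggingface.co.evil.example/x": egress_allowed("https://huggingface.co.evil.example/x"),
    "https://pypi.org@evil.example/": egress_allowed("https://pypi.org@evil.example/"),
    "http://pypi.org/": egress_allowed("http://pypi.org/"),
}
```

Exact matching is deliberately strict: look-alike hosts and userinfo tricks are rejected, but the check says nothing about what is sent to an allowed host, which is the open-ended-channel limitation noted for this control.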
Lessons
- Loading an untrusted model is equivalent to running untrusted code: with pickle-based formats, deserialization executes arbitrary code BEFORE any inference, so the compromise lands at load time, not at use.
- Format choice is the real boundary: safetensors is a data-only format that cannot encode executable code, eliminating RCE-on-load rather than merely scanning for it.
- A scanner verdict is not a guarantee: the reported 'nullifAI' technique made malformed/packed pickle streams fail open, producing a 'clean' badge for a file that still executed in the consumer's loader.
- Trust models by verified provenance and pinned digest, not by name or download count; and load anything unfamiliar inside a least-privilege, egress-denied sandbox so a malicious artifact has nowhere to run or phone home.
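The fail-open lesson can be illustrated with a small static scanner that fails closed instead. The sketch below walks a pickle stream with the standard library's `pickletools.genops` without executing it, records the globals the stream would import, and treats any parse error as suspicious, keeping whatever was seen before the error. The names `scan_pickle` and `ALLOWED_GLOBALS` are hypothetical, the import detection is a heuristic that assumes typical pickler output, and this is not how the hub's scanner is implemented. The demo payload calls the harmless builtin `len` as a stand-in.

```python
import pickle
import pickletools

# Hypothetical allowlist of (module, name) pairs a reviewed model may import.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


def scan_pickle(data: bytes) -> dict:
    """Statically list imports in a pickle stream; never unpickles the data."""
    imports = []
    recent_strings = []  # STACK_GLOBAL takes module and name from pushed strings
    parse_error = None
    try:
        for opcode, arg, _pos in pickletools.genops(data):
            if opcode.name in ("GLOBAL", "INST"):
                module, _, name = arg.partition(" ")
                imports.append((module, name))
            elif opcode.name == "STACK_GLOBAL":
                pair = tuple(recent_strings[-2:])
                imports.append(pair if len(pair) == 2 else ("?", "?"))
            elif isinstance(arg, str):
                recent_strings.append(arg)
    except Exception as exc:  # malformed stream: record the error, keep findings
        parse_error = str(exc)
    disallowed = [g for g in imports if g not in ALLOWED_GLOBALS]
    return {
        "imports": imports,
        "parse_error": parse_error,
        # fail closed: an unparseable stream is never reported as clean
        "suspicious": bool(disallowed) or parse_error is not None,
    }


class _Demo:
    def __reduce__(self):
        return (len, ("abc",))  # harmless stand-in for a dangerous callable


plain_report = scan_pickle(pickle.dumps({"weights": [1.0, 2.0]}, protocol=4))
with_call = pickle.dumps(_Demo(), protocol=4)
call_report = scan_pickle(with_call)
broken_report = scan_pickle(with_call[:-1])  # drop the final STOP opcode
```

Because the pickle machine executes opcodes in order, a call placed early in a stream can run before the loader ever reaches the malformed part. That is why the sketch reports the truncated stream as suspicious and still lists the import it saw before the error.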
Sources
- JFrog: Data Scientists Targeted by Malicious Hugging Face ML Models with Silent Backdoor. Reported pickle-backed models on the hub, including a reverse shell to a hard-coded host on load.
- ReversingLabs: Malicious ML models discovered on Hugging Face platform (nullifAI). The nullifAI technique: malformed/packed pickle streams that evade the hub's pickle scanner.
- Hugging Face: safetensors. Data-only tensor serialization format adopted to avoid pickle's code-execution risk.
Practise the risk class: related scenarios
Compromise the pipeline that builds agents, and every new worker is born malicious
A cost-saving open-weights swap quietly ships a model with its safety surgically removed
A capable third-party model that behaves perfectly β until it sees the trigger
A trusted MCP email tool quietly BCCs every message to an attacker