AI RiskAtlas
Case study

Arup HK$200M deepfake video-call CFO fraud

Real-world incident · 04 Feb 2024 · Identity Deepfake (Face Swap & Talking Head)

A finance employee at engineering firm Arup's Hong Kong office paid out about HK$200M (~US$25.6M) in 15 transfers after a video conference in which the CFO and other 'colleagues' were all AI-generated deepfakes of real staff (face and voice).

Root cause: why it happened

Nobody hacked Arup's computers. The trick was simpler and scarier: the attackers made fake versions of people the employee trusted. They collected public videos and photos of Arup's chief financial officer and some colleagues (the kind of footage you can find online) and used AI to recreate their faces and voices. Then they put those fakes on a live video call. The finance worker in Hong Kong saw the 'CFO' and several 'colleagues' on screen, heard them speak, and felt reassured, even though the worker had first suspected that the email requesting the transfers was a scam. Because the worker treated 'I can see and hear my boss' as proof it was really them, they followed the call's instructions and made 15 bank transfers worth about HK$200M (~US$25.6M). The fraud only came to light later, during a routine check with head office. The failure was trusting a face and a voice as proof of identity, which is exactly what AI can now fake.

Risks this case illustrates

Named in the standard (OWASP/ATLAS/NIST) lens.

How it unfolded

[Diagram: the deepfake pipeline behind the call. Lanes: Requester (may be an abuser); Scraped third-party identity; Generation pipeline; Open-weights supply chain; Consent & content-authenticity controls. Components include: User, Untrusted Content, Orchestrator / Agent Loop, Face / Identity Embedding, Face-Swap Generator, Acoustic / TTS Model, Temporal / Motion Module, Audio Decoder / Neural Codec, Model Weights & Registry, Model / Package Registry, Serving Infrastructure, Output Guardrail, Fraud operator, Arup finance employee, 5 Hong Kong bank accounts. Arrow types: Instructions, Data, Actions, Control / decision, Feedback / logs.]
Setup · Step 1 / 7

Attacker scrapes public footage of Arup executives

It starts with homework, not hacking. The fraudsters gather videos and photos of Arup's senior people (the chief financial officer and some colleagues) from the open web: conference talks, press clips, company videos. Senior staff leave a big public footprint, so there is plenty to work with. The targets never know their likeness is being collected.

Attacker's target list (illustrative reconstruction)
Targets to impersonate on the call (recreate from public footage):
  - CFO (UK-based)            sources: conference keynote, media interview
  - Finance colleague A       sources: corporate video, webinar
  - Finance colleague B       sources: public panel recording

Goal: populate a live video conference so the Hong Kong
finance employee sees a roomful of familiar, senior faces.
Note: no Arup system access required; likeness only.

Controls & guardrails: what would have stopped it

One rule breaks this entire scam: never send money just because of a video call. For any large or confidential transfer, the employee should have to hang up, call the person back on a known, trusted number, and get a second person to approve it before anything moves. The employee even had the right instinct at first (they suspected phishing); the call talked them out of it. Training staff that faces and voices can now be faked, and that 'I saw them' is no longer proof, would have kept that instinct alive. Watermarking or detecting the fake video does not help, because a real attacker simply won't watermark their own forgery.
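The callback-plus-second-approver rule can be sketched as a release gate on the payment itself. This is a minimal illustration, not a description of Arup's or any bank's actual system: the `TransferRequest` fields, the `HIGH_VALUE_THRESHOLD_HKD` value, the directory lookup, and the phone numbers in the usage note are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: transfers at or above this amount need the full gate.
HIGH_VALUE_THRESHOLD_HKD = 1_000_000


@dataclass
class TransferRequest:
    amount_hkd: int
    beneficiary: str
    requested_by: str                  # identity claimed on the call or email
    initiator: str                     # employee keying in the payment
    confidential: bool = False
    callback_confirmed: bool = False   # set only by record_callback()
    approvers: set = field(default_factory=set)


def needs_full_gate(req):
    """Large or confidential transfers require callback and dual control."""
    return req.confidential or req.amount_hkd >= HIGH_VALUE_THRESHOLD_HKD


def record_callback(req, dialled_number, directory):
    """Confirm the callback only if the dialled number is the one held in the
    trusted directory, never a number supplied during the call itself."""
    trusted = directory.get(req.requested_by)
    req.callback_confirmed = trusted is not None and trusted == dialled_number
    return req.callback_confirmed


def can_release(req):
    """Return (allowed, reason). Seeing or hearing the requester is not an input."""
    if not needs_full_gate(req):
        return True, "below threshold"
    if not req.callback_confirmed:
        return False, "out-of-band callback missing"
    # The second approver must be neither the initiator nor the claimed requester.
    independent = req.approvers - {req.initiator, req.requested_by}
    if not independent:
        return False, "second approver missing"
    return True, "callback and dual control satisfied"
```

Under these assumptions, a confidential request stays blocked until someone dials the directory number (for example `record_callback(req, "+44-20-0000-0000", {"cfo": "+44-20-0000-0000"})`) and a second person who is neither the initiator nor the claimed requester is added to `req.approvers`. The design choice is that nothing observed on the video call can set either flag.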

Preventive
  • Human-in-the-loop approval on high-risk actions

    Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.

  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

  • User AI-literacy & verification workflows

    Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

Detective
  • Runtime monitoring & anomaly detection

    Detects the anomalous, not the novel-but-subtle; high false-positive rates cause alert fatigue. Always a step behind a sufficiently quiet attacker.

  • Full-trace audit logging

    Logging is forensic, not preventive: it explains harm after the fact. Useless if no one reviews it or if the materialised context isn't captured.

  • Synthetic-media / deepfake detection

    Probabilistic and in an arms race with generators; evadable (UnMarker-style perturbation, novel models) and prone to false confidence. A triage signal, not proof: high-stakes calls still need out-of-band verification.

Corrective
  • Governance: risk assessment, red-teaming & incident response

    Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.
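As a rough illustration of the runtime-monitoring control listed under Detective, the sketch below flags an initiator who sends a burst of transfers to beneficiaries the firm has not paid before. The window length, the limits, and the tuple layout are hypothetical tuning choices, not figures from the case.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical tuning values, not taken from the case.
WINDOW = timedelta(days=7)
MAX_NEW_BENEFICIARY_TRANSFERS = 3
MAX_NEW_BENEFICIARY_TOTAL_HKD = 5_000_000


def flag_bursts(transfers, known_beneficiaries):
    """Return initiators whose transfers to previously unseen beneficiaries
    exceed a count or value limit inside a sliding time window.

    `transfers` is an iterable of (timestamp, initiator, beneficiary, amount_hkd).
    """
    recent = defaultdict(list)   # initiator -> [(timestamp, amount_hkd)]
    flagged = set()
    for ts, initiator, beneficiary, amount in sorted(transfers):
        if beneficiary in known_beneficiaries:
            continue
        # Keep only this initiator's new-beneficiary transfers still in the window.
        window = [(t, a) for t, a in recent[initiator] if ts - t <= WINDOW]
        window.append((ts, amount))
        recent[initiator] = window
        total = sum(a for _, a in window)
        if (len(window) > MAX_NEW_BENEFICIARY_TRANSFERS
                or total > MAX_NEW_BENEFICIARY_TOTAL_HKD):
            flagged.add(initiator)
    return flagged
```

A pattern shaped like this incident (many payments from one employee to a handful of new accounts) would trip either limit, but an attacker who paces smaller payments stays under it. That matches the caveat above: monitoring catches the anomalous, not the novel-but-subtle, so it complements the payment gate and does not replace it.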

Lessons

  • Face and voice are no longer identity: cheap, one-shot deepfakes from public footage mean 'I can see and hear them' is forgeable evidence, not proof. High-value approvals must verify through a channel the attacker cannot control.
  • Multi-party real-time deepfakes manufacture false consensus: impersonating the CFO and several colleagues defeats the lone-imposter heuristic, and it overturned the employee's correct initial phishing suspicion.
  • This was social engineering, not a breach: no Arup system was compromised; the money left via a normal authorised payment path because the human authoriser was deceived. Frame the failure as overreliance, not a technical exploit.
  • Put the boundary on the payment, not the pixels: an enforced out-of-band callback plus dual control on confidential/high-value transfers is what breaks the chain. A single employee acting on a video call should never be able to move ~HK$200M.
  • Deepfake-pipeline controls don't protect the victim: the enrolment consent-gate, output classifiers and provenance/watermarks protect the impersonated subjects or label cooperative output, but an adversarial offline pipeline honours none of them.
  • Detection that comes only on routine reconciliation is too late: the out-of-band confirmation that exposed the fraud (HQ saying the transaction did not exist) is exactly the check that, done before payment, would have prevented the loss.

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning; they are not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan