Case study

Mata v. Avianca — fabricated case citations

Real-world incident · 22 Jun 2023 · Human + AI Professional Workflow

Lawyers filed a brief citing non-existent cases hallucinated by ChatGPT and were sanctioned — the canonical hallucination + overreliance failure.

Root cause — why it happened

A lawyer used ChatGPT to find court cases supporting an argument. It produced confident, official-looking citations — but several of the cases did not exist. The lawyer didn't check whether they were real and filed them in court. The model invented facts; the human trusted them without verifying; and there was no step that required checking before filing.

Risks this case illustrates

Hallucination · Overreliance / Automation Bias

Both risks are named in the standard lens (OWASP/ATLAS/NIST).

How it unfolded


Setup · Step 1 / 6

A lawyer asks the AI for supporting cases

Under deadline, a lawyer asks ChatGPT to find court decisions that support their client's position — using it like a legal research assistant.

The request (prompt)

Find me cases supporting tolling of the statute of limitations in a Montreal Convention claim.

Controls & guardrails — what would have stopped it

One simple rule would have caught it: before filing, check that every case the AI cited actually exists in a real legal database. Treat the AI as a draft-writer, not a source of truth, and make it the human's job to verify, not just to forward.
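The rule above can be sketched as a small pre-filing check. This is a minimal illustration, not a working legal tool. The regular expression covers only a few reporter formats, `KNOWN_CITATIONS` is a hard-coded stand-in for a lookup against a real legal database, and the party names and citations in `draft` are invented placeholders. All function and variable names are hypothetical.

```python
import re
from typing import Callable, List

# Illustrative pattern for "volume reporter page" citations such as "111 F.3d 222".
# Real citation formats vary far more widely; this covers only a few reporters.
CITATION_RE = re.compile(r"\b(\d{1,4})\s+(F\.(?:2d|3d|4th)|U\.S\.)\s+(\d{1,5})\b")


def extract_citations(text: str) -> List[str]:
    """Return each distinct reporter citation in the draft, in order of appearance."""
    found: List[str] = []
    for match in CITATION_RE.finditer(text):
        cite = " ".join(match.groups())
        if cite not in found:
            found.append(cite)
    return found


def unverified_citations(text: str, exists_in_database: Callable[[str], bool]) -> List[str]:
    """List the citations that the database lookup could not confirm."""
    return [c for c in extract_citations(text) if not exists_in_database(c)]


# Assumption: stand-in for a real legal database, here just a hard-coded set.
KNOWN_CITATIONS = {"111 F.3d 222"}

# Placeholder draft; the party names and citations are invented for this example.
draft = (
    "See Alpha v. Beta, 111 F.3d 222 (placeholder); "
    "Gamma v. Delta, 333 F.3d 444 (placeholder)."
)

missing = unverified_citations(draft, lambda c: c in KNOWN_CITATIONS)
if missing:
    print("Do not file. Unverified citations:", missing)
```

A check like this confirms only that a citation exists. It does not confirm that the case says what the brief claims, and a citation the pattern fails to recognise is never checked at all, so a human still has to read the cited authority.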

Preventive

Grounding / citation checks
addresses: Hallucination
It can only check an answer against the evidence that was retrieved; if the right document wasn't retrieved, a confident wrong answer may still pass. Judge models have their own error rate.
Uncertainty signalling & abstention
addresses: Hallucination · Overreliance / Automation Bias
Models are poorly calibrated and often confidently wrong; over-abstention makes the product useless, so the tuning is delicate.
User AI-literacy & verification workflows
addresses: Hallucination · Overreliance / Automation Bias
Relies on human diligence under time pressure; automation bias is strong and training decays. A backstop, not a guarantee.

Detective

Behavioural evals & regression gating
addresses: Hallucination
Evals only measure what they test; novel behaviours and rare triggers slip through, and a backdoor keyed to an unguessed trigger passes every benchmark.

Corrective

Human-in-the-loop approval on high-risk actions
addresses: Overreliance / Automation Bias
Approval fatigue turns gates into rubber stamps; gates placed after the point of no return do nothing; and approvers can be misled by a model-written summary of the action.
Governance: risk assessment, red-teaming & incident response
addresses: Overreliance / Automation Bias
Process reduces likelihood and speeds recovery but executes no technical control itself; weak follow-through makes it theatre.

Lessons

▸ Fluent, well-formatted output is not evidence of truth — confident citations can be entirely fabricated.
▸ A human in the loop is only a control if they verify; as a passive conduit they just pass the model's errors through.
▸ For high-stakes outputs, require verification against the source of record before anything is committed.
▸ Asking the model to 'confirm' its own output is not verification — it will confidently confirm fabrications.

Proposals & gaps this case surfaced

Non-destructive suggestions for the library — proposed, not adopted.

✚ Proposed guardrail · Mandatory source-of-record verification before AI-assisted output is committed · Human-in-the-Loop (HITL) Moderation

For high-stakes outputs, require a human to verify each AI-asserted fact/citation against the authoritative source of record before it is filed, sent, or committed — a hard gate, logged and attributable, not an optional review.
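One way to picture a hard gate that is logged and attributable is the sketch below. It is an assumption-laden illustration rather than an adopted design. `CommitGate`, `Verification`, and `UnverifiedClaimError` are hypothetical names, and the log is an in-memory list where a real system would presumably use durable, tamper-evident storage.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, TypeVar

T = TypeVar("T")


class UnverifiedClaimError(Exception):
    """Raised when a commit is attempted while claims are still unverified."""


@dataclass(frozen=True)
class Verification:
    claim: str
    verified_by: str       # who checked it (attributable)
    source_of_record: str  # where they checked it
    verified_at: str       # UTC timestamp, ISO 8601


@dataclass
class CommitGate:
    claims: List[str]
    log: List[Verification] = field(default_factory=list)

    def record(self, claim: str, verified_by: str, source_of_record: str) -> None:
        """Log that a named person verified one claim against a named source."""
        if claim not in self.claims:
            raise ValueError(f"Unknown claim: {claim!r}")
        if not verified_by.strip() or not source_of_record.strip():
            raise ValueError("A verifier and a source of record are both required")
        timestamp = datetime.now(timezone.utc).isoformat()
        self.log.append(Verification(claim, verified_by, source_of_record, timestamp))

    def outstanding(self) -> List[str]:
        """Claims that have no logged verification yet."""
        verified = {entry.claim for entry in self.log}
        return [c for c in self.claims if c not in verified]

    def commit(self, action: Callable[[], T]) -> T:
        """Run the action (file, send, commit) only if every claim is verified."""
        missing = self.outstanding()
        if missing:
            raise UnverifiedClaimError(f"Unverified claims: {missing}")
        return action()
```

The gate refuses by default: `commit` raises until every claim has a logged verification naming a person and a source. The sketch cannot tell whether the person actually performed the check. It only makes skipping the check an explicit, attributable act instead of a silent omission.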

Coverage gap · Overreliance / Automation Bias

This case shows a gap: 'a human checks it' is often listed as a safeguard, but it only counts if there is a real, required verification step. We should treat verification-before-commit as its own control, not assume that a person will catch the error.

These surface as proposals across the Control Library and Risk Taxonomy; adopt them by hand when ready.

Sources

Mata v. Avianca, Inc. (Wikipedia): overview and the sanctions order.
Practical Lessons from the Attorney AI Missteps in Mata v. Avianca (Association of Corporate Counsel, ACC)

Practise the risk class — related scenarios

🌀The Refund That Never Existed

A support chatbot invents a policy — and the company is held to it

🕵️Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions