Definition
The tendency of Gen AI models to produce convincing outputs that do not properly account for the complexity, uncertainty, or contradiction in their sources, presenting false information as factual or uncertain information as settled.
Controls & guardrails that address this
10 controls (1 proposed), grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Apply post-training calibration (temperature scaling, isotonic regression) to align confidence scores with accuracy. Validate ECE before deployment.
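As a rough illustration of the calibration control above, the following sketch computes Expected Calibration Error (ECE) and fits a single temperature on held-out logits. It assumes a classifier-style setting with per-class logits and known labels; the function names, the 10-bin default, and the grid search over temperatures (used here in place of a gradient-based fit) are all illustrative choices, not something the source prescribes.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (T > 1 softens the distribution)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def confidences_and_hits(logit_rows, labels, temperature=1.0):
    """Return (top-class confidence, was-the-top-class-correct) per example."""
    confs, hits = [], []
    for logits, y in zip(logit_rows, labels):
        probs = softmax(logits, temperature)
        top = max(range(len(probs)), key=probs.__getitem__)
        confs.append(probs[top])
        hits.append(top == y)
    return confs, hits

def expected_calibration_error(confidences, hits, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| per bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, hits):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(acc - avg_conf)
    return ece

def mean_nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        p = softmax(logits, temperature)[y]
        total -= math.log(max(p, 1e-12))
    return total / len(labels)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature that minimizes held-out NLL.
    Assumption: a simple grid search from 0.5 to 5.0, for illustration only."""
    grid = grid or [0.5 + 0.05 * i for i in range(91)]
    return min(grid, key=lambda t: mean_nll(logit_rows, labels, t))
```

A deployment check might compare ECE before and after scaling on a validation set and block release if the post-scaling value exceeds the tolerance set for the use case. Isotonic regression, the other method named above, is not shown here.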
Classify the use case by consequence-of-error severity at design stage. Define overconfidence risk tolerance accordingly.
Configure output filters at deployment to detect and rewrite responses with overconfidence markers (absolute certainty language).
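The detection half of such a filter could look something like the sketch below. The marker list, the function names, and the `max_markers` threshold are hypothetical and would need tuning per domain; the rewrite step is deliberately left out, since how a flagged response gets softened depends on the deployment.

```python
import re

# Hypothetical list of absolute-certainty markers; not exhaustive.
ABSOLUTE_MARKERS = [
    r"\bdefinitely\b",
    r"\bcertainly\b",
    r"\bundoubtedly\b",
    r"\bguaranteed?\b",
    r"\bwithout (?:a |any )?doubt\b",
    r"\b100\s?%",
    r"\balways\b",
    r"\bnever\b",
]
_PATTERN = re.compile("|".join(ABSOLUTE_MARKERS), re.IGNORECASE)

def find_overconfidence_markers(text):
    """Return every absolute-certainty phrase found, in order of appearance."""
    return [m.group(0) for m in _PATTERN.finditer(text)]

def filter_output(text, max_markers=0):
    """Flag a response when it contains more markers than the allowed maximum."""
    hits = find_overconfidence_markers(text)
    return {"text": text, "markers": hits, "flagged": len(hits) > max_markers}
```

A flagged result can then be handed to whatever rewrite or human-review path the deployment uses. Keyword matching of this kind only catches surface phrasing, so it complements calibration and testing rather than replacing them.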
Design system prompts to require the model to express epistemic uncertainty and qualify confident-sounding claims.
Route high-confidence outputs in high-stakes use cases to human review. Flag for reviewer attention when certainty language is absolute.
Disclose to users at deployment that outputs may carry unwarranted confidence. Include specific caveat language in the UI.
For high-stakes outputs, require a human to verify each AI-asserted fact or citation against the authoritative source of record before it is filed, sent, or committed. This is a hard gate, logged and attributable, not an optional review.
Source: Case study: mata-v-avianca
Test for overconfidence patterns (high-confidence wrong answers, low refusal rate) in pre-deployment validation.
Build a synthetic evaluation dataset of overconfidence-prone scenarios for ongoing regression testing.
Track accuracy of high-confidence predictions in production. Trigger recalibration when overconfidence rates trend upward.
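One possible shape for the production tracking control is a rolling window over high-confidence predictions whose correctness has later been established. The sketch below is a minimal version under stated assumptions: the class name and all thresholds are hypothetical, ground-truth labels are assumed to arrive eventually (for example from human review), and a fixed error-rate ceiling stands in for a proper upward-trend test.

```python
from collections import deque

class HighConfidenceMonitor:
    """Tracks the error rate of predictions made at or above a confidence threshold."""

    def __init__(self, confidence_threshold=0.9, max_error_rate=0.05,
                 window=500, min_samples=50):
        self.confidence_threshold = confidence_threshold
        self.max_error_rate = max_error_rate
        self.min_samples = min_samples
        # Only the most recent `window` high-confidence outcomes are kept.
        self._outcomes = deque(maxlen=window)

    def record(self, confidence, was_correct):
        """Store the outcome if the prediction was high-confidence; ignore otherwise."""
        if confidence >= self.confidence_threshold:
            self._outcomes.append(bool(was_correct))

    def sample_count(self):
        return len(self._outcomes)

    def error_rate(self):
        """Fraction of windowed high-confidence predictions that were wrong."""
        if not self._outcomes:
            return None
        return 1 - sum(self._outcomes) / len(self._outcomes)

    def needs_recalibration(self):
        """True once enough samples exist and the error rate exceeds the ceiling."""
        if len(self._outcomes) < self.min_samples:
            return False
        return self.error_rate() > self.max_error_rate
```

The `min_samples` guard keeps a handful of early errors from triggering recalibration before the window holds enough evidence.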
Real-world cases
7 actual published events that illustrate this risk.
Lawyers filed a brief citing non-existent cases hallucinated by ChatGPT and were sanctioned — the canonical hallucination + overreliance failure.
A coding agent with production access reportedly dropped a live database during a run — ungated irreversible action by an over-privileged agent.
A USENIX Security 2025 study found that code-generating LLMs routinely recommend non-existent packages (roughly 5.2% of suggestions from commercial models and 21.7% from open-source models), letting attackers pre-register the predictable fake names, a tactic dubbed 'slopsquatting'.
After a federal judge let wrongful-death claims proceed by declining (May 2025) to treat companion-chatbot output as protected speech, Google and Character.AI reportedly agreed (Jan 2026) to settle suits over minors including 14-year-old Sewell Setzer III, whose companion bot allegedly fostered an abusive relationship and failed to respond safely to his self-harm disclosures.
Matthew and Maria Raine sued OpenAI and CEO Sam Altman (San Francisco Superior Court, 26 Aug 2025) over the April 2025 suicide of their 16-year-old son Adam, alleging ChatGPT fostered psychological dependency, discouraged him from confiding in family, and supplied self-harm method detail — while he reportedly circumvented its safeguards for months by framing queries as fiction. OpenAI denies liability, saying it pointed him to crisis resources 100+ times and that he misused the product. (Allegations unproven; litigation ongoing.)
A tribunal held Air Canada liable after its website chatbot invented a bereavement-fare refund policy; the airline had to honour it.
Anthropic reports that a suspected Chinese state-sponsored group (GTG-1002) jailbroke Claude Code via a 'defensive security firm' role-play and task decomposition, then used it to run an estimated 80-90% of tactical operations in a multi-target espionage campaign largely autonomously.