Overreliance / Automation Bias
medium
Oversight
Definition
People trust the AI too much — accepting its answers without checking, even on important decisions — because it sounds confident and is usually right.
Where it attaches
The system components where this risk arises.
Detection signals
- High-stakes actions taken with no human verification step
- Approval gates rubber-stamped (near-100% approve rate, low dwell time)
- Users unable to explain why they trusted an output
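The rubber-stamping signal above can be checked from review logs. The following is a minimal sketch, assuming each review record carries a reviewer id, the decision, and the time spent before deciding. The `Review` record, the function name, and every threshold are hypothetical and would need tuning against real data.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Review:
    reviewer: str
    approved: bool
    dwell_seconds: float  # time between opening the item and deciding


def rubber_stamp_flags(reviews, approve_rate_threshold=0.98,
                       dwell_threshold_seconds=10.0, min_reviews=20):
    """Return {reviewer: (approve_rate, median_dwell)} for suspicious reviewers.

    A reviewer is flagged only when BOTH conditions hold: the approve rate is
    at or above the threshold and the median dwell time is at or below the
    dwell threshold. All thresholds are illustrative assumptions.
    """
    by_reviewer = {}
    for review in reviews:
        by_reviewer.setdefault(review.reviewer, []).append(review)

    flagged = {}
    for reviewer, items in by_reviewer.items():
        if len(items) < min_reviews:
            continue  # too few decisions to judge
        approve_rate = sum(r.approved for r in items) / len(items)
        median_dwell = median(r.dwell_seconds for r in items)
        if (approve_rate >= approve_rate_threshold
                and median_dwell <= dwell_threshold_seconds):
            flagged[reviewer] = (approve_rate, median_dwell)
    return flagged
```

A flag here is a prompt for a conversation with the reviewer, not proof of negligence: a high approve rate can also mean the upstream system is producing good output.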
Controls & guardrails that address this
27 controls (2 proposed)
Grouped by control function, with the AI lifecycle stage(s) at which to apply each and the other risks it addresses.
Mandate AI risk awareness training for all use case sponsors and design team members before project kick-off.
Mandate AI risk training for all build and test personnel. Gate project participation on training completion.
Mandate human verification for high-stakes decisions where over-reliance risk is elevated. Review automation bias incidents quarterly.
Surface AI limitation warnings and over-reliance caveats in every production interaction. Update disclosures when the model changes.
Require AI governance training for all personnel involved in data acquisition and processing before project participation.
Verify all deployment, operations, and customer-facing team members have completed AI risk training before launch.
Define AI identity disclosure policy at design stage. Specify when and how the system must identify itself as AI.
Plan consent and AI identity disclosure touchpoints in the user journey at design stage.
Design system prompts to explicitly prevent the model from claiming human-like identity or implying sentience.
Implement persistent AI identity disclosures in the UI (opening banner, inline notifications). Test before deployment.
Verify all AI identity disclosure elements are live, accurate, and prominently visible before go-live.
Monitor production for anthropomorphism incidents. Escalate complaints where users believed they were interacting with a human.
Apply post-training calibration (temperature scaling, isotonic regression) to align confidence scores with accuracy. Validate ECE before deployment.
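To make the ECE validation step concrete, here is a minimal sketch of expected calibration error using equal-width confidence bins. It covers only the metric, not the calibration methods themselves. The function name, the bin count, and the binning convention (left-closed bins, with a confidence of exactly 1.0 placed in the last bin) are assumptions chosen for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins.

    confidences: floats in [0, 1]; correct: booleans of the same length.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        # Each bin contributes in proportion to its share of predictions.
        ece += (len(items) / n) * abs(accuracy - avg_conf)
    return ece
```

A value near 0 means stated confidence tracks observed accuracy. Comparing ECE on a held-out set before and after calibration shows whether the calibration step helped.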
Classify the use case by consequence-of-error severity at design stage. Define overconfidence risk tolerance accordingly.
Configure output filters at deployment to detect and rewrite responses with overconfidence markers (absolute certainty language).
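One simple way to approximate such a filter is pattern matching on certainty language. The sketch below is illustrative only: the phrase list is an assumption rather than a vetted lexicon, and instead of rewriting the flagged sentence it appends a caveat, which is a weaker form of the control described above.

```python
import re

# Illustrative phrase list (an assumption); extend and tune per domain.
OVERCONFIDENCE_PATTERNS = [
    r"\b(?:definitely|certainly|undoubtedly|unquestionably)\b",
    r"\b(?:100|one hundred)\s?(?:%|percent)\s+(?:certain|sure|accurate)\b",
    r"\bwithout (?:a|any) doubt\b",
    r"\bguaranteed?\b",
    r"\bthere is no (?:chance|possibility)\b",
]
_MARKER_RE = re.compile("|".join(OVERCONFIDENCE_PATTERNS), re.IGNORECASE)

DEFAULT_CAVEAT = "Note: this answer may be wrong; verify it before acting on it."


def find_overconfidence_markers(text):
    """Return the matched certainty phrases in order of appearance."""
    return [m.group(0) for m in _MARKER_RE.finditer(text)]


def filter_response(text, caveat=DEFAULT_CAVEAT):
    """Return (possibly amended text, list of markers found).

    Responses without markers pass through unchanged; flagged responses
    get the caveat appended so the user sees it next to the claim.
    """
    markers = find_overconfidence_markers(text)
    if not markers:
        return text, markers
    return f"{text}\n\n{caveat}", markers
```

Keyword matching will miss paraphrased certainty and will flag harmless uses such as quoted text, so the marker list is best treated as a trigger for logging and review rather than a complete detector.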
Design system prompts to require the model to express epistemic uncertainty and qualify confident-sounding claims.
Route high-confidence outputs in high-stakes use cases to human review. Flag for reviewer attention when certainty language is absolute.
Disclose to users at deployment that outputs may carry unwarranted confidence. Include specific caveat language in the UI.
For high-stakes outputs, require a human to verify each AI-asserted fact/citation against the authoritative source of record before it is filed, sent, or committed: a hard gate, logged and attributable, not an optional review.
source: Case study: mata-v-avianca
Provide recurring AI-literacy training to end users and decision-makers so they can recognise model failure modes and competently apply verification workflows, with periodic refreshers to counter automation bias and training decay.
source: Interactive-control reconciliation: ctrl-literacy (partial coverage)
Teaching the AI to say 'I'm not sure' or 'I can't verify that' instead of confidently guessing.
Pausing to ask a person before doing anything big or hard to undo — sending money, deleting data, emailing customers.
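A pause of this kind can be enforced in code rather than left to the model's judgement. The sketch below assumes a small registry of irreversible action names and a wrapper that refuses to run them without a named approver. The action names, `ActionGate`, and `ApprovalRequired` are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical action names; a real system would define its own registry.
IRREVERSIBLE_ACTIONS = {"send_payment", "delete_data", "email_customers"}


class ApprovalRequired(Exception):
    """Raised when an irreversible action is attempted without an approver."""


@dataclass
class ActionGate:
    audit_log: list = field(default_factory=list)

    def execute(self, action, run, approved_by=None):
        """Run `run()` only if the action is reversible or a person approved it."""
        if action in IRREVERSIBLE_ACTIONS and not approved_by:
            self._log(action, approved_by, "blocked")
            raise ApprovalRequired(f"{action} needs human approval")
        result = run()
        self._log(action, approved_by, "executed")
        return result

    def _log(self, action, approved_by, outcome):
        # Every attempt is recorded so approvals stay attributable.
        self.audit_log.append({
            "action": action,
            "approved_by": approved_by,
            "outcome": outcome,
            "at": datetime.now(timezone.utc).isoformat(),
        })
```

For example, `gate.execute("delete_data", drop_table)` raises `ApprovalRequired`, while the same call with `approved_by="alice"` runs and leaves an audit entry naming the approver (`drop_table` stands in for whatever callable performs the action).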
Test for overconfidence patterns (high-confidence wrong answers, low refusal rate) in pre-deployment validation.
Build a synthetic evaluation dataset of overconfidence-prone scenarios for ongoing regression testing.
Track accuracy of high-confidence predictions in production. Trigger recalibration when overconfidence rates trend upward.
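A rolling-window monitor is one way to sketch this tracking. It assumes ground-truth labels eventually arrive for at least a sample of production predictions. Note that it simplifies the control: it compares accuracy against a fixed floor instead of fitting a trend, and the class name and all thresholds are hypothetical.

```python
from collections import deque


class HighConfidenceMonitor:
    """Tracks accuracy of predictions made at or above a confidence threshold."""

    def __init__(self, confidence_threshold=0.9, window=500,
                 min_accuracy=0.9, min_samples=50):
        self.confidence_threshold = confidence_threshold
        self.min_accuracy = min_accuracy
        self.min_samples = min_samples
        # Only the most recent `window` high-confidence outcomes are kept.
        self._outcomes = deque(maxlen=window)

    def record(self, confidence, was_correct):
        """Store the outcome if the prediction was high-confidence."""
        if confidence >= self.confidence_threshold:
            self._outcomes.append(bool(was_correct))

    def accuracy(self):
        """Accuracy over the current window, or None if it is empty."""
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def needs_recalibration(self):
        """True once enough samples exist and accuracy is below the floor."""
        if len(self._outcomes) < self.min_samples:
            return False
        return self.accuracy() < self.min_accuracy
```

Setting `min_accuracy` equal to `confidence_threshold` encodes the idea that predictions claimed at 90% confidence or more should be right at least 90% of the time.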
Helping the people using AI understand its limits, so they check important answers instead of blindly trusting them.
The organisational habits around the AI: assessing risks before launch, actively trying to break it, and having a plan for when something goes wrong.
Framework mappings
- OWASP Top 10 for LLM Applications: LLM09:2025 Misinformation
- NIST AI RMF: GOVERN 4.1
- NIST AI RMF: MEASURE 2.8
Real-world cases
5 cases
Actual published events that illustrate this risk.
Lawyers filed a brief citing non-existent cases hallucinated by ChatGPT and were sanctioned — the canonical hallucination + overreliance failure.
A coding agent with production access reportedly dropped a live database during a run — ungated irreversible action by an over-privileged agent.
A USENIX Security 2025 study found that code-generating LLMs routinely recommend non-existent packages (about 5.2% of suggestions from commercial models and 21.7% from open-source models), letting attackers pre-register the predictable fake names, a tactic dubbed 'slopsquatting'.
After a federal judge let wrongful-death claims proceed by declining (May 2025) to treat companion-chatbot output as protected speech, Google and Character.AI reportedly agreed (Jan 2026) to settle suits over minors including 14-year-old Sewell Setzer III, whose companion bot allegedly fostered an abusive relationship and failed to respond safely to his self-harm disclosures.
Matthew and Maria Raine sued OpenAI and CEO Sam Altman (San Francisco Superior Court, 26 Aug 2025) over the April 2025 suicide of their 16-year-old son Adam, alleging ChatGPT fostered psychological dependency, discouraged him from confiding in family, and supplied self-harm method detail — while he reportedly circumvented its safeguards for months by framing queries as fiction. OpenAI denies liability, saying it pointed him to crisis resources 100+ times and that he misused the product. (Allegations unproven; litigation ongoing.)