Adverse or inappropriate impact to individuals and groups

Risk taxonomy

Definition

Models generate outputs that can be detrimental or inappropriate for individuals or groups.

Interactive deep-dive

This risk is covered by more than one interactive treatment, each with its own technical detail, attack surface, detection signals, and scenarios.

Bias Amplification & Sycophancy
Allocative Harm in Multi-User Arbitration

Controls & guardrails that address this

Grouped by control function, with the AI lifecycle stage(s) at which each control applies and the other risks it addresses.

Preventive · 7

Affected group register at intake

Identify all groups at risk of adverse impact at use case intake. Register them in the affected group register.

Lifecycle stage: 1 – Use Case Context & Design

Model separation

Design separate model segments where adverse impact risk differs materially across population groups.

Lifecycle stage: 1 – Use Case Context & Design

Decision threshold adjustment

Set decision thresholds to meet acceptable adverse impact ratios across protected groups. Validate before deployment.

Lifecycle stage: 3 – Onboarding, Build & Review
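
As a rough illustration of the threshold check described above, the sketch below computes per-group selection rates and the adverse impact ratio (lowest rate divided by highest rate), then picks the strictest candidate threshold that meets a target ratio. The 0.8 target (the commonly cited four-fifths benchmark), the function names, and the choice to treat a threshold that selects nobody as failing are all assumptions for illustration, not part of the control text.

```python
from collections import defaultdict


def selection_rates(scores, groups, threshold):
    """Fraction of each group whose score meets or exceeds the threshold."""
    selected = defaultdict(int)
    total = defaultdict(int)
    for score, group in zip(scores, groups):
        total[group] += 1
        if score >= threshold:
            selected[group] += 1
    return {g: selected[g] / total[g] for g in total}


def adverse_impact_ratio(rates):
    """Lowest group selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        # Assumption: a threshold that selects nobody is treated as failing,
        # so it is never picked as "compliant".
        return 0.0
    return min(rates.values()) / highest


def strictest_compliant_threshold(scores, groups, candidates, min_ratio=0.8):
    """Return the highest candidate threshold meeting min_ratio, or None."""
    for threshold in sorted(candidates, reverse=True):
        rates = selection_rates(scores, groups, threshold)
        if adverse_impact_ratio(rates) >= min_ratio:
            return threshold
    return None
```

In practice the check would be run on a held-out validation set before deployment, and the target ratio would come from the organisation's own policy rather than the default used here.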

Post-processing techniques

Apply post-processing adjustments (reject-option classification, score recalibration) to meet adverse impact targets.

Lifecycle stages: 3 – Onboarding, Build & Review; 5 – Usage, Monitoring & Change
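
One way to picture reject-option classification is the minimal sketch below: scores that fall inside an uncertainty band around the decision threshold are resolved in favor of the group designated as unprivileged, while scores outside the band follow the ordinary threshold rule. The function name, the 0.5 threshold, the 0.1 margin, and the group labels are illustrative assumptions; a real deployment would tune the band on validation data against its adverse impact target.

```python
def reject_option_classify(score, group, unprivileged, threshold=0.5, margin=0.1):
    """Return 1 (favorable) or 0 (unfavorable) for a single scored case.

    Inside the band [threshold - margin, threshold + margin] the model is
    treated as uncertain: unprivileged groups receive the favorable outcome
    and all other groups the unfavorable one. Outside the band the plain
    threshold rule applies.
    """
    if abs(score - threshold) <= margin:
        return 1 if group in unprivileged else 0
    return 1 if score >= threshold else 0


# Example: group "b" is assumed to be the unprivileged group.
decisions = [
    reject_option_classify(0.45, "b", {"b"}),  # in band, unprivileged
    reject_option_classify(0.55, "a", {"b"}),  # in band, privileged
    reject_option_classify(0.90, "a", {"b"}),  # confident favorable
    reject_option_classify(0.20, "b", {"b"}),  # confident unfavorable
]
```

Because the adjustment only touches low-confidence cases, widening the margin trades more accuracy for a larger shift in group outcomes.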

Input/output filtering

Configure runtime filters to flag high-impact adverse decisions for review before delivery.

Lifecycle stage: 4 – Deployment

Also addresses: Overreliance / Automation Bias; Sensitive Data Leakage; KV-Cache & Inference-State Side Channels
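
A runtime filter of the kind described above could look roughly like the following sketch, which holds any decision that is both high-impact and adverse for human review instead of delivering it. The `Decision` fields, the outcome labels, and the routing strings are hypothetical names chosen for illustration; how impact is assessed is outside the sketch.

```python
from dataclasses import dataclass

# Assumption: outcomes considered adverse for the affected person.
ADVERSE_OUTCOMES = {"deny", "terminate"}


@dataclass
class Decision:
    subject_id: str
    outcome: str  # e.g. "approve" or "deny"
    impact: str   # e.g. "low" or "high", assigned upstream


def route(decision):
    """Hold high-impact adverse decisions for review; deliver the rest."""
    if decision.impact == "high" and decision.outcome in ADVERSE_OUTCOMES:
        return "hold_for_review"
    return "deliver"
```

For example, `route(Decision("u1", "deny", "high"))` returns `"hold_for_review"`, while a low-impact denial or a high-impact approval is delivered directly.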

Tested human review pathways at go-live

Ensure HITL review pathways are live and tested for high-impact adverse decisions at go-live.

Lifecycle stage: 4 – Deployment

Ongoing human review of high-impact decisions

Maintain HITL review for all AI decisions with material adverse impact potential. Log all interventions and outcomes.

Lifecycle stage: 5 – Usage, Monitoring & Change

Corrective · 2

Red teaming of adverse-impact edge cases

Execute red team tests targeting adverse impact boundary cases and edge population scenarios.

Lifecycle stage: 3 – Onboarding, Build & Review

Adverse-outcome feedback loop triggering model updates

Collect adverse outcome feedback from affected users. Use reports to trigger model updates when adverse impact exceeds threshold.

Lifecycle stage: 5 – Usage, Monitoring & Change
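
The trigger condition in this feedback loop might be sketched as below: adverse outcome reports are counted per group over a monitoring window, divided by the number of decisions served to that group, and any group whose report rate exceeds a threshold is flagged for a model update review. The 2% default, the function name, and the dictionary inputs are assumptions for illustration only.

```python
def groups_breaching_threshold(reports_by_group, served_by_group, max_rate=0.02):
    """Return the sorted list of groups whose adverse report rate exceeds max_rate.

    reports_by_group: adverse outcome reports received per group in the window.
    served_by_group:  decisions served per group in the same window.
    """
    breached = []
    for group, served in served_by_group.items():
        if served == 0:
            continue  # no decisions served, so no rate can be computed
        rate = reports_by_group.get(group, 0) / served
        if rate > max_rate:
            breached.append(group)
    return sorted(breached)


# Example window: group "b" has 5 reports on 100 decisions (5%).
flagged = groups_breaching_threshold({"a": 1, "b": 5}, {"a": 100, "b": 100, "c": 0})
```

A non-empty result would open a model update review; computing the rate per group rather than overall keeps a small, heavily affected group from being hidden by the aggregate.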

Real-world cases

Published events that illustrate this risk.

OpenAI rolls back GPT-4o for sycophancy (2025)

OpenAI withdrew an April 2025 GPT-4o update after the model became overly sycophantic (validating doubts, fueling anger, and reinforcing negative emotions) and publicly announced the rollback days later.

Sycophancy traced to human-preference RLHF (Sharma et al.) (2023)

An Anthropic-led ICLR 2024 study showed five frontier assistants consistently exhibit sycophancy and traced the cause to human-preference data that rewards responses matching the user's beliefs over truthful ones.

Grok 'MechaHitler': config update degrades a deployed chatbot into antisemitic, violent output (2025)

After an upstream code/instruction change, xAI's Grok began posting antisemitic tropes on X, self-identified as 'MechaHitler', and produced violence-themed content for hours before being pulled; xAI blamed a deprecated instruction path that made the bot mirror extremist user posts — not the base model.

Other risks in Fairness & Bias

#1 Unrepresentative or biased data inputs