🔍AI RiskAtlas
← Risk taxonomy

Bias Amplification & Sycophancy

mediumModel behaviour
Also known as: sycophantic reinforcement, interactional bias

Definition

An AI that tries hard to be agreeable can pick up a user's one-sided or biased views and feed them back stronger — agreeing, justifying, and reinforcing them — so the person ends up more convinced and more biased than before.

Where it attaches

The system components this risk arises at.

🧑 User🧠 LLM🎲 Sampler / Decoder💬 Chat / App Interface

Detection signals

  • Model increasingly agreeing with and escalating a user's one-sided view
  • Sycophantic reinforcement of biased or extreme premises
  • Outputs drifting from balanced ground truth toward the user's stance over a session
  • Evals showing answer flips to match an asserted user opinion

Controls & guardrails that address this

13

Grouped by control function, with the AI lifecycle stage(s) to apply each and the other risks it addresses. Filter by control category below.

Control category
Preventive · 7
Affected group register at intake

Identify all groups at risk of adverse impact at use case intake. Register them in the affected group register.

Lifecycle stage1 – Use Case Context & Design
Model separation

Design separate model segments where adverse impact risk differs materially across population groups.

Lifecycle stage1 – Use Case Context & Design
Decision threshold adjustment

Set decision thresholds to meet acceptable adverse impact ratios across protected groups. Validate before deployment.

Lifecycle stage3 – Onboarding, Build & Review
Post-processing techniques

Apply post-processing adjustments (reject-option classification, score recalibration) to meet adverse impact targets.

Lifecycle stages3 – Onboarding, Build & Review5 – Usage, Monitoring & Change
Input/output filtering

Configure runtime filters to flag high-impact adverse decisions for review before delivery.

Tested human review pathways at go-live

Ensure HITL review pathways are live and tested for high-impact adverse decisions at go-live.

Lifecycle stage4 – Deployment
Ongoing human review of high-impact decisions

Maintain HITL review for all AI decisions with material adverse impact potential. Log all interventions and outcomes.

Lifecycle stage5 – Usage, Monitoring & Change
Detective · 3
Grounding / citation checksinteractive

Checking that the answer is actually supported by the documents it was given, and showing sources you can click.

Also addressesHallucination
Corrective · 3
Red teaming of adverse-impact edge cases

Execute red team tests targeting adverse impact boundary cases and edge population scenarios.

Lifecycle stage3 – Onboarding, Build & Review
Adverse-outcome feedback loop triggering model updates

Collect adverse outcome feedback from affected users. Use reports to trigger model updates when adverse impact exceeds threshold.

Lifecycle stage5 – Usage, Monitoring & Change
Open these in the Control Library →

Framework mappings

OWASP LLM Top 10
  • LLM09:2025 Misinformation
MITRE ATLAS
NIST AI RMF
  • MEASURE 2.11
  • MEASURE 2.3

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning — not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading →·Built by Shi Yuan ↗