AI RiskAtlas
← Scenario library

Lies in the Loop

A poisoned issue makes the agent lie to the human who approves its actions

Technique first revealed 23 Feb 2023

Tool-Using Agent
[Architecture diagram. Zones: Untrusted, Agent core, Oversight, The real world. Components: User, Orchestrator / Agent Loop, LLM, Identity & Permissions, Tool Runtime, Human Approval Gate, External APIs, Business Database, Untrusted Content, Audit Logging. Edge labels: goal, context, high-risk?]
[Legend: Instructions, Data, Actions, Control / decision, Feedback / logs]
Setup (Step 1 / 7)

An agent that asks before it acts

A team runs a coding assistant that reads incoming bug reports and helps fix them. It can run shell commands, but it is set up so that anything risky (deleting things, force-pushing, sending data out) pauses and asks a human to approve first. The team feels safe: a person is always in the loop.

Approval policy (excerpt, config)
policy:
  auto_approve: [read_file, list_dir, grep, run_tests]
  require_human_approval: [shell_exec, git_push, http_post, delete]
  approval_view: agent_summary   # <-- the human sees the agent's summary
  irreversible_actions: confirm_once
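
To make the excerpt concrete, here is a minimal Python sketch of how a gate might apply such a policy. It is an illustration under stated assumptions, not the team's actual implementation: the `ToolCall` type, the `needs_human` and `approval_prompt` helpers, and the fail-closed handling of unlisted tools are all hypothetical, and `irreversible_actions` is not modelled because the excerpt does not define its behaviour.

```python
from dataclasses import dataclass

# Mirrors the policy excerpt above; the dict layout is an assumption.
POLICY = {
    "auto_approve": ["read_file", "list_dir", "grep", "run_tests"],
    "require_human_approval": ["shell_exec", "git_push", "http_post", "delete"],
    "approval_view": "agent_summary",
    "irreversible_actions": "confirm_once",  # not modelled in this sketch
}


@dataclass
class ToolCall:
    """Hypothetical record of one action the agent wants to take."""
    tool: str           # tool name, e.g. "git_push"
    raw_command: str    # what would actually be executed
    agent_summary: str  # the agent's own description of the action


def needs_human(call: ToolCall, policy: dict = POLICY) -> bool:
    """Return True when the call must pause for human approval."""
    if call.tool in policy["auto_approve"]:
        return False
    # Assumption: anything not explicitly auto-approved is gated (fail closed).
    return True


def approval_prompt(call: ToolCall, policy: dict = POLICY) -> str:
    """Build the text the human reviewer is shown."""
    if policy["approval_view"] == "agent_summary":
        # The reviewer sees only the agent's description, not the command.
        return f"Approve {call.tool}? {call.agent_summary}"
    return f"Approve {call.tool}? {call.raw_command}"


# Made-up example call, for illustration only.
call = ToolCall(
    tool="git_push",
    raw_command="git push --force origin main",
    agent_summary="Push the fix for the reported bug",
)
print(needs_human(call))      # True
print(approval_prompt(call))  # Approve git_push? Push the fix for the reported bug
```

With `approval_view: agent_summary`, the prompt contains only what the agent says about the action. The raw command never reaches the reviewer, so the approval is only as trustworthy as the summary.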

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning, not operational guidance. Real-world cases are summarised from public reporting.

Built by Shi Yuan