Lies in the Loop
A poisoned issue makes the agent lie to the human who approves its actions
Technique first revealed 23 Feb 2023
Tool-Using Agent
Diagram legend: Instructions, Data, Actions, Control / decision, Feedback / logs
Setup (Step 1 / 7)
An agent that asks before it acts
A team runs a coding assistant that reads incoming bug reports and helps fix them. It can run shell commands, but it's set up so that anything risky (deleting things, force-pushing, sending data out) pauses and asks a human to approve first. The team feels safe: a person is always in the loop.
Approval policy (excerpt), config
policy:
  auto_approve: [read_file, list_dir, grep, run_tests]
  require_human_approval: [shell_exec, git_push, http_post, delete]
  approval_view: agent_summary   # <-- the human sees the agent's summary
  irreversible_actions: confirm_once
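To make the policy concrete, here is a minimal Python sketch of how a gate like this might route tool calls and build the approval prompt. Only the policy values come from the excerpt; the `ToolCall` type, the `decide` and `approval_prompt` functions, the "deny unlisted tools" rule, and the alternative raw-arguments view are all hypothetical, introduced for illustration.

```python
from dataclasses import dataclass

# Values copied from the policy excerpt; the dict layout is an assumption.
POLICY = {
    "auto_approve": ["read_file", "list_dir", "grep", "run_tests"],
    "require_human_approval": ["shell_exec", "git_push", "http_post", "delete"],
    "approval_view": "agent_summary",
    "irreversible_actions": "confirm_once",
}


@dataclass
class ToolCall:
    """Hypothetical record of one action the agent wants to take."""
    tool: str           # tool name, e.g. "shell_exec"
    raw_args: str       # what would actually be executed
    agent_summary: str  # the agent's own description of the call


def decide(call: ToolCall, policy: dict = POLICY) -> str:
    """Return "auto", "ask_human", or "deny" for a tool call."""
    if call.tool in policy["auto_approve"]:
        return "auto"
    if call.tool in policy["require_human_approval"]:
        return "ask_human"
    return "deny"  # assumption: tools not listed in the policy are refused


def approval_prompt(call: ToolCall, policy: dict = POLICY) -> str:
    """Build the text shown to the human approver."""
    if policy["approval_view"] == "agent_summary":
        # The approver sees only the agent's summary, never the raw arguments.
        return f"Approve {call.tool}? {call.agent_summary}"
    # Hypothetical alternative view: show the raw arguments instead.
    return f"Approve {call.tool}? {call.raw_args}"


call = ToolCall("shell_exec", "rm -rf build/", "Clean up build artifacts")
print(decide(call))           # ask_human
print(approval_prompt(call))  # Approve shell_exec? Clean up build artifacts
```

Under `approval_view: agent_summary`, the approver's decision rests entirely on text the agent wrote. In this sketch the raw command never reaches the prompt, so the human cannot compare what they are told with what will run.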