Computer-Use Agent

An agent that drives a real browser or desktop by sight

Architecture introduced 22 Oct 2024

Instead of calling neat software functions, this assistant uses a computer the way you do: it looks at the screen, then moves the mouse, clicks, and types. That makes it powerful — it can use any app — but it now 'reads' whatever is on screen, including web pages written by strangers.

InstructionsDataActionsControl / decisionFeedback / logs

👆 Click any component in the diagram to inspect its risks & defenses

Follow a request · step 1 of 6

← / → keys

You give the assistant a goal it has to do on a real computer — 'book the cheapest flight on this travel site'.

Scenarios on this architecture

👁️

The Invisible Webpage Command

A shopping page tells the agent to do something the user never asked for

Next: Multi-Agent System →