Text-to-Image Diffusion

A prompt becomes a picture by repeatedly cleaning up noise

Architecture introduced 29 Nov 2021

You type a description; the system makes a picture. Behind the scenes it doesn't paint pixels straight away. It first turns your words into 'meaning numbers' (embeddings), then starts from a small field of pure static and, step by step, cleans the static up until a compact sketch matching your words emerges. A final step blows that compact sketch up into a full-size image, and a safety check looks at it before you see it. The 'brain' doing the cleanup is a big file of trained weights that was downloaded from a public model hub.
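
The flow described above can be sketched as a toy Python program. Everything in it is hypothetical and for illustration only: `encode_text`, `predict_noise`, `denoise`, `decode`, `safety_check` and `generate` are made-up stand-ins, not the API of any real library, and none of them is trained. In particular, the toy denoiser is assumed to already know the clean sketch, so the loop always lands on the same answer whatever the starting static; a real system replaces each stand-in with a trained network. The sketch only shows the order of the stages: words to numbers, static to compact sketch, sketch to full image, then a check.

```python
import hashlib
import random

GRID = 2   # toy assumption: the compact "sketch" is GRID x GRID numbers
SCALE = 4  # toy assumption: the decoder enlarges each sketch cell to SCALE x SCALE pixels


def encode_text(prompt: str) -> list[float]:
    """Hypothetical text encoder: prompt -> 'meaning numbers' in [-1, 1].
    A real system uses a trained network; hashing just makes this deterministic."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 * 2.0 - 1.0 for b in digest[: GRID * GRID]]


def predict_noise(latent: list[float], cond: list[float]) -> list[float]:
    """Hypothetical denoiser. A trained model estimates the noise from the latent,
    the step and the text; this toy assumes the clean sketch equals `cond`,
    so the 'noise' is simply the gap between the two."""
    return [x - c for x, c in zip(latent, cond)]


def denoise(cond: list[float], steps: int = 50, seed: int = 0) -> list[float]:
    """Start from pure static and remove part of the estimated noise at each step."""
    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in cond]  # the field of pure static
    for t in range(steps, 0, -1):
        noise = predict_noise(latent, cond)
        # Toy schedule (an assumption): remove 1/t of the remaining noise, so early
        # steps make small corrections and the final step (t == 1) finishes the job.
        latent = [x - n / t for x, n in zip(latent, noise)]
    return latent


def decode(latent: list[float]) -> list[list[int]]:
    """Toy decoder: blow the GRID x GRID sketch up to a full-size greyscale image
    by repeating each cell, mapping values in [-1, 1] to pixels 0..255."""
    size = GRID * SCALE
    image = []
    for r in range(size):
        row = []
        for c in range(size):
            v = latent[(r // SCALE) * GRID + c // SCALE]
            v = max(-1.0, min(1.0, v))  # clamp before converting to a pixel value
            row.append(round((v + 1.0) * 127.5))
        image.append(row)
    return image


def safety_check(image, classifier, threshold: float = 0.5):
    """Hypothetical post-generation filter: `classifier` stands in for a trained
    image classifier returning a risk score. Flagged images are blanked out."""
    if classifier(image) > threshold:
        return [[0] * len(row) for row in image]
    return image


def generate(prompt: str, classifier, steps: int = 50, seed: int = 0):
    cond = encode_text(prompt)                     # words -> meaning numbers
    sketch = denoise(cond, steps=steps, seed=seed)  # static -> compact sketch
    image = decode(sketch)                          # sketch -> full-size image
    return safety_check(image, classifier)          # check before the user sees it


image = generate("a red fox in a snowy forest, golden hour", classifier=lambda img: 0.0)
print(len(image), len(image[0]))  # prints: 8 8
```

With the 2 x 2 sketch and 4x enlargement assumed here, the result is an 8 x 8 greyscale grid; real systems work at far larger sizes and in colour, but the stages run in the same order.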


You type a description of the picture you want — 'a red fox in a snowy forest, golden hour'.
