Diffusion Video Generation

Text-to-video: a denoiser with a time axis bolted on

Architecture introduced 07 Apr 2022

Video generators work like image generators with a sense of time added: the same 'clean up the noise' trick makes each frame, and a motion module keeps the frames consistent so they move smoothly. Given a reference face or a pose, such a generator can make a specific person appear to move and act.
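To make the split between the per-frame denoiser and the motion module concrete, here is a toy sketch in NumPy. It is not a real model: `spatial_denoise`, `temporal_mix`, `generate`, and `frame_jitter` are hypothetical names invented for illustration, and both "networks" are replaced by simple averaging. In an actual video diffusion model these would be learned layers, and the text prompt would also steer each step. The only point is the loop shape: start from noise, clean each frame, then mix along the time axis.

```python
import numpy as np


def spatial_denoise(frames, strength=0.5):
    """Stand-in for the image denoiser: works on each frame independently.

    Pulls every pixel part of the way toward its own frame's mean value.
    """
    frame_mean = frames.mean(axis=(1, 2), keepdims=True)
    return frames + strength * (frame_mean - frames)


def temporal_mix(frames, weight=0.5):
    """Stand-in for the motion module: works along the time axis only.

    Blends each frame with the average of its previous and next frame
    (the first and last frames are repeated at the edges).
    """
    padded = np.concatenate([frames[:1], frames, frames[-1:]], axis=0)
    neighbours = 0.5 * (padded[:-2] + padded[2:])
    return (1.0 - weight) * frames + weight * neighbours


def generate(num_frames=8, height=4, width=4, steps=10, seed=0, use_motion=True):
    """Start from pure noise and refine it step by step."""
    rng = np.random.default_rng(seed)
    video = rng.standard_normal((num_frames, height, width))
    for _ in range(steps):
        video = spatial_denoise(video)   # clean up each frame
        if use_motion:
            video = temporal_mix(video)  # keep neighbouring frames consistent
    return video


def frame_jitter(video):
    """Mean absolute change between consecutive frames (lower is smoother)."""
    return float(np.abs(np.diff(video, axis=0)).mean())


if __name__ == "__main__":
    print("jitter with motion module:   ", frame_jitter(generate(use_motion=True)))
    print("jitter without motion module:", frame_jitter(generate(use_motion=False)))
```

Running it with and without `use_motion` shows the role of the time axis in this toy setting: without the temporal step, each frame is cleaned up on its own and consecutive frames stay unrelated; with it, frame-to-frame change shrinks.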



Follow a request · step 1 of 4


You describe a clip and, optionally, give the model a reference photo of a face or a pose to follow.