๐Ÿ”AI RiskAtlas
โ† All systems

Diffusion Video Generation

Text-to-video: a denoiser with a time axis bolted on

Architecture introduced 07 Apr 2022

Diffusion video generators work like image generators with a sense of time added: the same 'clean up the noise' trick produces each frame, and a motion module keeps the frames consistent with one another so that motion looks smooth. Feed one a reference face or a pose and it can make a specific person appear to move and act.
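The two-part idea above (a per-frame denoiser plus a module that mixes information across frames) can be illustrated with a toy sketch. Everything here is a hypothetical stand-in, not a real model: `spatial_denoise` simply nudges values toward a target, `temporal_mix` blends each frame with its neighbours, and the frames are short lists of numbers rather than image latents. It assumes only the Python standard library.

```python
import random


def spatial_denoise(frame, target, strength=0.5):
    """Toy stand-in for the per-frame denoiser: nudge each value toward a target."""
    return [x + strength * (t - x) for x, t in zip(frame, target)]


def temporal_mix(frames, weight=0.5):
    """Toy stand-in for the motion module: blend each frame with its neighbours."""
    mixed = []
    last = len(frames) - 1
    for i, frame in enumerate(frames):
        prev_f = frames[max(i - 1, 0)]     # clamp at the first frame
        next_f = frames[min(i + 1, last)]  # clamp at the last frame
        mixed.append([
            (1 - weight) * x + weight * 0.5 * (p + n)
            for x, p, n in zip(frame, prev_f, next_f)
        ])
    return mixed


def jitter(frames):
    """Mean absolute change between consecutive frames (lower means smoother)."""
    diffs = [
        abs(a - b)
        for f0, f1 in zip(frames, frames[1:])
        for a, b in zip(f0, f1)
    ]
    return sum(diffs) / len(diffs)


def generate(num_frames=8, size=16, steps=10, use_temporal=True, seed=0):
    """Run the toy denoising loop and return the final frames."""
    rng = random.Random(seed)
    # Every frame starts as independent noise.
    frames = [[rng.gauss(0, 1) for _ in range(size)] for _ in range(num_frames)]
    clean = [0.0] * size  # hypothetical "clean" content shared by all frames
    for _ in range(steps):
        # Each frame gets its own imperfect estimate of the clean content,
        # so frames denoised in isolation disagree with each other.
        frames = [
            spatial_denoise(f, [c + rng.gauss(0, 0.3) for c in clean])
            for f in frames
        ]
        if use_temporal:
            frames = temporal_mix(frames)
    return frames


if __name__ == "__main__":
    print("jitter without temporal mixing:", jitter(generate(use_temporal=False)))
    print("jitter with temporal mixing:   ", jitter(generate(use_temporal=True)))
```

In this sketch, the run with temporal mixing should show lower frame-to-frame jitter, which loosely mirrors why a motion module is added on top of an image-style denoiser. Real systems use learned attention or convolution across the time axis rather than a fixed neighbour average.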

[Diagram: diffusion video generation architecture]
Zones: Untrusted; Reference & supply chain; Video generation pipeline; Output integrity
Components: User; Reference image / pose; Model / Package Registry; Video diffusion checkpoint; Orchestrator / Agent Loop; Prompt Assembly; Text / CLIP Encoder; Conditioning Adapter; Spatial denoiser (U-Net ...; Temporal / Motion Module; Sampler / Decoder; VAE / Latent Codec; Output Guardrail; Content Provenance & ...
Edge labels: goal + ref; reference
Legend (flow types): Instructions; Data; Actions; Control / decision; Feedback / logs

Follow a request · step 1 of 4

You describe a clip and, optionally, hand it a reference photo of a face or a pose to follow.

AI RiskAtlas is an educational model of how GenAI & agentic systems work and fail. Architectures and payloads are illustrative and simplified for learning, not operational guidance. Real-world cases are summarised from public reporting.

Sources & further reading · Built by Shi Yuan