Cross-Modal: Video, 3D & Motion
Natural extensions of text-to-image (T2I) generation into the temporal and spatial domains: text-to-video, text-to-3D, motion generation, and shape synthesis.
Text-to-Video Generation (15 papers)
A natural extension of text-to-image synthesis into the temporal domain — generating coherent video sequences from textual descriptions using diffusion, autoregressive, and hybrid architectures.