Image-to-Video Generation
The synthesis of dynamic video sequences from static images via learned motion priors — inferring temporally coherent motion from a single reference frame while preserving identity, structure, and semantic consistency.
Image-to-video (I2V) generation has emerged as a critical capability in multimodal AI, bridging static imagery with temporal dynamics. By conditioning on a reference frame, I2V models learn to synthesize plausible motion trajectories, enabling applications ranging from character animation and talking-head synthesis to film production and creative content generation. Explore the two sub-domains below.
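To make the conditioning idea concrete, the sketch below shows one common scheme used by latent video diffusion models: the reference frame is encoded to a latent, broadcast across every frame of the clip, and concatenated channel-wise with the noisy video latents before being fed to the denoiser. All shapes, names, and the toy "encoder" here are illustrative assumptions, not any specific model's API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent shapes: T frames, C latent channels, HxW spatial grid.
T, C, H, W = 8, 4, 32, 32

def encode_image(image):
    # Stand-in for a learned VAE encoder: collapse the RGB channels and
    # tile to C latent channels (illustrative only, not a real encoder).
    return image.mean(axis=0, keepdims=True).repeat(C, axis=0)  # (C, H, W)

def build_denoiser_input(ref_latent, noisy_video):
    # Broadcast the single reference latent across all T frames and
    # concatenate it channel-wise with the noisy video latents, so the
    # denoiser sees the conditioning image at every frame and timestep.
    ref_stack = np.broadcast_to(ref_latent, (T, C, H, W))
    return np.concatenate([noisy_video, ref_stack], axis=1)  # (T, 2C, H, W)

image = rng.standard_normal((3, H, W))           # static reference frame (RGB)
noisy_video = rng.standard_normal((T, C, H, W))  # noisy latents to denoise

ref_latent = encode_image(image)
x_in = build_denoiser_input(ref_latent, noisy_video)
print(x_in.shape)  # (8, 8, 32, 32)
```

Because the reference latent is repeated identically on every frame, the denoiser can anchor identity and structure to the input image while the noisy channels carry the motion being generated.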