Image Animation & Portrait Generation

Core image-to-video models, character-driven animation, human motion synthesis, and audio-driven talking head generation.

Core Image-to-Video Models (2023–2026) 38+ papers

Model | Full Title | Venue | Year
Pixel-to-4D | Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians | arXiv | 2026
Veo 2 | Veo 2: State-of-the-Art Video Generation with Google DeepMind | Google DeepMind | 2025
Kling 1.6 | Kling 1.6: Advanced AI Video Generation Model | Kuaishou | 2025
Pika 2.0 | Pika 2.0: Next-Generation AI Video Generator | Pika Labs | 2025
Runway Gen-3 Alpha | Gen-3 Alpha: A New Frontier for Video Generation Models | Runway | 2024
Luma Dream Machine | Dream Machine: AI Model That Makes High Quality Videos from Text and Images | Luma AI | 2024
Jimeng | Jimeng: Image-to-Video Generation with Diffusion Transformers | ByteDance | 2025
Stable Video Diffusion | Scaling Latent Video Diffusion Models to Large Datasets | Stability AI | 2023
DynamiCrafter | Animating Open-domain Images with Video Diffusion Priors | CUHK | 2023
I2VGen-XL | High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models | Alibaba | 2023
PIA | Personalized Image Animator via Plug-and-Play Modules in T2I Models | arXiv | 2023
AnimateDiff | Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | ICLR | 2024
ConsistI2V | Enhancing Visual Consistency for Image-to-Video Generation | arXiv | 2024
TI2V-Zero | Zero-Shot Image Conditioning for Text-to-Video Diffusion Models | CVPR | 2024
MagicTime | Time-lapse Video Generation Models as Metamorphic Simulators | arXiv | 2024
TRIP | Temporal Residual Learning with Image Noise Prior for I2V Diffusion Models | CVPR | 2024
StoryDiffusion | Consistent Self-Attention for Long-Range Image and Video Generation | arXiv | 2024
Video-LaVIT | Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | arXiv | 2024
Cinemo | Consistent and Controllable Image Animation with Motion Diffusion Models | arXiv | 2024
I2V-Adapter | A General Image-to-Video Adapter for Video Diffusion Models | arXiv | 2023
MotiF | Making Text Count in Image Animation with Motion Focal Loss | arXiv | 2024
DLFR-VAE | Dynamic Latent Frame Rate VAE for Video Generation | arXiv | 2025
Packing Input Frame Context | Next-Frame Prediction Models for Video Generation | arXiv | 2025
Step-Video-TI2V | State-of-the-Art Text-Driven Image-to-Video Generation Model | arXiv | 2025
SparseCtrl | SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models | arXiv | 2024
LivePhoto | LivePhoto: Real Image Animation with Text-Guided Motion Control | arXiv | 2024
ToonCrafter | ToonCrafter: Generative Cartoon Interpolation | arXiv | 2024
Follow-Your-Click | Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts | arXiv | 2024
FrameBridge | FrameBridge: Improving Image-to-Video Generation with Bridge Models | ICLR | 2025
DFoT | History-Guided Video Diffusion: Diffusion Forcing Transformer for Variable-Length Conditioning | arXiv | 2025
CogVideoX-I2V | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer for I2V | ICLR | 2025
Wan-I2V | Wan: Open and Advanced Large-Scale Image-to-Video Generative Models | Alibaba | 2025
HunyuanVideo-I2V | HunyuanVideo: Image-to-Video Generation with Systematic Framework | Tencent | 2025
EasyAnimate-I2V | EasyAnimate: An End-to-End Solution for Image-to-Video Generation | Alibaba | 2024
ALIVE | ALIVE: Animate Your World with Lifelike Audio-Video Generation | arXiv | 2026

Character Animation & Human Motion (2023–2025) 25+ papers

Model | Full Title | Venue | Year
OmniHuman-1 | Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models | ICCV | 2025
Animate Anyone 2 | High-Fidelity Character Image Animation with Environment Affordance | ICCV | 2025
MTVCrafter | 4D Motion Tokenization for Open-World Human Image Animation | arXiv | 2025
HumanDiT | Pose-Guided Diffusion Transformer for Long-form Human Motion Video | arXiv | 2025
X-Dancer | Expressive Music to Human Dance Video Generation | arXiv | 2025
AnyCharV | Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance | arXiv | 2025
HunyuanCustom | Multimodal-Driven Architecture for Customized Video Generation | arXiv | 2025
VideoJAM | Joint Appearance-Motion Representations for Enhanced Motion Generation | arXiv | 2025
Animate Anyone | Consistent and Controllable Image-to-Video Synthesis for Character Animation | arXiv | 2023
MagicAnimate | Temporally Consistent Human Image Animation using Diffusion Model | NTU | 2023
DreaMoving | A Human Video Generation Framework based on Diffusion Models | arXiv | 2023
Champ | Controllable and Consistent Human Image Animation with 3D Parametric Guidance | arXiv | 2024
UniAnimate | Taming Unified Video Diffusion Models for Consistent Human Image Animation | arXiv | 2024
MimicMotion | High-Quality Human Motion Video with Confidence-aware Pose Guidance | arXiv | 2024
LivePortrait | Efficient Portrait Animation with Stitching and Retargeting Control | arXiv | 2024
ID-Animator | Zero-Shot Identity-Preserving Human Video Generation | arXiv | 2024
DreamVideo-2 | Zero-Shot Subject-Driven Video Customization with Precise Motion Control | arXiv | 2024
CustomCrafter | Customized Video Generation with Preserving Motion and Concept Composition | arXiv | 2024
Magic-Me | Identity-Specific Video Customized Diffusion | arXiv | 2024
Concat-ID | Towards Universal Identity-Preserving Video Synthesis | arXiv | 2025
Phantom | Subject-consistent Video Generation via Cross-modal Alignment | arXiv | 2025
ConceptMaster | Multi-Concept Video Customization on DiT without Test-Time Tuning | arXiv | 2025

Talking Head & Portrait Animation (2023–2026) 26+ papers

Model | Full Title | Venue | Year
Avatar Forcing | Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation | arXiv | 2026
SuperHead | From Blurry to Believable: Enhancing Low-quality Talking Heads with 3D Generative Priors | arXiv | 2026
TalkingMachines | Real-Time Audio-Driven FaceTime-Style Video via AR Diffusion | arXiv | 2025
OmniTalker | Real-Time Text-Driven Talking Head with In-Context Audio-Visual Style | arXiv | 2025
MoCha | Towards Movie-Grade Talking Character Synthesis | arXiv | 2025
SayAnything | Audio-Driven Lip Synchronization with Conditional Video Diffusion | arXiv | 2025
KeySync | Robust Approach for Leakage-free Lip Synchronization | arXiv | 2025
IM-Portrait | Learning 3D-aware Video Diffusion for Photorealistic Talking Heads | arXiv | 2025
MEMO | Memory-Guided Diffusion for Expressive Talking Video Generation | ICLR | 2025
Hallo3 | Highly Dynamic Portrait Image Animation with Video Diffusion Transformer | CVPR | 2025
Hallo2 | Long-Duration and High-Resolution Audio-driven Portrait Animation | arXiv | 2024
Hallo | Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation | arXiv | 2024
EchoMimic | Lifelike Audio-Driven Portrait Animations through Editable Landmark | arXiv | 2024
FLOAT | Generative Motion Latent Flow Matching for Audio-driven Talking Portrait | arXiv | 2024
SINGER | Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion | arXiv | 2024
Loopy | Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency | arXiv | 2024
HelloMeme | Integrating Spatial Knitting Attentions for High-Fidelity Conditions | arXiv | 2024
X-Portrait | Expressive Portrait Animation with Hierarchical Motion Attention | arXiv | 2024
DAWN | Dynamic Frame Avatar with Non-autoregressive Diffusion for Talking Head | arXiv | 2024
MimicTalk | Mimicking a Personalized and Expressive 3D Talking Face in Few Minutes | arXiv | 2024
THEval | Evaluation Framework for Talking Head Generation | arXiv | 2025
EmotiveTalk | EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion | CVPR | 2025
Dimitra | Dimitra: Conditional Motion Diffusion Transformer for Audio-Driven Talking Head Generation | arXiv | 2025
RAP | RAP: Real-Time Audio-Driven Portrait Animation using Video Diffusion Transformers | arXiv | 2025
SadTalker | SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation | CVPR | 2023