Video Editing, Enhancement & Motion Transfer

Text-guided video editing, style transfer, motion customization, video inpainting, super-resolution, and audio synthesis for video.

Video Editing (2023–2025) 42+ papers

Model | Full Title | Venue | Year
AnyEdit | AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea | arXiv | 2025
ConsistI2V-Edit | Consistent Video Editing with Instruction-Tuned Diffusion Models | arXiv | 2025
DiffusionPen | DiffusionPen: Towards Controllable Style-Specific Handwritten Text Generation | arXiv | 2025
VACE | All-in-One Video Creation and Editing | Alibaba | 2025
VideoPainter | Any-length Video Inpainting and Editing with Plug-and-Play Context Control | SIGGRAPH | 2025
VideoGrain | Modulating Space-Time Attention for Multi-grained Video Editing | ICLR | 2025
Señorita-2M | High-Quality Instruction-based Dataset for General Video Editing | arXiv | 2025
MTV-Inpaint | Multi-Task Long Video Inpainting | arXiv | 2025
MiniMax-Remover | Taming Bad Noise Helps Video Object Removal | arXiv | 2025
LoRA-Edit | Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA | arXiv | 2025
VEGGIE | Instructional Editing and Reasoning of Video Concepts | arXiv | 2025
StableV2V | Stablizing Shape Consistency in Video-to-Video Editing | arXiv | 2024
AnyV2V | AnyV2V: A Tuning-Free Framework for Any Video-to-Video Editing Tasks | TMLR | 2024
ReVideo | Remake a Video with Motion and Content Control | arXiv | 2024
I2VEdit | First-Frame-Guided Video Editing via Image-to-Video Diffusion Models | arXiv | 2024
FlowVid | Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis | arXiv | 2023
TokenFlow | TokenFlow: Consistent Diffusion Features for Consistent Video Editing | ICLR | 2024
Rerender A Video | Zero-Shot Text-Guided Video-to-Video Translation | SIGGRAPH Asia | 2023
FateZero | Fusing Attentions for Zero-shot Text-based Video Editing | ICCV | 2023
CoDeF | Content Deformation Fields for Temporally Consistent Video Processing | CVPR | 2024
VideoSwap | Customized Video Subject Swapping with Interactive Semantic Point | CVPR | 2024
FLATTEN | Optical Flow-guided Attention for Consistent T2V Editing | ICLR | 2024
MotionEditor | Editing Video Motion via Content-Aware Diffusion | arXiv | 2023
Ground-A-Video | Zero-shot Grounded Video Editing using T2I Diffusion Models | ICLR | 2024
Tune-A-Video | One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | ICCV | 2023
Dreamix | Video Diffusion Models Are General Video Editors | Google | 2023
Pix2Video | Video Editing Using Image Diffusion | arXiv | 2023
Video-P2P | Video Editing with Cross-attention Control | arXiv | 2023
Edit-A-Video | Single Video Editing with Object-Aware Consistency | arXiv | 2023
RAVE | RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models | CVPR | 2024
MagicEdit | MagicEdit: High-Fidelity and Temporally Coherent Video Editing | arXiv | 2024

Motion Transfer & Customization (2024–2025) 22+ papers

Model | Full Title | Venue | Year
MotionPro | Precise Motion Controller for Image-to-Video Generation | CVPR | 2025
Frame In-N-Out | Unbounded Controllable Image-to-Video Generation | arXiv | 2025
FlexiAct | Towards Flexible Action Control in Heterogeneous Scenarios | SIGGRAPH | 2025
Go-with-the-Flow | Motion-Controllable Video Diffusion Using Real-Time Warped Noise | arXiv | 2025
Separate Motion from Appearance | Customizing Motion via T2V Diffusion Models | arXiv | 2025
LMP | Leveraging Motion Prior in Zero-Shot Video Generation with DiT | arXiv | 2025
MotionShop | Zero-Shot Motion Transfer with Mixture of Score Guidance | arXiv | 2024
Video Motion Transfer | Motion Transfer with Diffusion Transformers | arXiv | 2024
Trajectory Attention | Fine-grained Video Motion Control | arXiv | 2024
MotionClone | Training-Free Motion Cloning for Controllable Video Generation | arXiv | 2024
VMC | Video Motion Customization using Temporal Attention Adaption | CVPR | 2024
DreamVideo | Composing Dream Videos with Customized Subject and Motion | CVPR | 2024
Spectral Motion Alignment | Video Motion Transfer using Diffusion Models | arXiv | 2024
MotionDirector | Motion Customization of Text-to-Video Diffusion Models | ECCV | 2024
LAMP | Learn A Motion Pattern for Few-Shot-Based Video Generation | CVPR | 2024
DreamMotion | Space-Time Self-Similarity Score Distillation for Zero-Shot Video Editing | ECCV | 2024
Customize-A-Video | One-Shot Motion Customization of Text-to-Video Diffusion Models | arXiv | 2024
Motion Inversion | Motion Inversion for Video Customization | arXiv | 2024
Time-to-Move | Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising | arXiv | 2025
ReVideo | ReVideo: Remake a Video with Motion and Content Control | arXiv | 2024

Video Enhancement & Restoration 10+ papers

Model | Full Title | Venue | Year
Enhance-A-Video | Better Generated Video for Free | arXiv | 2025
SVFR | Unified Framework for Generalized Video Face Restoration | arXiv | 2025
VEnhancer | Generative Space-Time Enhancement for Video Generation | arXiv | 2024
Upscale-A-Video | Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution | ECCV | 2024
DiffIR2VR-Zero | Zero-Shot Video Restoration with Diffusion-based Image Restoration | arXiv | 2024
LDMVFI | Video Frame Interpolation with Latent Diffusion Models | arXiv | 2023
CaDM | Codec-aware Diffusion Modeling for Neural-enhanced Video Streaming | arXiv | 2022
FlashVSR | FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution | arXiv | 2025

Audio Synthesis for Video (2024–2025) 15+ papers

Model | Full Title | Venue | Year
AV-DiT | Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video | arXiv | 2025
UniForm | Unified Diffusion Transformer for Audio-Video Generation | arXiv | 2025
Stable-V2A | Synthesis of Synchronized Audio Effects with Temporal and Semantic Controls | arXiv | 2024
AV-Link | Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video | arXiv | 2024
FoleyCrafter | Bring Silent Videos to Life with Lifelike and Synchronized Sounds | arXiv | 2024
Read, Watch and Scream! | Sound Generation from Text and Video | arXiv | 2024
Video-to-Audio | Video-to-Audio Generation with Hidden Alignment | arXiv | 2024
MusicInfuser | Making Video Diffusion Listen and Dance | arXiv | 2025
Draw an Audio | Leveraging Multi-Instruction for Video-to-Audio Synthesis | arXiv | 2024
Video-Foley | Two-Stage Video-To-Sound Generation via Temporal Event Condition | arXiv | 2024
Masked Generative V2A | Masked Generative Video-to-Audio Transformers with Synchronicity | arXiv | 2024
MuVi | Video-to-Music Generation with Semantic Alignment | arXiv | 2024

Virtual Try-On 5+ papers

Model | Full Title | Venue | Year
KeyTailor | KeyTailor: Keyframe-Driven Details Injection for Video Virtual Try-On | arXiv | 2025
1-2-1 | Renaissance of Single-Network Paradigm for Virtual Try-On | arXiv | 2025
Dynamic Try-On | Taming Video Virtual Try-on with Dynamic Attention Mechanism | arXiv | 2024
Fashion-VDM | Video Diffusion Model for Virtual Try-On | arXiv | 2024
ViViD | Video Virtual Try-on using Diffusion Models | arXiv | 2024