Controllable, Efficient & Long Video Generation

Camera control, motion trajectories, inference acceleration, and techniques for generating longer, higher-quality videos.

Controllable Video Generation (2023–2025) 27+ papers

| Model | Full Title | Venue | Year |
|---|---|---|---|
| BulletTime | BulletTime: Decoupled Control of Time and Camera Pose for Video Generation | arXiv | 2025 |
| InfCam | InfCam: Depth-Free Camera Control via Infinite Homography Warping | arXiv | 2025 |
| VACE | All-in-One Video Creation and Editing | Alibaba | 2025 |
| FlexiAct | Towards Flexible Action Control in Heterogeneous Scenarios | SIGGRAPH | 2025 |
| VideoPainter | Any-length Video Inpainting and Editing with Plug-and-Play Context Control | SIGGRAPH | 2025 |
| GEN3C | 3D-Informed World-Consistent Video Generation with Precise Camera Control | CVPR | 2025 |
| ReCamMaster | Camera-Controlled Generative Rendering from A Single Video | arXiv | 2025 |
| CineMaster | A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation | arXiv | 2025 |
| MotionCanvas | Cinematic Shot Design with Controllable Image-to-Video Generation | arXiv | 2025 |
| MagicMotion | Controllable Video Generation with Dense-to-Sparse Trajectory Guidance | arXiv | 2025 |
| CameraCtrl II | Dynamic Scene Exploration via Camera-controlled Video Diffusion Models | arXiv | 2025 |
| C-Drag | Chain-of-Thought Driven Motion Controller for Video Generation | arXiv | 2025 |
| Any2Caption | Interpreting Any Condition to Caption for Controllable Video Generation | arXiv | 2025 |
| SketchVideo | Sketch-based Video Generation and Editing | arXiv | 2025 |
| OmniVDiff | Omni Controllable Video Diffusion for Generation and Understanding | arXiv | 2025 |
| Tora | Trajectory-oriented Diffusion Transformer for Video Generation | CVPR | 2025 |
| MotionCtrl | A Unified and Flexible Motion Controller for Video Generation | SIGGRAPH | 2024 |
| CameraCtrl | Enabling Camera Control for Video Diffusion Models | arXiv | 2024 |
| DragAnything | Motion Control for Anything using Entity Representation | ECCV | 2024 |
| TrailBlazer | Trajectory Control for Diffusion-Based Video Generation | arXiv | 2024 |
| DragNUWA | Fine-grained Control via Text, Image, and Trajectory | arXiv | 2023 |
| SparseCtrl | Adding Sparse Controls to Text-to-Video Diffusion Models | arXiv | 2023 |
| Animate Anyone | Consistent and Controllable Image-to-Video Synthesis for Character Animation | arXiv | 2023 |
| Control-A-Video | Controllable Text-to-Video Generation with Diffusion Models | arXiv | 2023 |
| ControlVideo | Training-free Controllable Text-to-Video Generation | arXiv | 2023 |

Efficient Video Generation (2024–2026) 16+ papers

| Model | Full Title | Venue | Year |
|---|---|---|---|
| TeleBoost | TeleBoost: A Systematic Alignment Framework for High-Fidelity, Controllable, and Robust Video Generation | arXiv | 2026 |
| SpargeAttn | Accurate Sparse Attention Accelerating Any Model Inference | arXiv | 2025 |
| SageAttention2 | Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization | arXiv | 2025 |
| FlashVideo | Flowing Fidelity to Detail for Efficient High-Resolution Video Generation | arXiv | 2025 |
| Sparse VideoGen | Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity | arXiv | 2025 |
| Sliding Tile Attention | Fast Video Generation with Sliding Tile Attention | arXiv | 2025 |
| Diffusion Adversarial Post-Training | One-Step Video Generation | arXiv | 2025 |
| Turbo2K | Towards Ultra-Efficient and High-Quality 2K Video Synthesis | arXiv | 2025 |
| MotionStream | MotionStream: Real-Time Video Generation with Interactive Motion Controls | arXiv | 2025 |
| Delta-DiT | Delta-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers | arXiv | 2025 |
| TeaCache | TeaCache: Training-Free Input-Aware Cache for Accelerating Diffusion Models | arXiv | 2025 |
| T2V-Turbo-v2 | Enhancing Video Generation Model Post-Training | arXiv | 2024 |
| PAB | Real-Time Video Generation with Pyramid Attention Broadcast | arXiv | 2024 |
| xGen-VideoSyn-1 | High-fidelity Text-to-Video Synthesis with Compressed Representations | arXiv | 2024 |
| SageAttention | Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | arXiv | 2024 |
| From Slow to Fast | From Slow Bidirectional to Fast Causal Video Generators | arXiv | 2024 |

Long Video & Film Generation (2023–2026) 24+ papers

| Model | Full Title | Venue | Year |
|---|---|---|---|
| CineScene | CineScene: Implicit 3D as Effective Scene Representation for Cinematic Video Generation | arXiv | 2026 |
| HoloCine | HoloCine: Holistic Generation of Cinematic Multi-Shot Long Video Narratives | arXiv | 2025 |
| SkyReels-V2 | Infinite-length Film Generative Model | arXiv | 2025 |
| Mask²DiT | Dual Mask-based Diffusion Transformer for Multi-Scene Long Video | CVPR | 2025 |
| One-Minute Video | Test-Time Training for Long Video Generation | arXiv | 2025 |
| MovieAgent | Automated Movie Generation via Multi-Agent CoT Planning | arXiv | 2025 |
| Long Context Tuning | Long Context Tuning for Video Generation | arXiv | 2025 |
| RIFLEx | A Free Lunch for Length Extrapolation in Video Diffusion Transformers | arXiv | 2025 |
| VideoAuteur | Towards Long Narrative Video Generation | arXiv | 2025 |
| Ouroboros-Diffusion | Exploring Consistent Content Generation in Tuning-free Long Video Diffusion | arXiv | 2025 |
| GameFactory | GameFactory: Creating New Games with Generative Interactive Videos | arXiv | 2025 |
| MemoryPack | MemoryPack: Long-Form Autoregressive Video Generation via Learnable Context Retrieval | arXiv | 2025 |
| DiTCtrl | Exploring Attention Control in Multi-Modal DiT for Multi-Prompt Longer Video | arXiv | 2024 |
| LinGen | Towards High-Resolution Minute-Length T2V with Linear Computational Complexity | arXiv | 2024 |
| Loong | Generating Minute-level Long Videos with Autoregressive Language Models | arXiv | 2024 |
| ARLON | Boosting Diffusion Transformers with AR Models for Long Video | arXiv | 2024 |
| MovieDreamer | Hierarchical Generation for Coherent Long Visual Sequence | arXiv | 2024 |
| FIFO-Diffusion | Generating Infinite Videos from Text without Training | arXiv | 2024 |
| StoryDiffusion | Consistent Self-Attention for Long-Range Image and Video Generation | arXiv | 2024 |
| StreamingT2V | Consistent, Dynamic, and Extendable Long Video Generation from Text | arXiv | 2024 |
| Gen-L-Video | Multi-Text to Long Video Generation via Temporal Co-Denoising | arXiv | 2023 |
| NUWA-XL | Diffusion over Diffusion for eXtremely Long Video Generation | Microsoft | 2023 |

Video Generation with 3D/Physical Prior (2024–2026) 15+ papers

| Model | Full Title | Venue | Year |
|---|---|---|---|
| Vid2World | Vid2World: Crafting Video Diffusion Models to Interactive World Models | ICLR | 2026 |
| DiffusionRenderer | Neural Inverse and Forward Rendering with Video Diffusion Models | arXiv | 2025 |
| Diffusion as Shader | 3D-aware Video Diffusion for Versatile Video Generation Control | arXiv | 2025 |
| ReVision | High-Quality, Low-Cost Video Generation with Explicit 3D Physics Modeling | arXiv | 2025 |
| MoReGen | Physics-Grounded Video Synthesis with Multi-agent LLMs | arXiv | 2025 |
| Force Prompting | Video Generation Models Can Learn Physics-based Control | arXiv | 2025 |
| PhysGen | Rigid-Body Physics-Grounded Image-to-Video Generation | arXiv | 2024 |
| PhysDreamer | Physics-Based Interaction with 3D Objects via Video Generation | ECCV | 2024 |
| AutoVFX | Physically Realistic Video Editing from Natural Language Instructions | arXiv | 2024 |
| PhysMotion | Physics-Grounded Dynamics From a Single Image | arXiv | 2024 |
| PhyT2V | LLM-Guided Iterative Self-Refinement for Physics-Grounded T2V | arXiv | 2024 |
| ViewCrafter | Taming Video Diffusion Models for High-fidelity Novel View Synthesis | arXiv | 2024 |
| StereoCrafter | Diffusion-based Generation of Long Stereoscopic 3D from Monocular Videos | arXiv | 2024 |

Alignment & Feedback (2024–2026) 1+ papers

| Model | Full Title | Venue | Year |
|---|---|---|---|
| FairT2V | FairT2V: Training-Free Debiasing Framework for Text-to-Video Diffusion Models | arXiv | 2026 |