Foundation Models & Toolboxes

Core text-to-video architectures and open-source platforms, from early GAN and autoregressive methods to modern diffusion transformers.

Open-Source Platforms & Toolboxes (2024–2026) 18+ platforms

Model | Full Title | Venue | Year
SkyReels-V3 | SkyReels-V3: Conditional Video Generation Model with Three Core Paradigms | Kunlun | 2026
Seedance 2.0 | Seedance 2.0: Multi-Scene Narrative Video Generation | ByteDance | 2026
Vidi2.5 | Vidi2.5: Large Multimodal Models for Video Understanding and Creation | ByteDance | 2025
Wan-Video | Wan: Open and Advanced Large-Scale Video Generative Models | Alibaba | 2025
SkyReels-V2 | Infinite-length Film Generative Model | Kunlun | 2025
Step-Video | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | StepFun | 2025
Cosmos | World Foundation Model Platform | NVIDIA | 2025
FastVideo | Unified Inference and Post-Training Framework for Accelerated Video Generation | arXiv | 2025
LightX2V | Light Video Generation Inference Framework | arXiv | 2025
HunyuanVideo | A Systematic Framework For Large Video Generative Models | Tencent | 2024
CogVideoX | Text-to-Video Diffusion Models with An Expert Transformer | ICLR | 2025
Mochi 1 | Open Video Generation Model | Genmo | 2024
Allegro | Advanced Video Generation Model | Rhymes AI | 2024
LTX-Video | Lightricks Video Generation | Lightricks | 2024
Open-Sora | Open-Source Sora Reproduction by HPC-AI Tech | arXiv | 2024
Open-Sora-Plan | Open-Source Video Generation Plan | PKU | 2024
Pyramidal Flow Matching | Efficient Video Generative Modeling | arXiv | 2024
Stable Video Diffusion | Scaling Latent Video Diffusion Models to Large Datasets | Stability AI | 2023
VideoCrafter1/2 | Open Diffusion Models for High-Quality Video Generation | Tencent AI Lab | 2023/2024
ModelScope T2V | ModelScope Text-to-Video Technical Report | Alibaba | 2023
DiffSynth-Studio | DiffSynth Studio: Latent In-Iteration Deflickering | arXiv | 2023
VideoTuna | VideoTuna: Video Generation Toolkit | arXiv | 2024

Diffusion Transformer Era (2024–2026) 32+ papers

Model | Full Title | Venue | Year
Omni-Video 2 | Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing | arXiv | 2026
Factorized VidGen | Factorized Video Generation: Decoupling Scene Construction and Temporal Synthesis | arXiv | 2025
Wan 2.1 | Wan: Open and Advanced Large-Scale Video Generative Models | Alibaba | 2025
HunyuanVideo | HunyuanVideo: A Systematic Framework for Large Video Generative Models | Tencent | 2024
Step-Video-T2V | Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | StepFun | 2025
CogVideoX-5B | CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Zhipu AI | 2025
Veo 2 | Veo 2: Photorealistic Video Generation | Google DeepMind | 2024
Causal Forcing | Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation | arXiv | 2026
MAGI-1 | Autoregressive Video Generation at Scale | Sand AI | 2025
Seaweed-7B | Cost-Effective Training of Video Generation Foundation Model | ByteDance | 2025
Magic 1-For-1 | Generating One Minute Video Clips within One Minute | arXiv | 2025
Lumina-Video | Efficient and Flexible Video Generation with Multi-scale Next-DiT | arXiv | 2025
RepVideo | Rethinking Cross-Layer Representation for Video Generation | arXiv | 2025
M4V | Multi-Modal Mamba for Text-to-Video Generation | arXiv | 2025
RIFLEx | A Free Lunch for Length Extrapolation in Video Diffusion Transformers | arXiv | 2025
Movie Gen | A Cast of Media Foundation Models | Meta | 2024
Sora | Video Generation Models as World Simulators | OpenAI | 2024
Vidu | Highly Consistent Text-to-Video Generator with Diffusion Models | Shengshu | 2024
Snap Video | Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | Snap Inc | 2024
Latte | Latent Diffusion Transformer for Video Generation | arXiv | 2024
GenTron | Delving Deep into Diffusion Transformers for Image and Video Generation | CVPR | 2024
Lumiere | A Space-Time Diffusion Model for Video Generation | Google | 2024
MagicVideo-V2 | Multi-Stage High-Aesthetic Video Generation | ByteDance | 2024
VideoPoet | A Large Language Model for Zero-Shot Video Generation | Google | 2023
Photorealistic Video Generation | Photorealistic Video Generation with Diffusion Models | Google | 2023
EasyAnimate | EasyAnimate: An End-to-End Solution for High-Resolution and Long Video Generation | Alibaba | 2024
VideoCrafter2 | VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models | CVPR | 2024
AnimateLCM | AnimateLCM: Accelerating the Animation of Personalized Diffusion Models with Decoupled Consistency Learning | arXiv | 2024
Open-Sora 2.0 | Open-Sora 2.0: Commercial-Level Video Generation on a Budget | HPC-AI Tech | 2025
StreamDiT | StreamDiT: Streaming Video Generation with Diffusion Transformers | arXiv | 2025
Seedance 1.0 | Seedance 1.0: Exploring the Boundaries of Video Generation Models | ByteDance | 2025
GameGen-X | GameGen-X: Interactive Open-world Game Video Generation | ICLR | 2025

Video Diffusion Era (2022–2023) 35+ papers

Model | Full Title | Venue | Year
AnimateDiff | Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning | ICLR | 2024
Show-1 | Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | NUS | 2023
LaVIE | High-Quality Video Generation with Cascaded Latent Diffusion Models | Shanghai AI Lab | 2023
InstructVideo | Instructing Video Diffusion Models with Human Feedback | arXiv | 2023
VideoLCM | Video Latent Consistency Model | arXiv | 2023
VideoFactory | Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation | arXiv | 2023
FreeInit | Bridging Initialization Gap in Video Diffusion Models | arXiv | 2023
FreeNoise | Tuning-Free Longer Video Diffusion via Noise Rescheduling | ICLR | 2024
Align your Latents | High-Resolution Video Synthesis with Latent Diffusion Models | CVPR | 2023
Text2Video-Zero | Text-to-Image Diffusion Models Are Zero-Shot Video Generators | arXiv | 2023
VideoComposer | Compositional Video Synthesis with Motion Controllability | NeurIPS | 2023
Reuse and Diffuse | Iterative Denoising for Text-to-Video Generation | arXiv | 2023
Free-Bloom | Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator | NeurIPS | 2023
VideoGen | A Reference-Guided Latent Diffusion Approach for High Definition T2V | Baidu | 2023
SEINE | Short-to-Long Video Diffusion Model for Generative Transition | arXiv | 2023
DynamiCrafter | Animating Open-domain Images with Video Diffusion Priors | CUHK | 2023
Emu Video | Factorizing Text-to-Video Generation by Explicit Image Conditioning | Meta | 2023
Make Pixels Dance | High-Dynamic Video Generation | ByteDance | 2023
MicroCinema | A Divide-and-Conquer Approach for Text-to-Video Generation | arXiv | 2023
PYoCo | Preserve Your Own Correlation: A Noise Prior for Video Generation | ICCV | 2023
Gen-1 | Structure and Content-Guided Video Synthesis with Diffusion Models | Runway, ICCV | 2023
Latent-Shift | Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation | Meta | 2023
Dysen-VDM | Empowering Dynamics-aware Text-to-Video Diffusion with LLMs | NUS | 2023

GAN & Autoregressive Era (2017–2022) 20+ papers

Model | Full Title | Venue | Year
Make-A-Video | Text-to-Video Generation without Text-Video Data | Meta, ICLR | 2023
Imagen Video | High Definition Video Generation with Diffusion Models | Google | 2022
CogVideo | Large-scale Pretraining for Text-to-Video via Transformers | Tsinghua, ICLR | 2023
Phenaki | Variable Length Video Generation from Open Domain Textual Description | Google, ICLR | 2023
Video Diffusion Models | Foundational Video Diffusion Framework | Google | 2022
MagicVideo | Efficient Video Generation With Latent Diffusion Models | ByteDance | 2022
NUWA-XL | Diffusion over Diffusion for eXtremely Long Video Generation | Microsoft | 2023
NUWA | Visual Synthesis Pre-training for Neural visUal World creAtion | Microsoft, ECCV | 2022
CogView2 | Faster and Better Text-to-Image Generation via Hierarchical Transformers | NeurIPS | 2022
GODIVA | Generating Open-DomaIn Videos from nAtural Descriptions | Microsoft | 2021
Tune-A-Video | One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | ICCV | 2023
MM-Diffusion | Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation | CVPR | 2023
Video Generation From Text | Pioneering text-to-video generation | AAAI | 2018
Sync-DRAW | Automatic Video Generation using Deep Recurrent Attentive Architectures | ACM MM | 2017
IRC-GAN | Introspective Recurrent Convolutional GAN for Text-to-Video Generation | IJCAI | 2019

Commercial Products 10+ products

Product | Organization | Key Feature | Year
Sora | OpenAI | World simulator, up to 1 minute video | 2024
Veo 2 | Google DeepMind | High-definition, cinematic quality | 2024
Kling | Kuaishou | Real-time, up to 2 minutes | 2024
Gen 3 Alpha | Runway | Creative video generation and editing | 2024
Dream Machine | Luma AI | Fast, high-quality video from text/image | 2024
Wunjo CE | WR | Open-source video generation and editing | 2024
Sora 2 | OpenAI | Native audio, 20s clips, 1080p with Remix/Blend tools | 2025
Veo 3 | Google DeepMind | 4K 60fps, native audio, 2-min HD video | 2025
Kling 2.0 | Kuaishou | Precise camera controls, 1080p, multi-shot | 2025
Seedance | ByteDance | Multi-shot text-to-video and image-to-video generation | 2025