Foundation Models & Toolboxes

Core text-to-video architectures and open-source platforms — from early autoregressive methods to modern diffusion transformers.


Open-Source Platforms & Toolboxes (2023–2026) 18+ platforms

Model	Full Title	Venue / Org	Year
SkyReels-V3	SkyReels-V3: Conditional Video Generation Model with Three Core Paradigms	Kunlun	2026
LTX-2	LTX-2: Efficient Joint Audio-Visual Foundation Model	Lightricks	2026
Seedance 2.0	Seedance 2.0: Multi-Scene Narrative Video Generation	ByteDance	2026
Vidi2.5	Vidi2.5: Large Multimodal Models for Video Understanding and Creation	ByteDance	2025
Wan-Video	Wan: Open and Advanced Large-Scale Video Generative Models	Alibaba	2025
SkyReels-V2	Infinite-length Film Generative Model	Kunlun	2025
Step-Video	Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model	StepFun	2025
Cosmos	World Foundation Model Platform	NVIDIA	2025
FastVideo	Unified Inference and Post-Training Framework for Accelerated Video Generation	arXiv	2025
LightX2V	Light Video Generation Inference Framework	arXiv	2025
HunyuanVideo	A Systematic Framework For Large Video Generative Models	Tencent	2024
CogVideoX	Text-to-Video Diffusion Models with An Expert Transformer	ICLR	2025
Mochi 1	Open Video Generation Model	Genmo	2024
Allegro	Advanced Video Generation Model	Rhymes AI	2024
LTX-Video	Lightricks Video Generation	Lightricks	2024
Open-Sora	Open-Source Sora Reproduction by HPC-AI Tech	arXiv	2024
Open-Sora-Plan	Open-Source Video Generation Plan	PKU	2024
Pyramidal Flow Matching	Efficient Video Generative Modeling	arXiv	2024
Stable Video Diffusion	Scaling Latent Video Diffusion Models to Large Datasets	Stability AI	2023
VideoCrafter1/2	Open Diffusion Models for High-Quality Video Generation	Tencent AI Lab	2023/2024
ModelScope T2V	ModelScope Text-to-Video Technical Report	Alibaba	2023
DiffSynth-Studio	DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis	arXiv	2023
VideoTuna	VideoTuna: Video Generation Toolkit	arXiv	2024

Diffusion Transformer Era (2023–2026) 32+ papers

Model	Full Title	Venue / Org	Year
Omni-Video 2	Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing	arXiv	2026
MOVA	MOVA: Towards Scalable and Synchronized Video-Audio Generation	arXiv	2026
Context Forcing	Context Forcing: Consistent Autoregressive Video Generation with Long Context	arXiv	2026
FSVideo	FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space	arXiv	2026
VTok	VTok: A Unified Video Tokenizer with Decoupled Spatial-Temporal Latents	arXiv	2026
Self-Refining Video	Self-Refining Video Sampling	arXiv	2026
LoL	LoL: Longer than Longer, Scaling Video Generation to Hour	arXiv	2026
VideoAR	VideoAR: Autoregressive Video Generation via Next-Frame and Scale Prediction	arXiv	2026
SemanticGen	SemanticGen: Video Generation in Semantic Space	arXiv	2026
Apollo	Apollo: Unified Multi-Task Audio-Video Joint Generation	arXiv	2026
StoryMem	StoryMem: Multi-shot Long Video Storytelling with Memory	arXiv	2026
Kling-Omni	Kling-Omni Technical Report	arXiv	2026
BlobGEN-Vid	BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations	arXiv	2025
Factorized VidGen	Factorized Video Generation: Decoupling Scene Construction and Temporal Synthesis	arXiv	2025
Wan 2.1	Wan 2.1: Advancing Video Generation with Scalable Diffusion Transformers	Alibaba	2025
HunyuanVideo	HunyuanVideo: A Systematic Framework for Large Video Generative Models	Tencent	2024
Step-Video-T2V	Step-Video-T2V: A State-of-the-Art Text-to-Video Generation Model	StepFun	2025
CogVideoX-5B	CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer	Zhipu AI	2025
Veo 2	Veo 2: Photorealistic Video Generation	Google DeepMind	2024
Causal Forcing	Autoregressive Diffusion Distillation for Real-Time Interactive Video Generation	arXiv	2026
MAGI-1	Autoregressive Video Generation at Scale	Sand AI	2025
Seaweed-7B	Cost-Effective Training of Video Generation Foundation Model	ByteDance	2025
Magic 1-For-1	Generating One Minute Video Clips within One Minute	arXiv	2025
Lumina-Video	Efficient and Flexible Video Generation with Multi-scale Next-DiT	arXiv	2025
RepVideo	Rethinking Cross-Layer Representation for Video Generation	arXiv	2025
M4V	Multi-Modal Mamba for Text-to-Video Generation	arXiv	2025
RIFLEx	A Free Lunch for Length Extrapolation in Video Diffusion Transformers	arXiv	2025
Movie Gen	A Cast of Media Foundation Models	Meta	2024
Sora	Video Generation Models as World Simulators	OpenAI	2024
Vidu	Highly Consistent Text-to-Video Generator with Diffusion Models	Shengshu	2024
Snap Video	Scaled Spatiotemporal Transformers for Text-to-Video Synthesis	Snap Inc	2024
Latte	Latent Diffusion Transformer for Video Generation	arXiv	2024
GenTron	Delving Deep into Diffusion Transformers for Image and Video Generation	CVPR	2024
Lumiere	A Space-Time Diffusion Model for Video Generation	Google	2024
MagicVideo-V2	Multi-Stage High-Aesthetic Video Generation	ByteDance	2024
VideoPoet	A Large Language Model for Zero-Shot Video Generation	Google	2023
W.A.L.T	Photorealistic Video Generation with Diffusion Models	Google	2023
EasyAnimate	EasyAnimate: An End-to-End Solution for High-Resolution and Long Video Generation	Alibaba	2024
VideoCrafter2	VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models	CVPR	2024
AnimateLCM	AnimateLCM: Accelerating the Animation of Personalized Diffusion Models with Decoupled Consistency Learning	arXiv	2024
Open-Sora 2.0	Open-Sora 2.0: Commercial-Level Video Generation on a Budget	HPC-AI Tech	2025
StreamDiT	StreamDiT: Streaming Video Generation with Diffusion Transformers	arXiv	2025
Seedance 1.0	Seedance 1.0: Exploring the Boundaries of Video Generation Models	ByteDance	2025
GameGen-X	GameGen-X: Interactive Open-world Game Video Generation	ICLR	2025

Video Diffusion Era (2022–2023) 35+ papers

Model	Full Title	Venue / Org	Year
AnimateDiff	Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning	ICLR	2024
Show-1	Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation	NUS	2023
LaVIE	High-Quality Video Generation with Cascaded Latent Diffusion Models	Shanghai AI Lab	2023
InstructVideo	Instructing Video Diffusion Models with Human Feedback	arXiv	2023
VideoLCM	Video Latent Consistency Model	arXiv	2023
VideoFactory	Swap Attention in Spatiotemporal Diffusions for Text-to-Video Generation	arXiv	2023
FreeInit	Bridging Initialization Gap in Video Diffusion Models	arXiv	2023
FreeNoise	Tuning-Free Longer Video Diffusion via Noise Rescheduling	ICLR	2024
Align your Latents	High-Resolution Video Synthesis with Latent Diffusion Models	CVPR	2023
Text2Video-Zero	Text-to-Image Diffusion Models Are Zero-Shot Video Generators	arXiv	2023
VideoComposer	Compositional Video Synthesis with Motion Controllability	NeurIPS	2023
Reuse and Diffuse	Iterative Denoising for Text-to-Video Generation	arXiv	2023
Free-Bloom	Zero-Shot Text-to-Video Generator with LLM Director and LDM Animator	NeurIPS	2023
VideoGen	A Reference-Guided Latent Diffusion Approach for High Definition T2V	Baidu	2023
SEINE	Short-to-Long Video Diffusion Model for Generative Transition	arXiv	2023
DynamiCrafter	Animating Open-domain Images with Video Diffusion Priors	CUHK	2023
Emu Video	Factorizing Text-to-Video Generation by Explicit Image Conditioning	Meta	2023
Make Pixels Dance	High-Dynamic Video Generation	ByteDance	2023
MicroCinema	A Divide-and-Conquer Approach for Text-to-Video Generation	arXiv	2023
PYoCo	Preserve Your Own Correlation: A Noise Prior for Video Generation	ICCV	2023
Gen-1	Structure and Content-Guided Video Synthesis with Diffusion Models	Runway ICCV	2023
Latent-Shift	Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation	Meta	2023
Dysen-VDM	Empowering Dynamics-aware Text-to-Video Diffusion with LLMs	NUS	2023

GAN & Autoregressive Era (2017–2023) 20+ papers

Model	Full Title	Venue / Org	Year
Make-A-Video	Text-to-Video Generation without Text-Video Data	Meta ICLR	2023
Imagen Video	High Definition Video Generation with Diffusion Models	Google	2022
CogVideo	Large-scale Pretraining for Text-to-Video via Transformers	Tsinghua ICLR	2023
Phenaki	Variable Length Video Generation from Open Domain Textual Description	Google ICLR	2023
Video Diffusion Models	Foundational Video Diffusion Framework	Google	2022
MagicVideo	Efficient Video Generation With Latent Diffusion Models	ByteDance	2022
NUWA-XL	Diffusion over Diffusion for eXtremely Long Video Generation	Microsoft	2023
NUWA	Visual Synthesis Pre-training for Neural visUal World creAtion	Microsoft ECCV	2022
CogView2	Faster and Better Text-to-Image Generation via Hierarchical Transformers	NeurIPS	2022
GODIVA	Generating Open-DomaIn Videos from nAtural Descriptions	Microsoft	2021
Tune-A-Video	One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation	ICCV	2023
MM-Diffusion	Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation	CVPR	2023
Video Generation From Text	Pioneering text-to-video generation	AAAI	2018
Sync-DRAW	Automatic Video Generation using Deep Recurrent Attentive Architectures	ACM MM	2017
IRC-GAN	Introspective Recurrent Convolutional GAN for Text-to-Video Generation	IJCAI	2019

Commercial Products 10+ products

Product	Organization	Key Feature	Year
Sora	OpenAI	World simulator, up to 1 minute video	2024
Veo 2	Google DeepMind	High-definition, cinematic quality	2024
Kling	Kuaishou	Up to 2-minute 1080p video at 30 fps	2024
Gen-3 Alpha	Runway	Creative video generation and editing	2024
Dream Machine	Luma AI	Fast, high-quality video from text/image	2024
Wunjo CE	WR	Open-source video generation and editing	2024
Sora 2	OpenAI	Native synchronized audio, improved physical realism, cameos	2025
Veo 3	Google DeepMind	Native audio (dialogue, sound effects, ambient noise), 8-second clips	2025
Kling 2.0	Kuaishou	Precise camera controls, 1080p, multi-shot	2025
Seedance	ByteDance	Multi-shot text-to-video and image-to-video generation	2025