Video Generation World Models

Multi-view street scene synthesis, action-conditioned video prediction, closed-loop 3D simulation, and 4D driving scene reconstruction.

Video generation world models synthesize and predict driving scenes from layouts, actions, and multi-view inputs, enabling scalable synthetic data generation and closed-loop simulation for autonomous driving.

Data Engines: Multi-View Street Scene Synthesis (17 models)

Model	Full Title	Venue	Year
BEVGen	Street-View Image Generation from a Bird's-Eye View Layout	RA-L	2024
MagicDrive	Street View Generation with Diverse 3D Geometry Control	ICLR	2024
Panacea	Panoramic and Controllable Video Generation for Autonomous Driving	CVPR	2024
DrivingDiffusion	Layout-Guided Multi-View Driving Scene Video Generation	ECCV	2024
WoVoGen	World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation	ECCV	2024
SimGen	Simulator-Conditioned Driving Scene Generation	NeurIPS	2024
DiVE	DiT-Based Video Generation with Enhanced Control	arXiv	2024
DriveDreamer-2	LLM-Enhanced World Models for Diverse Driving Video Generation	AAAI	2025
Glad	A Streaming Scene Generator for Autonomous Driving	ICLR	2025
UniScene	Unified Occupancy-Centric Driving Scene Generation	CVPR	2025
DriveScape	High-Resolution Controllable Multi-View Driving Video Generation	CVPR	2025
MagicDrive-V2	High-Resolution Long Video Generation with Adaptive Control	ICCV	2025
PerLDiff	Controllable Street View Synthesis Using Perspective-Layout Diffusion	ICCV	2025
DINO-Foresight	Looking into the Future with DINO	NeurIPS	2025
Cosmos-Transfer1	Conditional World Generation with Adaptive Multimodal Control	arXiv	2025
CoGen	3D Consistent Video Generation via Adaptive Conditioning	arXiv	2025
STAGE	Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation	arXiv	2025

Action Interpreters: Action-Conditioned Video Prediction (16 models)

Model	Full Title	Venue	Year
GAIA-1	A Generative World Model for Autonomous Driving	arXiv	2023
ADriver-I	A General World Model for Autonomous Driving	arXiv	2023
Drive-WM	Multiview Visual Forecasting and Planning with World Model	CVPR	2024
DriveDreamer	Towards Real-World-Driven World Models	ECCV	2024
GenAD	Generalized Predictive Model for Autonomous Driving	CVPR	2024
Vista	A Generalizable Driving World Model with High Fidelity	NeurIPS	2024
DrivingGPT	Unifying Driving World Modeling and Planning with Multi-Modal AR Transformers	arXiv	2024
DrivingWorld	Constructing World Model for Autonomous Driving via Video GPT	arXiv	2024
GEM	A Generalizable Ego-Vision Multimodal World Model	CVPR	2025
MaskGWM	A Generalizable Driving World Model with Video Mask Reconstruction	CVPR	2025
Epona	Autoregressive Diffusion World Model for Autonomous Driving	ICCV	2025
VaViM & VaVAM	Autonomous Driving through Video Generative Modeling	arXiv	2025
GAIA-2	A Controllable Multi-View Generative World Model	arXiv	2025
MiLA	Multi-View Intensive-Fidelity Long-Term Video Generation	arXiv	2025
ProphetDWM	A Driving World Model for Rolling Out Future Actions and Videos	arXiv	2025
LongDWM	Cross-Granularity Distillation for Building Long-Term Driving World Model	arXiv	2025

Neural Simulators: Closed-Loop 3D Simulation (11 models)

Model	Full Title	Venue	Year
MagicDrive3D	Controllable 3D Generation for Any-View Rendering in Street Scenes	arXiv	2024
DreamForge	Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes	arXiv	2024
Doe-1	Closed-Loop Autonomous Driving with Large World Model	arXiv	2024
DrivingSphere	Building A High-Fidelity 4D World for Closed-Loop Simulation	CVPR	2025
UMGen	Generating Multimodal Driving Scenes via Next-Scene Prediction	CVPR	2025
DriveArena	A Closed-Loop Generative Simulation Platform for Autonomous Driving	ICCV	2025
InfiniCube	Unbounded and Controllable Dynamic 3D Driving Scene Generation	ICCV	2025
DiST-4D	Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation	ICCV	2025
Nexus	Decoupled Diffusion Sparks Adaptive Scene Generation	arXiv	2025
Cosmos-Drive	Scalable Synthetic Driving Data Generation with World Foundation Models	arXiv	2025
Challenger	Affordable Adversarial Driving Video Generation	arXiv	2025

Scene Reconstructors: 4D Driving Scene Reconstruction (19 models)

Model	Full Title	Venue	Year
3DGS	3D Gaussian Splatting for Real-Time Radiance Field Rendering	TOG	2023
StreetGaussian	Modeling Dynamic Urban Scenes with Gaussian Splatting	ECCV	2024
4DGF	Dynamic 3D Gaussian Fields for Urban Areas	NeurIPS	2024
SCube	Instant Large-Scale Scene Reconstruction using VoxSplats	NeurIPS	2024
HUGS	Holistic Urban 3D Scene Understanding via Gaussian Splatting	CVPR	2024
OmniRe	Omni Urban Scene Reconstruction	ICLR	2025
DriveDreamer4D	World Models Are Effective Data Machines for 4D Driving Scene Representation	CVPR	2025
DeSiRe-GS	4D Street Gaussians for Static-Dynamic Decomposition	CVPR	2025
SplatAD	Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving	CVPR	2025
ReconDreamer	Crafting World Models for Driving Scene Reconstruction	CVPR	2025
StreetCrafter	Street View Synthesis with Controllable Video Diffusion	CVPR	2025
FlexDrive	Trajectory Flexibility in Driving Scene Reconstruction	CVPR	2025
InfiniCube	Unbounded and Controllable Dynamic 3D Driving Scene Generation	ICCV	2025
DiST-4D	Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Driving Scene Generation	ICCV	2025
DreamDrive	Generative 4D Scene Modeling from Street View Images	arXiv	2025
ReconDreamer++	Harmonizing Generative and Reconstructive Models	arXiv	2025
RealEngine	Simulating Autonomous Driving in Realistic Context	arXiv	2025
GeoDrive	3D Geometry-Informed Driving World Model with Precise Action Control	arXiv	2025
Diff4Splat	Controllable 4D Scene Generation with Latent Dynamic Reconstruction	arXiv	2025