
Video Generation World Models

Multi-view street scene synthesis, action-conditioned video prediction, closed-loop 3D simulation, and 4D driving scene reconstruction.


Video generation world models synthesize and predict driving scenes from layouts, actions, and multi-view inputs, enabling scalable synthetic data and closed-loop simulation for autonomous driving.

Data Engines: Multi-View Street Scene Synthesis (17 models)

| Model | Full Title | Venue | Year |
| --- | --- | --- | --- |
| BEVGen | Street-View Image Generation from a Bird's-Eye View Layout | RA-L | 2024 |
| MagicDrive | Street View Generation with Diverse 3D Geometry Control | ICLR | 2024 |
| Panacea | Panoramic and Controllable Video Generation for AD | CVPR | 2024 |
| DrivingDiffusion | Layout-Guided Multi-View Driving Scene Video Generation | ECCV | 2024 |
| WoVoGen | World Volume-Aware Diffusion for Controllable Multi-Camera Driving Scene Generation | ECCV | 2024 |
| SimGen | Simulator-Conditioned Driving Scene Generation | NeurIPS | 2024 |
| DiVE | DiT-Based Video Generation with Enhanced Control | arXiv | 2024 |
| DriveDreamer-2 | LLM-Enhanced World Models for Diverse Driving Video Generation | AAAI | 2025 |
| Glad | A Streaming Scene Generator for Autonomous Driving | ICLR | 2025 |
| UniScene | Unified Occupancy-Centric Driving Scene Generation | CVPR | 2025 |
| DriveScape | High-Resolution Controllable Multi-View Driving Video Generation | CVPR | 2025 |
| MagicDrive-V2 | High-Resolution Long Video Generation with Adaptive Control | ICCV | 2025 |
| PerLDiff | Controllable Street View Synthesis Using Perspective-Layout Diffusion | ICCV | 2025 |
| DINO-Foresight | Looking into the Future with DINO | NeurIPS | 2025 |
| Cosmos-Transfer1 | Conditional World Generation with Adaptive Multimodal Control | arXiv | 2025 |
| CoGen | 3D Consistent Video Generation via Adaptive Conditioning | arXiv | 2025 |
| STAGE | Stream-Centric Generative World Model for Long-Horizon Driving-Scene Simulation | arXiv | 2025 |

Action Interpreters: Action-Conditioned Video Prediction (16 models)

| Model | Full Title | Venue | Year |
| --- | --- | --- | --- |
| GAIA-1 | A Generative World Model for Autonomous Driving | arXiv | 2023 |
| ADriver-I | A General World Model for Autonomous Driving | arXiv | 2023 |
| Drive-WM | Multiview Visual Forecasting and Planning with World Model | CVPR | 2024 |
| DriveDreamer | Towards Real-World-Driven World Models | ECCV | 2024 |
| GenAD | Generalized Predictive Model for Autonomous Driving | CVPR | 2024 |
| Vista | A Generalizable Driving World Model with High Fidelity | NeurIPS | 2024 |
| DrivingGPT | Unifying Driving World Modeling and Planning with Multi-Modal AR Transformers | arXiv | 2024 |
| DrivingWorld | Constructing World Model for AD via Video GPT | arXiv | 2024 |
| GEM | A Generalizable Ego-Vision Multimodal World Model | CVPR | 2025 |
| MaskGWM | A Generalizable Driving World Model with Video Mask Reconstruction | CVPR | 2025 |
| Epona | Autoregressive Diffusion World Model for Autonomous Driving | ICCV | 2025 |
| VaViM & VaVAM | Autonomous Driving through Video Generative Modeling | arXiv | 2025 |
| GAIA-2 | A Controllable Multi-View Generative World Model | arXiv | 2025 |
| MiLA | Multi-View Intensive-Fidelity Long-Term Video Generation | arXiv | 2025 |
| ProphetDWM | A Driving World Model for Rolling Out Future Actions and Videos | arXiv | 2025 |
| LongDWM | Cross-Granularity Distillation for Building Long-Term Driving World Model | arXiv | 2025 |

Neural Simulators: Closed-Loop 3D Simulation (11 models)

| Model | Full Title | Venue | Year |
| --- | --- | --- | --- |
| MagicDrive3D | Controllable 3D Generation for Any-View Rendering in Street Scenes | arXiv | 2024 |
| DreamForge | Motion-Aware Autoregressive Video Generation for Multi-View Driving Scenes | arXiv | 2024 |
| Doe-1 | Closed-Loop Autonomous Driving with Large World Model | arXiv | 2024 |
| DrivingSphere | Building A High-Fidelity 4D World for Closed-Loop Simulation | CVPR | 2025 |
| UMGen | Generating Multimodal Driving Scenes via Next-Scene Prediction | CVPR | 2025 |
| DriveArena | A Closed-Loop Generative Simulation Platform for AD | ICCV | 2025 |
| InfiniCube | Unbounded and Controllable Dynamic 3D Driving Scene Generation | ICCV | 2025 |
| DiST-4D | Disentangled Spatiotemporal Diffusion with Metric Depth for 4D Scene Gen | ICCV | 2025 |
| Nexus | Decoupled Diffusion Sparks Adaptive Scene Generation | arXiv | 2025 |
| Cosmos-Drive | Scalable Synthetic Driving Data Generation with World Foundation Models | arXiv | 2025 |
| Challenger | Affordable Adversarial Driving Video Generation | arXiv | 2025 |

Scene Reconstructors: 4D Driving Scene Reconstruction (19 models)

| Model | Full Title | Venue | Year |
| --- | --- | --- | --- |
| 3DGS | 3D Gaussian Splatting for Real-Time Radiance Field Rendering | TOG | 2023 |
| StreetGaussian | Modeling Dynamic Urban Scenes with Gaussian Splatting | ECCV | 2024 |
| 4DGF | Dynamic 3D Gaussian Fields for Urban Areas | NeurIPS | 2024 |
| SCube | Instant Large-Scale Scene Reconstruction using VoxSplats | NeurIPS | 2024 |
| HUGS | Holistic Urban 3D Scene Understanding via Gaussian Splatting | CVPR | 2024 |
| OmniRe | Omni Urban Scene Reconstruction | ICLR | 2025 |
| DriveDreamer4D | World Models Are Effective Data Machines for 4D Driving Scene | CVPR | 2025 |
| DeSiRe-GS | 4D Street Gaussians for Static-Dynamic Decomposition | CVPR | 2025 |
| SplatAD | Real-Time Lidar and Camera Rendering with 3DGS for AD | CVPR | 2025 |
| ReconDreamer | Crafting World Models for Driving Scene Reconstruction | CVPR | 2025 |
| StreetCrafter | Street View Synthesis with Controllable Video Diffusion | CVPR | 2025 |
| FlexDrive | Trajectory Flexibility in Driving Scene Reconstruction | CVPR | 2025 |
| InfiniCube | Unbounded Dynamic 3D Driving Scene Generation | ICCV | 2025 |
| DiST-4D | Disentangled Spatiotemporal Diffusion for 4D Scene Generation | ICCV | 2025 |
| DreamDrive | Generative 4D Scene Modeling from Street View Images | arXiv | 2025 |
| ReconDreamer++ | Harmonizing Generative and Reconstructive Models | arXiv | 2025 |
| RealEngine | Simulating Autonomous Driving in Realistic Context | arXiv | 2025 |
| GeoDrive | 3D Geometry-Informed Driving World Model with Precise Action Control | arXiv | 2025 |
| Diff4Splat | Controllable 4D Scene Generation with Latent Dynamic Reconstruction | arXiv | 2025 |