Theory, Benchmarks & Surveys

Theoretical foundations, evaluation benchmarks, and comprehensive survey literature for world models research.

Theory, Explainability & Position Papers

Emergent World Representations

Investigating when and how neural networks spontaneously learn structured, simulation-capable representations of their training environments — from Othello boards to spatial navigation grids.

Li et al., "General agents contain world models"; Gurnee & Tegmark, "Linear Spatial World Models Emerge in LLMs"
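Claims of this kind are typically tested with a linear probe: a linear map is fit from a network's hidden activations to a candidate world-state variable, and held-out accuracy is taken as evidence that the variable is linearly decodable. The sketch below is a toy illustration on synthetic data, not the setup of either cited paper; the activations, the 2-D "state", and all names (`hidden`, `state`, `W_true`, `probe`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 2-D latent "world state" (e.g. an x/y position)
# is linearly embedded in 32-D activations, plus a little noise.
n_samples, state_dim, hidden_dim = 500, 2, 32
state = rng.normal(size=(n_samples, state_dim))
W_true = rng.normal(size=(state_dim, hidden_dim))
hidden = state @ W_true + 0.01 * rng.normal(size=(n_samples, hidden_dim))

# Hold out the last 100 samples to check that the probe generalizes.
train, test = slice(0, 400), slice(400, None)

# Fit the linear probe (hidden -> state) by ordinary least squares.
probe, *_ = np.linalg.lstsq(hidden[train], state[train], rcond=None)

# Score the probe with the coefficient of determination on held-out data.
pred = hidden[test] @ probe
ss_res = ((state[test] - pred) ** 2).sum()
ss_tot = ((state[test] - state[test].mean(axis=0)) ** 2).sum()
r2 = 1.0 - ss_res / ss_tot
print(f"held-out R^2 of linear probe: {r2:.3f}")
```

A high held-out score here only shows that the state is linearly recoverable from the activations. Showing that a model actually uses such a representation usually requires an additional intervention experiment.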

Causal Reasoning in Transformers

Evidence that next-token prediction can yield causal structure rather than surface statistics alone: transformers trained on sequential data appear to develop internal causal world models that support counterfactual reasoning.

Nichani et al., "Transformers Use Causal World Models in Maze-Solving Tasks"

Scaling Laws for World Models

Characterizing the compute-optimal strategies for pre-training agents and world models — how model capacity, data scale, and training compute interact to determine downstream performance.

"Scaling Laws for Pre-training Agents and World Models"
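As a rough illustration of what fitting a scaling law involves, the sketch below assumes loss follows a simple power law in compute, L(C) = a * C^(-b), and recovers the exponent by linear regression in log-log space. The functional form, the coefficients, and the compute budgets are all assumptions made for the example; none of them come from the cited paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical compute budgets (FLOPs) and synthetic losses generated from
# an assumed power law L(C) = a * C**(-b) with small multiplicative noise.
a_true, b_true = 50.0, 0.05
compute = np.logspace(17, 21, num=9)
noise = np.exp(0.005 * rng.normal(size=compute.size))
loss = a_true * compute ** (-b_true) * noise

# A power law is a straight line in log-log space:
#   log L = log a - b * log C
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
b_hat, a_hat = -slope, np.exp(intercept)

# Extrapolate the fitted law to a larger (hypothetical) compute budget.
predicted = a_hat * 1e22 ** (-b_hat)
print(f"fitted exponent b = {b_hat:.4f}")
print(f"extrapolated loss at 1e22 FLOPs = {predicted:.3f}")
```

Fitting in log space keeps the regression linear and weights each order of magnitude of compute equally. Extrapolating beyond the fitted range is only as trustworthy as the assumed functional form.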

Video as the Universal Reasoning Substrate

The position that video generation — as the richest single modality — may serve as a universal language for real-world decision making, subsuming planning, prediction, and control.

"Video as the New Language for Real-World Decision Making"

Compositional Generative Modeling

The argument that no single monolithic model can capture the full distribution of reality — compositionality at the model level is necessary for robust, generalizable generation.

"Compositional Generative Modeling: A Single Model is Not All You Need"

Physics Cognition in Generation

Evaluating whether and how video generation models learn physically plausible dynamics — probing the gap between pixel-level realism and genuine physical understanding.

PhyWorld: "How Far is Video Generation from World Model: A Physical Law Perspective"

World Model Benchmarks (10 benchmarks)

| Benchmark | Evaluation Focus | Domain |
|---|---|---|
| stable-worldmodel-v1 | Reproducible World Modeling Research and Evaluation | World |
| WorldScore | Unified evaluation benchmark for world generation | World |
| WorldSimBench | Video generation models as world simulators | World |
| PhyWorld | Physical law perspective evaluation of video generation | World |
| Newton | Interactive foundation world model benchmark | World |
| WorldGym | Evaluating robot policies in a world model | World |
| EWMBench | Scene, motion, semantic quality in embodied WMs | World |
| WorldLens | Full-Spectrum Evaluations of Driving World Models in Real World | Driving |
| VBench | Comprehensive Evaluation for Video Generation Models | Video |
| NAVSIM | Data-Driven Non-Reactive Autonomous Vehicle Simulation | Driving |

Workshops (10 workshops)

| Workshop | Venue | Date |
|---|---|---|
| Workshop on 4D World Models: Bridging Generation and Reconstruction | CVPR 2026 | TBD |
| The 2nd Workshop on World Models | ICLR 2026 | Apr 2026 |
| Workshop on World Modeling (MILA) | MILA | Feb 2026 |
| Workshop on Embodied World Models for Decision Making | NeurIPS 2025 | Dec 2025 |
| Reliable and Interactable World Models | ICCV 2025 | Oct 2025 |
| Building Physically Plausible World Models | ICML 2025 | Jul 2025 |
| Assessing World Models | ICML 2025 | Jul 2025 |
| Benchmarking World Models | CVPR 2025 | Jun 2025 |
| World Models: Understanding, Modelling and Scaling | ICLR 2025 | Apr 2025 |
| Foundation Models for Autonomous Systems | CVPR 2024 | Jun 2024 |

Driving Datasets (20+ datasets)

| Dataset | Description | Venue | Year |
|---|---|---|---|
| KITTI | The KITTI Vision Benchmark Suite for autonomous driving | CVPR | 2012 |
| nuScenes | A Multimodal Dataset for Autonomous Driving | CVPR | 2020 |
| Waymo Open | Scalability in Perception for Autonomous Driving | CVPR | 2020 |
| CARLA | An Open Urban Driving Simulator | CoRL | 2017 |
| SemanticKITTI | A Dataset for Semantic Scene Understanding of LiDAR Sequences | ICCV | 2019 |
| Argoverse 2 | Next Generation Datasets for Self-Driving Perception and Forecasting | NeurIPS | 2021 |
| nuPlan | A Closed-Loop ML-Based Planning Benchmark for Autonomous Vehicles | CVPRW | 2021 |
| KITTI-360 | Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D | T-PAMI | 2022 |
| OpenOccupancy | Large Scale Benchmark for Surrounding Semantic Occupancy Perception | ICCV | 2023 |
| Occ3D-nuScenes | Large-Scale 3D Occupancy Prediction Benchmark for AD | NeurIPS | 2023 |
| OpenDV-YouTube | Generalized Predictive Model data for Autonomous Driving | CVPR | 2024 |
| SSCBench | Large-Scale 3D Semantic Scene Completion Benchmark for AD | IROS | 2024 |
| NAVSIM | Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking | NeurIPS | 2024 |
| DrivingDojo | Interactive and Knowledge-Enriched Driving World Model Dataset | NeurIPS | 2024 |
| EUVS | Extrapolated Urban View Synthesis Benchmark | ICCV | 2025 |

World Model Surveys & Literature (9 surveys)

| Title | Domain | Venue | Year |
|---|---|---|---|
| Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond | World | arXiv | 2024 |
| A Comprehensive Survey on World Models for Embodied AI | Embodied | arXiv | 2024 |
| A Survey of World Models for Autonomous Driving | Driving | arXiv | 2024 |
| 3D and 4D World Modeling: A Survey | 3D/4D | arXiv | 2025 |
| Understanding World or Predicting Future? A Comprehensive Survey of World Models | World | arXiv | 2024 |
| World Models: The Safety Perspective | Safety | arXiv | 2024 |
| Exploring the Evolution of Physics Cognition in Video Generation: A Survey | Physics | arXiv | 2024 |
| From Masks to Worlds: A Hitchhiker's Guide to World Models | World | arXiv | 2024 |
| A Survey: Learning Embodied Intelligence from Physical Simulators and World Models | Embodied | arXiv | 2024 |