Theory, Benchmarks & Surveys

Theoretical foundations, evaluation benchmarks, and comprehensive survey literature for world models research.

Theory, Explainability & Position Papers

Emergent World Representations

Investigating when and how neural networks spontaneously learn structured, simulation-capable representations of their training environments — from Othello boards to spatial navigation grids.

Li et al., "General agents contain world models"; Gurnee & Tegmark, "Linear Spatial World Models Emerge in LLMs"

Causal Reasoning in Transformers

Evidence that next-token prediction can yield causal structure: transformers trained on sequential data can develop internal causal world models that support counterfactual reasoning.

Spies et al., "Transformers Use Causal World Models in Maze-Solving Tasks"

Scaling Laws for World Models

Characterizing the compute-optimal strategies for pre-training agents and world models — how model capacity, data scale, and training compute interact to determine downstream performance.

"Scaling Laws for Pre-training Agents and World Models"

Video as the Universal Reasoning Substrate

The position that video, as the richest single modality, may serve as a universal language for real-world decision making, with video generation subsuming planning, prediction, and control.

"Video as the New Language for Real-World Decision Making"

Compositional Generative Modeling

The argument that no single monolithic model can capture the full distribution of reality — compositionality at the model level is necessary for robust, generalizable generation.

"Compositional Generative Modeling: A Single Model is Not All You Need"

Physics Cognition in Generation

Evaluating whether and how video generation models learn physically plausible dynamics — probing the gap between pixel-level realism and genuine physical understanding.

PhyWorld: "How Far is Video Generation from World Model: A Physical Law Perspective"

World Model Benchmarks (10 benchmarks)

Benchmark	Evaluation Focus	Domain
stable-worldmodel-v1	Reproducible World Modeling Research and Evaluation	World
WorldScore	Unified evaluation benchmark for world generation	World
WorldSimBench	Video generation models as world simulators	World
PhyWorld	Physical law perspective evaluation of video generation	World
Newton	Interactive foundation world model benchmark	World
WorldGym	Evaluating robot policies in a world model	World
EWMBench	Scene, motion, and semantic quality in embodied world models	World
WorldLens	Full-Spectrum Evaluations of Driving World Models in Real World	Driving
VBench	Comprehensive Evaluation for Video Generation Models	Video
NAVSIM	Data-Driven Non-Reactive Autonomous Vehicle Simulation	Driving

Workshops (10 workshops)

Workshop	Venue	Date
Workshop on 4D World Models: Bridging Generation and Reconstruction	CVPR 2026	TBD
The 2nd Workshop on World Models	ICLR 2026	Apr 2026
Workshop on World Modeling (MILA)	MILA	Feb 2026
Workshop on Embodied World Models for Decision Making	NeurIPS 2025	Dec 2025
Reliable and Interactable World Models	ICCV 2025	Oct 2025
Building Physically Plausible World Models	ICML 2025	Jul 2025
Assessing World Models	ICML 2025	Jul 2025
Benchmarking World Models	CVPR 2025	Jun 2025
World Models: Understanding, Modelling and Scaling	ICLR 2025	Apr 2025
Foundation Models for Autonomous Systems	CVPR 2024	Jun 2024

Driving Datasets (20+ datasets)

Dataset	Description	Venue	Year
KITTI	The KITTI Vision Benchmark Suite for autonomous driving	CVPR	2012
nuScenes	A Multimodal Dataset for Autonomous Driving	CVPR	2020
Waymo Open	Scalability in Perception for Autonomous Driving	CVPR	2020
CARLA	An Open Urban Driving Simulator	CoRL	2017
SemanticKITTI	A Dataset for Semantic Scene Understanding of LiDAR Sequences	ICCV	2019
Argoverse 2	Next Generation Datasets for Self-Driving Perception and Forecasting	NeurIPS	2021
nuPlan	A Closed-Loop ML-Based Planning Benchmark for Autonomous Vehicles	CVPRW	2021
KITTI-360	Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D	T-PAMI	2022
OpenOccupancy	Large Scale Benchmark for Surrounding Semantic Occupancy Perception	ICCV	2023
Occ3D-nuScenes	Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving	NeurIPS	2023
OpenDV-YouTube	Generalized Predictive Model data for Autonomous Driving	CVPR	2024
SSCBench	Large-Scale 3D Semantic Scene Completion Benchmark for Autonomous Driving	IROS	2024
NAVSIM	Data-Driven Non-Reactive Autonomous Vehicle Simulation and Benchmarking	NeurIPS	2024
DrivingDojo	Interactive and Knowledge-Enriched Driving World Model Dataset	NeurIPS	2024
EUVS	Extrapolated Urban View Synthesis Benchmark	ICCV	2025

World Model Surveys & Literature (9 surveys)

Title	Domain	Venue	Year
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond	World	arXiv	2024
A Comprehensive Survey on World Models for Embodied AI	Embodied	arXiv	2025
A Survey of World Models for Autonomous Driving	Driving	arXiv	2025
3D and 4D World Modeling: A Survey	3D/4D	arXiv	2025
Understanding World or Predicting Future? A Comprehensive Survey of World Models	World	arXiv	2024
World Models: The Safety Perspective	Safety	arXiv	2024
Exploring the Evolution of Physics Cognition in Video Generation: A Survey	Physics	arXiv	2025
From Masks to Worlds: A Hitchhiker's Guide to World Models	World	arXiv	2025
A Survey: Learning Embodied Intelligence from Physical Simulators and World Models	Embodied	arXiv	2025