
Embodied Intelligence

World models for manipulation, navigation, and locomotion, plus vision-language-action (VLA) models and model-based reinforcement learning.

Embodied AI — Manipulation, Navigation & Locomotion (17 models)

| Model | Full Title | Domain |
|---|---|---|
| Scaling World Model | Scaling World Model for Hierarchical Manipulation Policies | Manipulation |
| Say, Dream, and Act | Say, Dream, and Act: Learning Video World Models for Instruction-Driven Robot Manipulation | Manipulation |
| World-VLA-Loop | World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy | Manipulation |
| π0 | π0: A Vision-Language-Action Flow Model for General Robot Control | Foundation |
| Octo | Octo: An Open-Source Generalist Robot Policy | Foundation |
| OpenVLA | OpenVLA: An Open-Source Vision-Language-Action Model | Foundation |
| RDT-1B | RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation | Manipulation |
| TesserAct | Learning 4D Embodied World Models | Foundation |
| DreamGen | Unlocking Generalization in Robot Learning through Video World Models | Foundation |
| iVideoGPT | Interactive VideoGPTs are Scalable World Models | Foundation |
| AgiBot-World | Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems | Manipulation |
| FLARE | Robot Learning with Implicit World Modeling | Manipulation |
| EnerVerse | Envisioning Embodied Future Space for Robotics Manipulation | Manipulation |
| NWM | Navigation World Models | Navigation |
| MindJourney | Test-Time Scaling with World Models for Spatial Reasoning | Navigation |
| DWL | Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning | Locomotion |
| Puppeteer | Hierarchical World Models as Visual Whole-Body Humanoid Controllers | Locomotion |

World Models × VLAs & Model-Based RL (11 models)

| Model | Full Title | Focus |
|---|---|---|
| 3D-VLA | 3D-VLA: A 3D Vision-Language-Action Generative World Model | VLA |
| SpatialVLA | SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model | VLA |
| Genie 2 | Genie 2: A Large-Scale Foundation World Model | VLA |
| CoT-VLA | Visual Chain-of-Thought Reasoning for Vision-Language-Action Models | VLA |
| WorldVLA | Towards Autoregressive Action World Model | VLA |
| DreamVLA | A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | VLA |
| Dreamer v4 | Training Agents Inside of Scalable World Models | MBRL |
| Dreamer v3 | Mastering Diverse Domains through World Models | MBRL |
| TD-MPC2 | Scalable, Robust World Models for Continuous Control | MBRL |
| DINO-WM | World Models on Pre-trained Visual Features enable Zero-shot Planning | Latent |
| V-JEPA 2 | Self-Supervised Video Models Enable Understanding, Prediction and Planning | JEPA |
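
The common thread across the MBRL and latent entries above (Dreamer, TD-MPC2, DINO-WM) is planning or learning "in imagination": a learned dynamics model is rolled forward from the current latent state to score candidate action sequences without touching the real environment. The following is a minimal sketch of that idea under toy assumptions — a hand-coded one-dimensional latent transition and reward stand in for the learned networks, and the `dynamics`, `reward`, and `plan` names are illustrative, not from any of the papers listed.

```python
import random

def dynamics(z, a):
    """Toy latent transition: damped state plus action (stands in for a learned model)."""
    return 0.9 * z + a

def reward(z):
    """Toy reward: stay close to the origin."""
    return -abs(z)

def imagine_return(z0, actions):
    """Roll the world model forward in imagination and sum predicted rewards."""
    z, ret = z0, 0.0
    for a in actions:
        z = dynamics(z, a)
        ret += reward(z)
    return ret

def plan(z0, horizon=5, candidates=256, seed=0):
    """Random-shooting MPC: sample action sequences, score each by its
    imagined return, execute only the first action of the best sequence."""
    rng = random.Random(seed)
    best, best_ret = None, float("-inf")
    for _ in range(candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        ret = imagine_return(z0, seq)
        if ret > best_ret:
            best, best_ret = seq, ret
    return best[0]

# The planner pushes the state toward the origin: from z0 = 2.0 the
# chosen first action is negative, from z0 = -2.0 it is positive.
print(plan(2.0), plan(-2.0))
```

Real systems replace the hand-coded functions with learned latent dynamics and reward heads, and swap random shooting for CEM/MPPI (TD-MPC2) or a policy trained on imagined rollouts (Dreamer), but the rollout-and-score loop is the same.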