
Embodied Intelligence

World models for manipulation, navigation, and locomotion, plus vision-language-action (VLA) models and model-based reinforcement learning.

Embodied AI — Manipulation, Navigation & Locomotion (17 models)

| Model | Full Title | Domain |
|---|---|---|
| Scaling World Model | Scaling World Model for Hierarchical Manipulation Policies | Manipulation |
| Say, Dream, and Act | Say, Dream, and Act: Learning Video World Models for Instruction-Driven Robot Manipulation | Manipulation |
| World-VLA-Loop | World-VLA-Loop: Closed-Loop Learning of Video World Model and VLA Policy | Manipulation |
| π0 | π0: A Vision-Language-Action Flow Model for General Robot Control | Foundation |
| Octo | Octo: An Open-Source Generalist Robot Policy | Foundation |
| OpenVLA | OpenVLA: An Open-Source Vision-Language-Action Model | Foundation |
| RDT-1B | RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation | Manipulation |
| TesserAct | Learning 4D Embodied World Models | Foundation |
| DreamGen | Unlocking Generalization in Robot Learning through Video World Models | Foundation |
| iVideoGPT | Interactive VideoGPTs are Scalable World Models | Foundation |
| AgiBot-World | Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems | Manipulation |
| FLARE | Robot Learning with Implicit World Modeling | Manipulation |
| EnerVerse | Envisioning Embodied Future Space for Robotics Manipulation | Manipulation |
| NWM | Navigation World Models | Navigation |
| MindJourney | Test-Time Scaling with World Models for Spatial Reasoning | Navigation |
| DWL | Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning | Locomotion |
| Puppeteer | Hierarchical World Models as Visual Whole-Body Humanoid Controllers | Locomotion |

World Models × VLAs & Model-Based RL (11 models)

| Model | Full Title | Focus |
|---|---|---|
| 3D-VLA | 3D-VLA: A 3D Vision-Language-Action Generative World Model | VLA |
| SpatialVLA | SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model | VLA |
| Genie 2 | Genie 2: A Large-Scale Foundation World Model | VLA |
| CoT-VLA | Visual Chain-of-Thought Reasoning for Vision-Language-Action Models | VLA |
| WorldVLA | Towards Autoregressive Action World Model | VLA |
| DreamVLA | A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge | VLA |
| Dreamer v4 | Training Agents Inside of Scalable World Models | MBRL |
| Dreamer v3 | Mastering Diverse Domains through World Models | MBRL |
| TD-MPC2 | Scalable, Robust World Models for Continuous Control | MBRL |
| DINO-WM | World Models on Pre-trained Visual Features enable Zero-shot Planning | Latent |
| V-JEPA 2 | Self-Supervised Video Models Enable Understanding, Prediction and Planning | JEPA |
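
The common thread across the MBRL and latent entries above (Dreamer, TD-MPC2, DINO-WM) is planning or learning "in imagination": a learned dynamics model is rolled forward from the current latent state to score candidate action sequences without touching the real environment. The following is a minimal sketch of that idea under toy assumptions — a hand-coded one-dimensional latent transition and reward stand in for the learned networks, and the `dynamics`, `reward`, and `plan` names are illustrative, not from any of the papers listed.

```python
import random

def dynamics(z, a):
    """Toy latent transition: damped state plus action (stands in for a learned model)."""
    return 0.9 * z + a

def reward(z):
    """Toy reward: stay close to the origin."""
    return -abs(z)

def imagine_return(z0, actions):
    """Roll the world model forward in imagination and sum predicted rewards."""
    z, ret = z0, 0.0
    for a in actions:
        z = dynamics(z, a)
        ret += reward(z)
    return ret

def plan(z0, horizon=5, candidates=256, seed=0):
    """Random-shooting MPC: sample action sequences, score each by its
    imagined return, execute only the first action of the best sequence."""
    rng = random.Random(seed)
    best, best_ret = None, float("-inf")
    for _ in range(candidates):
        seq = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        ret = imagine_return(z0, seq)
        if ret > best_ret:
            best, best_ret = seq, ret
    return best[0]

# The planner pushes the state toward the origin: from z0 = 2.0 the
# chosen first action is negative, from z0 = -2.0 it is positive.
print(plan(2.0), plan(-2.0))
```

Real systems replace the hand-coded functions with learned latent dynamics and reward heads, and swap random shooting for CEM/MPPI (TD-MPC2) or a policy trained on imagined rollouts (Dreamer), but the rollout-and-score loop is the same.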