4D Spatial Intelligence
A comprehensive survey of reconstructing 4D spatial intelligence from video, covering monocular depth estimation and camera pose recovery, dense 3D/4D tracking, dynamic scene reconstruction, human-centric motion capture, and physics-based simulation.
Depth & Camera Pose
Monocular and video depth estimation, visual odometry, camera pose recovery, and unified depth–pose frameworks (DUSt3R, MonST3R, VGGT).
3D/4D Tracking
Dense 3D point tracking, scene flow estimation, unified depth–pose–tracking models, and long-range 4D correspondence methods.
3D Scene Reconstruction
Small-scale object/scene reconstruction, large-scale scene modeling, feed-forward 3D methods, neural surface reconstruction, and multi-view stereo.
4D Dynamic Scenes
General 4D scene reconstruction, deformable NeRFs, 4D Gaussian Splatting, dynamic novel view synthesis, and temporal decomposition methods.
Human-Centric 4D
SMPL-based mesh recovery, egocentric motion capture, appearance-rich human avatars, and human interaction modeling: human-object (HOI), human-scene (HSI), and human-human (HHI) interaction.
Physics-Based 4D
Physics-based character control, adversarial motion priors, physically plausible reconstruction, human simulation, and dynamic scene physics.