Depth & Camera Pose

Monocular and video depth estimation, visual odometry, camera pose recovery, and unified depth–pose frameworks — from classical SLAM to modern learning-based methods.

Video Depth Estimation (25+ papers)

| Model / Method | Full Title | Venue | Year |
| --- | --- | --- | --- |
| GeoNet | Unsupervised learning of dense depth, optical flow and camera pose | CVPR | 2018 |
| SC-SfMLearner | Unsupervised scale-consistent depth and ego-motion learning | NeurIPS | 2019 |
| DeepV2D | Video to Depth with Differentiable Structure from Motion | ICLR | 2020 |
| Consistent Video Depth | Consistent Video Depth Estimation | SIGGRAPH | 2020 |
| ManyDepth | Self-supervised multi-frame monocular depth | CVPR | 2021 |
| SimpleRecon | SimpleRecon: 3D Reconstruction Without 3D Convolutions | ECCV | 2022 |
| DepthFormer | DepthFormer: Multimodal Positional Encodings for Depth Prediction | CVPR | 2022 |
| MonoViT | MonoViT: Self-Supervised Monocular Depth Estimation with a Vision Transformer | 3DV | 2022 |
| NVDS | Neural Video Depth Stabilizer | ICCV | 2023 |
| MAMo | MAMo: Memory-Augmented Monocular Depth Estimation | ICCV | 2023 |
| FutureDepth | FutureDepth: Learning to Predict Depth from Future Frames | ECCV | 2024 |
| NVDS+ | Neural Video Depth Stabilizer (Extended) | T-PAMI | 2024 |
| DepthAnyVideo | Depth Any Video with Scalable Synthetic Data | ICLR | 2025 |
| DepthCrafter | Generating Consistent Long Depth Sequences | CVPR | 2025 |
| ChronoDepth | Learning Temporally Consistent Video Depth from Video Diffusion Priors | CVPR | 2025 |
| Video Depth Anything | Consistent Depth Estimation for Super-Long Videos | CVPR | 2025 |
| Depth Anything 3 | Recovering the Visual Space from Any Views | arXiv | 2025 |

Camera Pose Estimation & Visual Odometry (25+ papers)

| Model / Method | Full Title | Venue | Year |
| --- | --- | --- | --- |
| LSD-SLAM | Large-Scale Direct Monocular SLAM | ECCV | 2014 |
| ORB-SLAM | ORB-SLAM: A Versatile and Accurate Monocular SLAM System | T-RO | 2015 |
| DSO | Direct Sparse Odometry | T-PAMI | 2017 |
| ORB-SLAM2 | ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras | T-RO | 2017 |
| DeepVO | Towards End-to-End Visual Odometry with Deep Recurrent Convolutional Neural Networks | ICRA | 2017 |
| D3VO | D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry | CVPR | 2020 |
| TartanVO | TartanVO: A Generalizable Learning-based VO | CoRL | 2021 |
| ParticleSfM | ParticleSfM: Exploiting Dense Point Trajectories for Localizing Moving Cameras in the Wild | ECCV | 2022 |
| DPVO | DPVO: Deep Patch Visual Odometry | NeurIPS | 2023 |
| DPV-SLAM | DPV-SLAM: Deep Patch Visual SLAM | ECCV | 2024 |
| LEAP-VO | Long-term Effective Any Point Tracking for Visual Odometry | CVPR | 2024 |
| VGGSfM | Visual Geometry Grounded Deep SfM | CVPR | 2024 |
| RLVO | Reinforcement Learning Meets Visual Odometry | ECCV | 2024 |
| AnyCam | Learning to Recover Camera Poses and Intrinsics from Casual Videos | CVPR | 2025 |
| DynPose | Dynamic Camera Poses and Where to Find Them | CVPR | 2025 |
| AirSLAM | Efficient and Illumination-Robust Point-Line SLAM | T-RO | 2025 |
| Puffin | Thinking with Camera: A Unified Multimodal Model | arXiv | 2025 |

Unified Depth & Pose Estimation (30+ papers)

| Model / Method | Full Title | Venue | Year |
| --- | --- | --- | --- |
| Robust-CVD | Robust consistent video depth estimation | CVPR | 2021 |
| Spann3R | 3D reconstruction with spatial memory | 3DV | 2025 |
| MonST3R | Estimating Geometry in the Presence of Motion | ICLR | 2025 |
| Align3R | Aligned Monocular Depth Estimation for Dynamic Videos | CVPR | 2025 |
| CUT3R | Continuous 3D Perception Model with Persistent State | CVPR | 2025 |
| Easi3R | Estimating Disentangled Motion from DUSt3R Without Training | ICCV | 2025 |
| GeometryCrafter | Consistent geometry estimation for open-world videos | ICCV | 2025 |
| Aether | Geometric-aware unified world modeling | ICCV | 2025 |
| Geo4D | Leveraging video generators for geometric 4D scene reconstruction | ICCV | 2025 |
| UniGeo | Taming Video Diffusion for Unified Consistent Geometry Estimation | arXiv | 2025 |
| Point3R | Streaming 3D Reconstruction with Explicit Spatial Pointer Memory | arXiv | 2025 |
| StreamVGGT | Streaming 4D Visual Geometry Transformer | arXiv | 2025 |
| STream3R | Scalable Sequential 3D Reconstruction with Causal Transformer | arXiv | 2025 |
| ViPE | Video Pose Engine for 3D Geometric Perception | arXiv | 2025 |
| MASt3R-Fusion | Integrating Feed-Forward Visual Model with IMU, GNSS | arXiv | 2025 |
| WinT3R | Window-Based Streaming Reconstruction with Camera Token Pool | arXiv | 2025 |
| OmniVGGT | Omni-Modality Driven Visual Geometry Grounded Transformer | arXiv | 2025 |
| LiteVGGT | Boosting Vanilla VGGT via Geometry-aware Cached Token Merging | arXiv | 2025 |
| MegaSam | MegaSam: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos | CVPR | 2025 |
| π³ | π³: Scalable Permutation-Equivariant Visual Geometry Learning | arXiv | 2025 |
| MUT3R | MUT3R: Motion-aware Updating Transformer for Dynamic 3D Reconstruction | arXiv | 2025 |
| AVGGT | AVGGT: Rethinking Global Attention for Accelerating VGGT | arXiv | 2025 |
| TUN3D | TUN3D: Towards Real-World Scene Understanding from Unposed Images | arXiv | 2025 |