Depth Anything 3

Free Vision 3D Modeling

Key Features

Predicts spatially consistent geometry from any visual inputs

Recovers visual space from any number of views

Improves SLAM performance

Reduces drift in large-scale environments

Estimates stable and fusible depth maps

Enhances autonomous vehicles' environmental understanding

Achieves strong and generalizable novel view synthesis capability

Sets a new state-of-the-art across all tasks

DA3 recovers the visual space from any number of views, from a single view to many. This demo illustrates DA3 recovering the visual space from a difficult video. Accurate visual geometry estimation improves SLAM performance. Quantitative results show that simply replacing VGGT in VGGT-Long with DA3 (DA3-Long) significantly reduces drift in large-scale environments, outperforming even COLMAP, which takes more than 48 hours to complete.
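To make "drift" concrete, here is a minimal sketch of one common way to quantify it: the root-mean-square position error between an estimated camera trajectory and the ground truth (often called absolute trajectory error). The page does not say which metric the DA3 results use, so the function name `ate_rmse` and the toy trajectories are illustrative assumptions. Real evaluations also usually align the two trajectories before comparing them, a step omitted here.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMS position error between two equally long lists of (x, y, z) poses.

    Assumption: both trajectories are already expressed in the same frame.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and equally long")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(est, gt))
        for est, gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy data: the camera really moves along the x axis,
# while the estimate slides sideways a little more with every frame.
ground_truth = [(float(i), 0.0, 0.0) for i in range(5)]
drifting = [(float(i), 0.1 * i, 0.0) for i in range(5)]

print(ate_rmse(ground_truth, ground_truth))
print(ate_rmse(drifting, ground_truth))
```

A lower value means the estimated path stays closer to the true one. Over long sequences, small per-frame errors like the sideways slide above accumulate, which is why drift matters most in large-scale environments.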

DA3 estimates stable and fusible depth maps, enhancing autonomous vehicles' environmental understanding. By freezing the entire backbone and training a DPT head to predict 3DGS parameters, our model achieves very strong and generalizable novel view synthesis capability. DA3 sets a new state-of-the-art across all tasks, surpassing prior SOTA VGGT by an average of 35.7% in camera pose accuracy and 23.6% in geometric accuracy. Moreover, it outperforms DA2 in monocular depth estimation.
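The following sketch illustrates what "fusible" depth maps mean in practice: when depth values from different views are lifted into 3D with the standard pinhole camera model and moved into a shared world frame, pixels that observe the same surface should land on the same 3D point. This is not DA3's code or API. The helper names, intrinsics, and camera poses are all hypothetical values chosen for illustration.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a depth value to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def to_world(point, rotation, translation):
    """Apply a camera-to-world pose: rotation (3x3 nested lists) plus translation."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


# Hypothetical shared intrinsics for both views.
FX, FY, CX, CY = 100.0, 100.0, 50.0, 50.0
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# View A sits at the world origin; view B is shifted 0.2 units along x.
pose_a = (IDENTITY, (0.0, 0.0, 0.0))
pose_b = (IDENTITY, (0.2, 0.0, 0.0))

# Both views see the same surface point, at different pixels.
point_from_a = to_world(unproject(60.0, 50.0, 2.0, FX, FY, CX, CY), *pose_a)
point_from_b = to_world(unproject(50.0, 50.0, 2.0, FX, FY, CX, CY), *pose_b)

# With consistent depth, the fused cloud has the two points coinciding.
fused_cloud = [point_from_a, point_from_b]
print(fused_cloud)
```

If the predicted depths were inconsistent across views, the two lifted points would disagree and the fused cloud would show doubled or smeared surfaces.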
