Key Features

Predicts spatially consistent geometry from arbitrary visual inputs
Recovers visual space from any number of views
Improves SLAM performance
Reduces drift in large-scale environments
Estimates stable and fusible depth maps
Enhances autonomous vehicles' environmental understanding
Achieves strong, generalizable novel view synthesis
Sets a new state of the art across all tasks

DA3 (Depth Anything 3) recovers the visual space from any number of views, from a single view to many. This demo illustrates DA3's ability to recover the visual space from a difficult video. Accurate visual geometry estimation improves SLAM performance: quantitative results show that simply replacing VGGT with DA3 in the VGGT-Long pipeline (yielding DA3-Long) significantly reduces drift in large-scale environments, outperforming even COLMAP, which takes more than 48 hours to complete.
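
The page does not say how drift was measured. One common way to quantify it is the root-mean-square position error between an estimated trajectory and ground truth (absolute trajectory error, ATE), together with the end-point drift. The sketch below is a minimal illustration assuming both trajectories are already aligned in the same frame and scale; the function names ate_rmse and end_point_drift are hypothetical and are not part of DA3 or VGGT-Long.

```python
import math


def ate_rmse(estimated, ground_truth):
    """Root-mean-square position error between two trajectories.

    Assumption: both are equal-length lists of (x, y, z) positions already
    expressed in the same frame and scale; no alignment is performed here.
    """
    if not estimated or len(estimated) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_errors = [
        sum((e - g) ** 2 for e, g in zip(p_est, p_gt))
        for p_est, p_gt in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def end_point_drift(estimated, ground_truth):
    """Euclidean distance between the final estimated and true positions."""
    return math.dist(estimated[-1], ground_truth[-1])


if __name__ == "__main__":
    gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    est = [(0.0, 0.0, 0.0), (1.0, 0.1, 0.0), (2.0, 0.3, 0.0)]
    print(ate_rmse(est, gt), end_point_drift(est, gt))
```

Lower values on both metrics indicate less accumulated drift; a full evaluation would normally align the trajectories first (for example with a similarity transform) before computing ATE.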


DA3 estimates stable, fusible depth maps, improving autonomous vehicles' understanding of their environment. By freezing the entire backbone and training only a DPT (Dense Prediction Transformer) head to predict 3DGS (3D Gaussian Splatting) parameters, DA3 achieves strong, generalizable novel view synthesis. DA3 sets a new state of the art across all tasks, surpassing the prior state-of-the-art model, VGGT, by an average of 35.7% in camera pose accuracy and 23.6% in geometric accuracy. It also outperforms DA2 (Depth Anything V2) in monocular depth estimation.
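
Fusing depth maps from several views generally means lifting each pixel into a shared world frame using the camera intrinsics and pose, then merging the resulting points. The following is a minimal pinhole-camera sketch of that step; the function names unproject_depth and fuse_views, their parameters, and the 3x4 camera-to-world pose format are assumptions for illustration, not DA3's actual output format or API.

```python
def unproject_depth(depth, fx, fy, cx, cy, cam_to_world):
    """Lift a depth map to world-frame 3D points with a pinhole camera model.

    depth: 2D list of z-depths (rows of floats); values <= 0 are treated as invalid.
    cam_to_world: hypothetical 3x4 nested list [R | t] mapping camera to world coords.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip missing or invalid depth
            # Back-project pixel (u, v) into camera coordinates.
            cam = ((u - cx) * z / fx, (v - cy) * z / fy, z)
            # Rigid transform into the world frame: world = R @ cam + t.
            world = tuple(
                sum(cam_to_world[i][j] * cam[j] for j in range(3)) + cam_to_world[i][3]
                for i in range(3)
            )
            points.append(world)
    return points


def fuse_views(views):
    """Concatenate world-frame points from (depth, (fx, fy, cx, cy), pose) views."""
    fused = []
    for depth, (fx, fy, cx, cy), pose in views:
        fused.extend(unproject_depth(depth, fx, fy, cx, cy, pose))
    return fused
```

Merging views here is plain concatenation; depth that is consistent across views is what lets the lifted points line up rather than forming duplicated or smeared surfaces.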
