SpatialTrackerV2 achieves significant improvements by jointly learning geometry and motion, outperforming all prior 3D tracking methods by a clear margin; it also delivers strong results in 2D tracking and dynamic 3D reconstruction. The model consists of two main components: a VGGT-style front-end network that extracts high-level semantic features from the input video to initialize consistent scene geometry and camera motion, and a track refiner that iteratively updates all 4D attributes, including 2D and 3D point tracks, trajectory-wise dynamic probabilities, and camera poses. A minimal sketch of this two-component design follows.
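The snippet below is a hedged, illustrative sketch of that front-end + refiner structure, not the authors' released implementation: the module names (`FrontEnd`, `TrackRefiner`), tensor shapes, head designs, and iteration count are all assumptions chosen only to make the data flow concrete.

```python
# Hypothetical sketch of the two-component design described above.
# Everything here (names, shapes, heads, iteration count) is an assumption
# for illustration; it is NOT the SpatialTrackerV2 codebase.
import torch
import torch.nn as nn


class FrontEnd(nn.Module):
    """Stand-in for the VGGT-style network: maps video frames to features
    plus initial depth (scene geometry) and camera-pose estimates."""

    def __init__(self, dim: int = 64):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3)
        self.depth_head = nn.Conv2d(dim, 1, kernel_size=1)
        self.pose_head = nn.Linear(dim, 7)  # 3 translation + 4 quaternion

    def forward(self, video: torch.Tensor):
        # video: (T, 3, H, W) -> feats: (T, dim, H/4, W/4)
        feats = self.encoder(video)
        depth = self.depth_head(feats).squeeze(1)       # initial geometry
        poses = self.pose_head(feats.mean(dim=(2, 3)))  # initial camera motion
        return feats, depth, poses


class TrackRefiner(nn.Module):
    """Stand-in refiner: iteratively updates the 4D attributes
    (2D/3D tracks, per-trajectory dynamic probabilities, camera poses)."""

    def __init__(self, dim: int = 64, iters: int = 4):
        super().__init__()
        self.iters = iters
        # Predicts residuals for (u, v, x, y, z) per point plus a dynamic logit.
        self.update = nn.Linear(dim + 5, 5 + 1)
        self.pose_update = nn.Linear(dim, 7)  # residual camera-pose update

    def forward(self, feats, tracks2d, tracks3d, poses):
        # feats: (T, dim, h, w); tracks2d: (T, N, 2); tracks3d: (T, N, 3)
        ctx = feats.mean(dim=(2, 3))                    # (T, dim) global context
        dyn_logit = torch.zeros(tracks2d.shape[1])      # (N,) one per trajectory
        for _ in range(self.iters):
            state = torch.cat([tracks2d, tracks3d], dim=-1)   # (T, N, 5)
            inp = torch.cat(
                [ctx[:, None].expand(-1, state.shape[1], -1), state], dim=-1
            )
            out = self.update(inp)                      # (T, N, 6) residuals
            tracks2d = tracks2d + out[..., :2]          # refine 2D tracks
            tracks3d = tracks3d + out[..., 2:5]         # refine 3D tracks
            dyn_logit = dyn_logit + out[..., 5].mean(dim=0)
            poses = poses + self.pose_update(ctx)       # refine camera poses
        return tracks2d, tracks3d, torch.sigmoid(dyn_logit), poses
```

The key design point the sketch mirrors is the split of labor: the front end produces one consistent initialization of geometry and camera motion, and the refiner only has to predict residual corrections across a fixed number of iterations.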
SpatialTrackerV2 produces strong qualitative results across diverse scenarios, with all outputs generated in a purely feed-forward manner, taking only 10-20 seconds per sequence. Its ability to estimate camera motion, consistent geometry, and pixel-wise 3D trajectories in a single pass makes it a powerful tool for a range of applications; the snippet below illustrates this single-pass usage with the sketch above. With its scalable training and strong performance, SpatialTrackerV2 has the potential to advance 3D point tracking and related areas.
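A hypothetical end-to-end call, reusing the `FrontEnd` and `TrackRefiner` classes from the sketch above; the shapes and the random query-point initialization are arbitrary stand-ins:

```python
# Assumed single-pass usage of the illustrative modules defined earlier.
T, N, H, W = 8, 16, 128, 160
video = torch.rand(T, 3, H, W)                   # placeholder input clip

front_end, refiner = FrontEnd(), TrackRefiner()
feats, depth, poses = front_end(video)           # initial geometry + motion
tracks2d = torch.rand(T, N, 2) * torch.tensor([W, H])  # random query points
tracks3d = torch.rand(T, N, 3)

# One feed-forward refinement pass yields all 4D attributes at once.
tracks2d, tracks3d, dyn_prob, poses = refiner(feats, tracks2d, tracks3d, poses)
print(tracks3d.shape, dyn_prob.shape)  # torch.Size([8, 16, 3]) torch.Size([16])
```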