Key Features

Predicts dense 3D trajectories from monocular video inputs.
Uses predicted depth and camera information alongside RGB video.
Repurposes a pretrained Wan2.1-T2V-1.3B video diffusion transformer.
Runs dense trajectory prediction in a single forward pass.
Trains with DiT LoRA, I/O projections, and VAE adaptation stages.
Uses synthetic datasets including Kubric, Dynamic Replica, PointOdyssey, and TartanAir.
Provides official training code and model checkpoint instructions.
Targets 3D tracking, dynamic scene understanding, and robotics perception research.
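
To make the role of depth and camera information concrete, here is a minimal sketch of how a per-pixel depth value and pinhole camera intrinsics can be combined into a camera-frame 3D point. This is standard pinhole geometry, not code from the TrackCraft3R repository; the names Intrinsics, unproject, and lift_depth_map are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics (hypothetical container for this sketch)."""
    fx: float  # focal length in pixels, x axis
    fy: float  # focal length in pixels, y axis
    cx: float  # principal point, x coordinate
    cy: float  # principal point, y coordinate


def unproject(u, v, depth, k):
    """Lift pixel (u, v) with the given depth to a camera-frame 3D point."""
    x = (u - k.cx) * depth / k.fx
    y = (v - k.cy) * depth / k.fy
    return (x, y, depth)


def lift_depth_map(depth_map, k):
    """Unproject every pixel of a depth map given as rows of depth values."""
    return [
        [unproject(u, v, d, k) for u, d in enumerate(row)]
        for v, row in enumerate(depth_map)
    ]
```

Applying lift_depth_map to each frame gives one 3D point per pixel per frame, which is the kind of dense geometric input a tracker could consume alongside the RGB video.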

The system builds on Wan2.1-T2V-1.3B as a pretrained video diffusion transformer and adapts it through training stages involving DiT LoRA, input and output projections, and VAE components. It trains on synthetic datasets such as Kubric, Dynamic Replica, PointOdyssey, and TartanAir, using rendered sequences and depth or camera supervision to learn dense 3D motion. This lets the model produce point trajectories and visibility estimates over time.
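
As an illustration of the LoRA idea mentioned above, the following sketch shows the generic low-rank update applied to a single linear layer: the frozen weight W is left untouched and a trainable product B·A, scaled by alpha / rank, is added to its output. This is the general LoRA formulation written in plain Python, not the project's training code; matvec and lora_forward are hypothetical helper names, and the rank and scaling values are assumptions.

```python
def matvec(m, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w, a, b, x, alpha=1.0):
    """Compute W x + (alpha / rank) * B (A x).

    w: frozen weight, shape (out, in)
    a: trainable down-projection, shape (rank, in)
    b: trainable up-projection, shape (out, rank)
    """
    rank = len(a)
    base = matvec(w, x)               # output of the frozen pretrained layer
    update = matvec(b, matvec(a, x))  # low-rank correction
    scale = alpha / rank
    return [y + scale * d for y, d in zip(base, update)]
```

When B starts at zero, the adapted layer reproduces the pretrained layer exactly, so fine-tuning begins from the original model's behavior and only the small A and B matrices need to be trained.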


TrackCraft3R is useful for 3D scene understanding, robotics perception, dynamic reconstruction, augmented reality, and research on reusing generative video priors for geometric tasks. Its main contribution is showing that a model originally trained for video generation can be converted into a dense 3D tracker, which suggests that video diffusion transformers already encode useful motion and spatial structure. The official code is published in a GitHub repository, so the project is listed as free and open-source.
