TrackCraft3R


Free 3D Open-Source


Key Features

Predicts dense 3D trajectories from monocular video inputs.

Uses predicted depth and camera information alongside RGB video.

Repurposes a pretrained Wan2.1-T2V-1.3B video diffusion transformer.

Runs dense trajectory prediction in a single forward pass.

Trains with DiT LoRA, I/O projections, and VAE adaptation stages.

Uses synthetic datasets including Kubric, Dynamic Replica, PointOdyssey, and TartanAir.

Provides official training code and model checkpoint instructions.

Targets 3D tracking, dynamic scene understanding, and robotics perception research.
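To make the role of depth and camera information concrete, here is a minimal sketch of how a pixel with a known depth can be lifted to a 3D point under a standard pinhole camera model. It is an illustration only, not code from the TrackCraft3R repository; the function names, the intrinsics values, and the camera pose are assumptions chosen for the example.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with depth along the optical axis to a camera-frame 3D point.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy), all in pixels. Hypothetical helper.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def camera_to_world(point, rotation, translation):
    """Apply a camera-to-world rigid transform: X_world = R * X_cam + t."""
    return tuple(
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )


# Illustrative values (assumptions, not taken from the project).
fx = fy = 500.0
cx, cy = 320.0, 240.0
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

point_cam = unproject(420.0, 240.0, 2.0, fx, fy, cx, cy)
point_world = camera_to_world(point_cam, identity, (0.0, 0.0, 1.0))
```

Repeating this for one pixel across frames, with per-frame depth and pose, gives a 3D trajectory for that pixel. A dense tracker produces such a trajectory for every pixel.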

The system builds on Wan2.1-T2V-1.3B as a pretrained video diffusion transformer and adapts it through training stages involving DiT LoRA, input and output projections, and VAE components. It trains on synthetic datasets such as Kubric, Dynamic Replica, PointOdyssey, and TartanAir, using rendered sequences and depth or camera supervision to learn dense 3D motion. This lets the model produce point trajectories and visibility estimates over time.
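The LoRA idea mentioned above can be sketched in a few lines: a frozen weight matrix is left untouched, and a trainable low-rank product is added to its output. This is a generic, dependency-free illustration of low-rank adaptation on a single linear layer, not the project's training code; the dimensions, the `alpha` scaling value, and all names are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_linear(x, w_frozen, lora_a, lora_b, alpha):
    """Compute y = x W + (alpha / r) * x A B.

    W stays frozen; only A (d_in x r) and B (r x d_out) would be trained.
    """
    rank = len(lora_b)  # B has r rows
    base = matmul(x, w_frozen)
    update = matmul(matmul(x, lora_a), lora_b)
    scale = alpha / rank
    return [
        [b + scale * u for b, u in zip(base_row, update_row)]
        for base_row, update_row in zip(base, update)
    ]


random.seed(0)
d_in, d_out, rank = 4, 3, 2  # toy sizes, chosen for illustration
w_frozen = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
lora_a = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]
lora_b = [[0.0] * d_out for _ in range(rank)]  # zero init: adapter starts as a no-op

x = [[1.0, 2.0, 3.0, 4.0]]
y_base = matmul(x, w_frozen)
y_adapted = lora_linear(x, w_frozen, lora_a, lora_b, alpha=4.0)
```

Because `lora_b` starts at zero, the adapted layer initially reproduces the pretrained output exactly, so fine-tuning begins from the behavior of the original model and only the small `A` and `B` matrices need gradients.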

TrackCraft3R is useful for 3D scene understanding, robotics perception, dynamic reconstruction, augmented reality, and research on reusing generative video priors for geometric tasks. Its main contribution is showing that a model originally trained for video generation can be converted into a dense tracker, which suggests that video diffusion transformers already encode useful motion and spatial structure. The project is listed as free and open-source because it is distributed as a GitHub repository with official code.
