TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Mark YU, Wenbo Hu, Jinbo Xing, Ying Shan
2025-03-10
Summary
This paper introduces TrajectoryCrafter, an AI tool that lets users change the camera's movement in videos shot with a single camera, creating smooth and realistic new perspectives.
What's the problem?
Existing methods for editing camera movements in videos struggle to balance two things: keeping the parts of the scene that were already visible consistent with the original footage, while also generating realistic content for the regions a new viewpoint reveals. They also rely on scarce multi-camera video data, which limits how well they generalize to new scenes.
What's the solution?
The researchers developed TrajectoryCrafter, which uses a dual-stream video diffusion model conditioned on two inputs at once: point-cloud renders of the scene along the new camera path, which pin down the exact view change, and the original video, which supplies the content. They also designed a training recipe that mixes ordinary single-camera videos from the web with static multi-view datasets, using a double-reprojection strategy so the single-camera footage can stand in for hard-to-find multi-camera data. This lets the system generate high-quality videos with precise camera control, even for scenes it has never seen before (a rough sketch of the dual-stream idea follows below).
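To make the dual-stream idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of a denoiser that takes a noisy video latent plus the two conditioning streams, the point-cloud renders and the source video, and fuses them by simple feature concatenation. The real model is a full video diffusion backbone; the layer sizes, channel counts, and names below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DualStreamDenoiser(nn.Module):
    """Toy dual-stream conditional denoiser (illustrative only)."""
    def __init__(self, channels=32):
        super().__init__()
        # Encoder for the noisy video latent being denoised.
        self.latent_encoder = nn.Conv3d(3, channels, kernel_size=3, padding=1)
        # Stream 1: point-cloud renders along the new trajectory (deterministic view change).
        self.render_encoder = nn.Conv3d(3, channels, kernel_size=3, padding=1)
        # Stream 2: the original monocular video (appearance and motion content).
        self.source_encoder = nn.Conv3d(3, channels, kernel_size=3, padding=1)
        # Toy backbone: fuse the three streams and predict the noise residual.
        self.backbone = nn.Sequential(
            nn.Conv3d(channels * 3, channels, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv3d(channels, 3, kernel_size=3, padding=1),
        )

    def forward(self, noisy_latent, render_cond, source_cond):
        # All tensors have shape (batch, 3, frames, height, width).
        z = self.latent_encoder(noisy_latent)
        r = self.render_encoder(render_cond)
        s = self.source_encoder(source_cond)
        return self.backbone(torch.cat([z, r, s], dim=1))

# Toy usage: one 8-frame clip at 64x64 resolution.
model = DualStreamDenoiser()
noisy = torch.randn(1, 3, 8, 64, 64)    # noisy target-video latent
renders = torch.randn(1, 3, 8, 64, 64)  # point-cloud renders along the new camera path
source = torch.randn(1, 3, 8, 64, 64)   # original monocular video
predicted_noise = model(noisy, renders, source)  # (1, 3, 8, 64, 64)
```

In an actual diffusion training loop, this forward pass would be asked to predict the noise added to the video rendered along the new trajectory, while the two conditioning streams stay fixed.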
Why it matters?
This matters because it makes it easier to edit videos creatively, like changing how a scene is filmed after it has already been recorded. It could be useful for filmmakers, content creators, and virtual reality developers by making video editing more flexible and accessible without needing expensive multi-camera setups.
Abstract
We present TrajectoryCrafter, a novel approach to redirect camera trajectories for monocular videos. By disentangling deterministic view transformations from stochastic content generation, our method achieves precise control over user-specified camera trajectories. We propose a novel dual-stream conditional video diffusion model that concurrently integrates point cloud renders and source videos as conditions, ensuring accurate view transformations and coherent 4D content generation. Instead of leveraging scarce multi-view videos, we curate a hybrid training dataset combining web-scale monocular videos with static multi-view datasets, by our innovative double-reprojection strategy, significantly fostering robust generalization across diverse scenes. Extensive evaluations on multi-view and large-scale monocular videos demonstrate the superior performance of our method.
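The abstract mentions a double-reprojection strategy for turning ordinary monocular videos into training data but does not spell it out. The sketch below is one plausible reading, not the paper's implementation: each frame is lifted to a point cloud using an estimated depth map, projected into a displaced camera, and then projected back, so the re-rendered frame carries realistic occlusion holes while the original frame serves as a perfectly aligned ground-truth target. The function names, the z-buffered splatting scheme, and the reliance on monocular depth are all assumptions.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel of a depth map to 3D camera coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def project(points, K, h, w):
    """Z-buffered point splat; returns an index map (-1 marks a hole)."""
    index = -np.ones((h, w), dtype=np.int64)
    zbuf = np.full((h, w), np.inf)
    z = points[:, 2]
    u = np.round(points[:, 0] / z * K[0, 0] + K[0, 2]).astype(int)
    v = np.round(points[:, 1] / z * K[1, 1] + K[1, 2]).astype(int)
    ok = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    for i in np.flatnonzero(ok):
        if z[i] < zbuf[v[i], u[i]]:
            zbuf[v[i], u[i]] = z[i]
            index[v[i], u[i]] = i
    return index

def double_reproject(frame, depth, K, R, t):
    """Warp a frame to a displaced camera (R, t) and back again.
    Pixels occluded in the novel view become holes in the re-rendered image,
    while the untouched original frame remains a perfectly aligned target."""
    h, w, _ = frame.shape
    pts = backproject(depth, K)            # source-view point cloud
    pts_novel = pts @ R.T + t              # move points into the novel camera
    visible = project(pts_novel, K, h, w)  # keep only points seen from there
    keep = visible[visible >= 0]
    back = project(pts[keep], K, h, w)     # re-render survivors into the source view
    cond = np.zeros_like(frame)
    mask = back >= 0
    cond[mask] = frame.reshape(-1, 3)[keep[back[mask]]]
    return cond, frame                     # (conditioning render with holes, target)
```

Pairs produced this way would let a model learn to fill disocclusions from web-scale monocular videos alone, which is the practical point of the strategy described in the abstract.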