ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation
Yaokun Li, Shuaixian Wang, Mantang Guo, Jiehui Huang, Taojun Ding, Mu Hu, Kaixuan Wang, Shaojie Shen, Guang Tan
2025-12-09
Summary
This paper introduces ReCamDriving, a system that generates realistic driving-scene videos and, importantly, lets you control the camera's trajectory within those scenes using only camera footage, with no LiDAR required.
What's the problem?
Existing methods for generating driving videos along new camera trajectories have limitations. Repair-based methods try to fix artifacts in rendered footage but fail on complex ones, while LiDAR-based methods rely on sparse point clouds that don't capture the complete scene. As a result, these approaches often produce unrealistic videos, offer imprecise camera control, and generalize poorly to new situations.
What's the solution?
ReCamDriving renders the scene with 3D Gaussian Splatting (3DGS) and uses those dense renderings to guide the video generation process, giving the model an explicit understanding of the environment's geometry. Training happens in two stages: the model first learns coarse camera control from camera poses alone, and then incorporates the 3DGS renderings for fine-grained viewpoint and geometric guidance (see the sketch below). To expose the model to the camera movements it will face at test time, the researchers also built a large dataset of parallel driving videos, called ParaDrive, with over 110,000 pairs of videos showing the same scene from different camera trajectories.
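To make the two-stage schedule concrete, here is a minimal sketch in PyTorch. Every name in it (VideoDiffusionModel, pose_encoder, gs_encoder, training_step) is a hypothetical stand-in rather than the authors' code, and the toy layers abbreviate what is really a video diffusion backbone; the point is only how the geometric condition is withheld in stage 1 and switched on in stage 2.

```python
# Hedged sketch of the two-stage training schedule (toy stand-in modules,
# not the authors' implementation; shapes and names are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoDiffusionModel(nn.Module):
    """Toy stand-in for the camera-controlled video generator."""
    def __init__(self, dim=64):
        super().__init__()
        self.pose_encoder = nn.Linear(12, dim)  # per-frame 3x4 camera pose, flattened
        self.gs_encoder = nn.Conv3d(3, dim, 1)  # 3DGS rendering along the target trajectory
        self.backbone = nn.Conv3d(dim, 3, 1)    # denoising backbone placeholder

    def forward(self, noisy_video, poses, gs_render=None):
        # Coarse control signal from camera poses (used in both stages).
        cond = self.pose_encoder(poses).mean(dim=1)[:, :, None, None, None]
        feat = cond.expand(-1, -1, *noisy_video.shape[2:])
        # Fine-grained geometric guidance from 3DGS renderings (stage 2 only).
        if gs_render is not None:
            feat = feat + self.gs_encoder(gs_render)
        return self.backbone(feat)

def training_step(model, batch, stage):
    # batch: one source/target trajectory pair from a ParaDrive-style dataset.
    noisy_video, poses, gs_render, target_video = batch
    gs_cond = gs_render if stage == 2 else None  # withhold geometry in stage 1
    pred = model(noisy_video, poses, gs_cond)
    return F.mse_loss(pred, target_video)
```

Per the abstract, this ordering exists because conditioning on 3DGS renderings from the start risks overfitting the model to restoration behavior; learning pose-based control first keeps the renderings as guidance rather than a target to copy.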
Why it matters?
This work is significant because it allows for the creation of high-quality, camera-controllable driving videos using only camera footage. This is useful for applications like training self-driving cars in simulated environments, creating realistic visual effects for movies, and developing new tools for virtual reality and augmented reality experiences. The new dataset, ParaDrive, also provides a valuable resource for other researchers working in this field.
Abstract
We propose ReCamDriving, a purely vision-based, camera-controlled novel-trajectory video generation framework. While repair-based methods fail to restore complex artifacts and LiDAR-based approaches rely on sparse and incomplete cues, ReCamDriving leverages dense and scene-complete 3D Gaussian Splatting (3DGS) renderings for explicit geometric guidance, achieving precise camera-controllable generation. To mitigate overfitting to restoration behaviors when conditioned on 3DGS renderings, ReCamDriving adopts a two-stage training paradigm: the first stage uses camera poses for coarse control, while the second stage incorporates 3DGS renderings for fine-grained viewpoint and geometric guidance. Furthermore, we present a 3DGS-based cross-trajectory data curation strategy to eliminate the train-test gap in camera transformation patterns, enabling scalable multi-trajectory supervision from monocular videos. Based on this strategy, we construct the ParaDrive dataset, containing over 110K parallel-trajectory video pairs. Extensive experiments demonstrate that ReCamDriving achieves state-of-the-art camera controllability and structural consistency.
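The cross-trajectory curation strategy also admits a small geometric illustration: take the recorded camera poses of a monocular clip, offset each pose sideways to form a parallel trajectory, and render that trajectory from the 3DGS scene fitted to the clip. The sketch below covers only the pose-offset step; render_3dgs, the offset magnitude, and the pairing logic are hypothetical placeholders, not the paper's actual pipeline.

```python
# Hedged illustration of building a parallel trajectory from recorded poses.
# The real ParaDrive curation renders these poses from a fitted 3DGS scene;
# `render_3dgs` below is a hypothetical placeholder named only in a comment.
import numpy as np

def parallel_trajectory(c2w_poses, lateral_offset=2.0):
    """Shift each 4x4 camera-to-world pose along its own right (x) axis."""
    shifted = []
    for c2w in c2w_poses:
        right = c2w[:3, 0]                    # camera x-axis in world coordinates
        new = c2w.copy()
        new[:3, 3] += lateral_offset * right  # translate the camera center sideways
        shifted.append(new)
    return np.stack(shifted)

# In the full pipeline, each (original clip, rendering along the shifted
# trajectory) pair would form one parallel-trajectory training sample, e.g.:
# pairs = [(video, render_3dgs(scene, pose)) for pose in parallel_trajectory(poses)]
```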