SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
Jianhong Bai, Menghan Xia, Xintao Wang, Ziyang Yuan, Xiao Fu, Zuozhu Liu, Haoji Hu, Pengfei Wan, Di Zhang
2024-12-12

Summary
This paper introduces SynCamMaster, a method for generating synchronized videos of the same scene from multiple camera viewpoints, enabling more realistic and dynamic video production.
What's the problem?
Generating videos that remain consistent across different viewpoints is challenging, because the appearance and motion of objects must agree in 3D space. Existing multi-view methods mostly target single objects (e.g., for 4D reconstruction) and struggle to generate open-world scenes viewable from arbitrary angles, making a seamless multi-camera viewing experience hard to achieve.
What's the solution?
The authors propose SynCamMaster, which augments a pre-trained text-to-video model with a plug-and-play multi-view synchronization module that keeps appearance and geometry consistent across camera viewpoints. Because high-quality multi-camera video data is scarce, they use a hybrid training scheme that supplements Unreal Engine-rendered multi-camera videos with multi-camera images and single-camera (monocular) videos. The method also supports re-rendering an existing video from novel viewpoints, making it useful for applications such as virtual filming.
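The summary does not spell out the synchronization module, but conceptually it can be read as attention across the camera-view axis, conditioned on camera poses and inserted into the pretrained backbone. Below is a minimal PyTorch sketch under that assumption; the class name, the flattened 3x4 extrinsics input, and the zero-initialized output projection are illustrative choices, not the authors' released code.

```python
# Hypothetical sketch of a plug-and-play multi-view synchronization block:
# attention over the view dimension, with camera poses added as an embedding.
import torch
import torch.nn as nn


class MultiViewSyncBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8, pose_dim: int = 12):
        super().__init__()
        # Project flattened 3x4 camera extrinsics into the feature space.
        self.pose_proj = nn.Linear(pose_dim, dim)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Zero-initialized output projection so the pretrained text-to-video
        # model is left unchanged at the start of fine-tuning (an assumption).
        self.out_proj = nn.Linear(dim, dim)
        nn.init.zeros_(self.out_proj.weight)
        nn.init.zeros_(self.out_proj.bias)

    def forward(self, x: torch.Tensor, poses: torch.Tensor) -> torch.Tensor:
        # x:     (batch, views, tokens, dim)  spatial tokens for each camera
        # poses: (batch, views, pose_dim)     flattened camera extrinsics
        b, v, n, d = x.shape
        h = x + self.pose_proj(poses).unsqueeze(2)        # add pose embedding
        h = h.permute(0, 2, 1, 3).reshape(b * n, v, d)    # attend across views
        h = self.norm(h)
        h, _ = self.attn(h, h, h)
        h = self.out_proj(h).reshape(b, n, v, d).permute(0, 2, 1, 3)
        return x + h                                      # residual connection
```

Because the block starts as an identity mapping (zero-initialized projection plus residual), it can be dropped into an existing text-to-video model without disturbing its pretrained behavior before fine-tuning.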
Why it matters?
This research matters because it makes generated videos more realistic and engaging. By enabling synchronized video generation from multiple perspectives, SynCamMaster can benefit fields such as filmmaking, gaming, and virtual reality, enhancing the overall quality of visual storytelling.
Abstract
Recent advancements in video diffusion models have shown exceptional abilities in simulating real-world dynamics and maintaining 3D consistency. This progress inspires us to investigate the potential of these models to ensure dynamic consistency across various viewpoints, a highly desirable feature for applications such as virtual filming. Unlike existing methods focused on multi-view generation of single objects for 4D reconstruction, our interest lies in generating open-world videos from arbitrary viewpoints, incorporating 6 DoF camera poses. To achieve this, we propose a plug-and-play module that enhances a pre-trained text-to-video model for multi-camera video generation, ensuring consistent content across different viewpoints. Specifically, we introduce a multi-view synchronization module to maintain appearance and geometry consistency across these viewpoints. Given the scarcity of high-quality training data, we design a hybrid training scheme that leverages multi-camera images and monocular videos to supplement Unreal Engine-rendered multi-camera videos. Furthermore, our method enables intriguing extensions, such as re-rendering a video from novel viewpoints. We also release a multi-view synchronized video dataset, named SynCamVideo-Dataset. Project page: https://jianhongbai.github.io/SynCamMaster/.
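To make the hybrid training scheme from the abstract concrete, here is a hedged sketch of how the three data sources might be mixed during training. The mixing ratios, pool names, and the note about which sources contribute to cross-view supervision are illustrative assumptions, not details reported by the paper.

```python
# Hypothetical per-step data mixing for hybrid training: scarce Unreal
# Engine-rendered multi-camera videos are supplemented by multi-camera images
# and monocular videos. Ratios and helper names are assumptions.
import random

DATA_POOLS = {
    "multi_cam_video": 0.4,  # rendered, fully synchronized multi-view clips
    "multi_cam_image": 0.4,  # multi-view images, treated as one-frame videos
    "mono_video":      0.2,  # single-camera videos, no cross-view supervision
}


def sample_batch(loaders: dict):
    """Pick a data source for this training step according to the ratios above."""
    pools, weights = zip(*DATA_POOLS.items())
    source = random.choices(pools, weights=weights, k=1)[0]
    batch = next(loaders[source])
    # Downstream, cross-view consistency objectives would apply only to the two
    # multi-camera sources; monocular clips help preserve general video quality.
    return source, batch
```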