
Generative View Stitching

Chonghyuk Song, Michal Stary, Boyuan Chen, George Kopanas, Vincent Sitzmann

2025-10-30


Summary

This paper introduces a new method called Generative View Stitching (GVS) for creating videos where the camera follows a specific, pre-planned path. It tackles the issue of videos falling apart, or the camera colliding with the generated scene, when current video generation technology is asked to follow such a path.

What's the problem?

Existing video generation models, specifically those that build videos step-by-step, struggle when you want the video to follow a pre-planned camera route. Because they generate each frame based only on what came before, they can't 'see' future camera positions and therefore create scenes that the camera will eventually crash into. This causes the video generation to become unstable and unrealistic, especially for longer or more complex camera movements.

What's the solution?

The researchers developed GVS, which samples all parts of the video in parallel rather than one frame after another: the sequence is split into overlapping chunks that are denoised simultaneously and "stitched" together where they overlap. Because every chunk is generated with knowledge of the full camera path, the model can lay out the scene so the camera never crashes into it. They also created a technique called Omni Guidance, which improves temporal consistency by conditioning each chunk on both past and future frames, and which enables a loop-closing mechanism for long-range coherence. Importantly, GVS works with any existing video model trained with Diffusion Forcing, without retraining.
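The parallel-stitching idea can be illustrated with a toy sketch. Everything here is a hypothetical stand-in, not the paper's actual model: `denoise_chunk` is a fake denoiser that nudges frames toward their camera embeddings (in GVS this would be an off-the-shelf Diffusion Forcing video model), and the overlap averaging crudely plays the role that bidirectional conditioning (Omni Guidance) plays in the real method.

```python
import numpy as np

def denoise_chunk(chunk, cams, t):
    # Placeholder denoiser: pulls noisy frames toward their camera
    # embeddings. A real system would call a pretrained video model here.
    return chunk + 0.5 * (cams - chunk)

def stitched_sample(cams, chunk=8, overlap=4, steps=20, seed=0):
    """Toy diffusion stitching: every overlapping chunk is denoised in
    parallel at each step, and frames shared by two chunks are averaged
    so neighbouring chunks stay consistent (the 'stitching')."""
    rng = np.random.default_rng(seed)
    T, D = cams.shape
    x = rng.normal(size=(T, D))            # start from pure noise
    starts = range(0, T - chunk + 1, chunk - overlap)
    for t in range(steps):
        acc = np.zeros_like(x)
        cnt = np.zeros((T, 1))
        for s in starts:                   # conceptually parallel
            acc[s:s + chunk] += denoise_chunk(x[s:s + chunk],
                                              cams[s:s + chunk], t)
            cnt[s:s + chunk] += 1
        x = acc / cnt                      # average overlaps -> consistency
    return x

# Fake 16-frame camera path embedded as 3-D vectors.
cams = np.linspace(0, 1, 16)[:, None] * np.ones((1, 3))
video = stitched_sample(cams)
```

Because each frame only ever sees denoising proposals that were computed with the whole trajectory available, no chunk can commit to scene content that a later part of the path would contradict; that is the collision-avoidance intuition behind sampling the entire sequence at once.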

Why it matters?

This work is important because it allows for the creation of high-quality, realistic videos with precise camera control. This has applications in areas like filmmaking, virtual reality, and robotics, where controlling the camera's perspective is crucial. It overcomes a major limitation of current video generation technology, opening up possibilities for more complex and visually appealing video content.

Abstract

Autoregressive video diffusion models are capable of long rollouts that are stable and consistent with history, but they are unable to guide the current generation with conditioning from the future. In camera-guided video generation with a predefined camera trajectory, this limitation leads to collisions with the generated scene, after which autoregression quickly collapses. To address this, we propose Generative View Stitching (GVS), which samples the entire sequence in parallel such that the generated scene is faithful to every part of the predefined camera trajectory. Our main contribution is a sampling algorithm that extends prior work on diffusion stitching for robot planning to video generation. While such stitching methods usually require a specially trained model, GVS is compatible with any off-the-shelf video model trained with Diffusion Forcing, a prevalent sequence diffusion framework that we show already provides the affordances necessary for stitching. We then introduce Omni Guidance, a technique that enhances the temporal consistency in stitching by conditioning on both the past and future, and that enables our proposed loop-closing mechanism for delivering long-range coherence. Overall, GVS achieves camera-guided video generation that is stable, collision-free, frame-to-frame consistent, and closes loops for a variety of predefined camera paths, including Oscar Reutersvärd's Impossible Staircase. Results are best viewed as videos at https://andrewsonga.github.io/gvs.