
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Jinxiu Liu, Shaoheng Lin, Yinxiao Li, Ming-Hsuan Yang

2024-12-17


Summary

This paper introduces DynamicScaler, a method for generating seamless, high-quality panoramic videos of arbitrary size using standard fixed-resolution video diffusion models, making it easier to create immersive scenes for virtual and augmented reality applications.

What's the problem?

Generating panoramic videos that look good and stay coherent is challenging because most existing video diffusion models are constrained to fixed, limited resolutions and aspect ratios. This makes it hard to produce scene-level dynamic content that feels realistic and seamless across an entire 360° view. As a result, many video generation techniques can't keep up with the growing demand for immersive experiences in AR (augmented reality) and VR (virtual reality).

What's the solution?

DynamicScaler addresses these issues with a technique called the Offset Shifting Denoiser, which slides a fixed-resolution denoising window around the panorama so that boundaries blend seamlessly and quality stays consistent across the entire scene (a minimal sketch of this idea follows below). In addition, a Global Motion Guidance mechanism keeps movements smooth and coherent at the global level while preserving local detail. The method handles varying resolutions and aspect ratios without any additional training.
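To make the shifting-window idea concrete, here is a minimal NumPy sketch. The `denoise_window` function, the shapes, and the shift schedule are hypothetical placeholders standing in for a fixed-resolution diffusion denoiser; the wrap-around column indexing is what lets the window "rotate" past the 360° seam.

```python
import numpy as np

def denoise_window(window, step):
    """Hypothetical stand-in for one denoising step of a fixed-resolution
    video diffusion model (the real model is not shown here)."""
    return window * 0.99  # placeholder update

def shifted_window_denoise(latent, steps, win_w, shift):
    """Denoise a panoramic latent of shape (frames, height, pano_width) by
    covering it with fixed-width windows that wrap around the horizontal
    seam, shifting the window offset every step so seams never align."""
    T, H, W = latent.shape
    assert W % win_w == 0, "panorama width must tile into windows here"
    for step in range(steps):
        offset = (step * shift) % W                # rotate windows each step
        for start in range(offset, offset + W, win_w):
            cols = [(start + i) % W for i in range(win_w)]  # 360° wrap-around
            latent[:, :, cols] = denoise_window(latent[:, :, cols], step)
    return latent

pano = np.random.randn(16, 32, 256)  # 16 frames, 32-px latent height, 256-px pano
out = shifted_window_denoise(pano, steps=10, win_w=64, shift=16)
print(out.shape)  # (16, 32, 256)
```

Because the offset changes at every denoising step, no window boundary sits at the same place twice, which is one plausible reading of how seams are avoided while the per-step memory cost stays fixed at the window size.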

Why it matters?

This research is significant because it enables faster and more effective video generation for panoramic scenes, which is essential for applications in gaming, virtual tours, and other immersive experiences. By making scalable panoramic video generation possible without extra training, DynamicScaler could lead to more engaging and realistic content in the growing fields of AR and VR.

Abstract

The increasing demand for immersive AR/VR applications and spatial intelligence has heightened the need to generate high-quality scene-level and 360° panoramic video. However, most video diffusion models are constrained by limited resolution and aspect ratio, which restricts their applicability to scene-level dynamic content synthesis. In this work, we propose DynamicScaler, which addresses these challenges by enabling spatially scalable and panoramic dynamic scene synthesis that preserves coherence across panoramic scenes of arbitrary size. Specifically, we introduce an Offset Shifting Denoiser that enables efficient, synchronous, and coherent denoising of panoramic dynamic scenes with a fixed-resolution diffusion model applied through a seamlessly rotating window, ensuring smooth boundary transitions and consistency across the entire panoramic space while accommodating varying resolutions and aspect ratios. Additionally, we employ a Global Motion Guidance mechanism to ensure both local detail fidelity and global motion continuity. Extensive experiments demonstrate that our method achieves superior content and motion quality in panoramic scene-level video generation, offering a training-free, efficient, and scalable solution for immersive dynamic scene creation with constant VRAM consumption regardless of the output video resolution. Our project page is available at https://dynamic-scaler.pages.dev/.
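As a rough illustration of how Global Motion Guidance might couple local windows to a shared global motion, the following NumPy sketch blends locally denoised latents with an upsampled low-resolution global estimate. The linear blend, the blend weight, and the nearest-neighbour upsampling are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def upsample_width(x, W):
    """Nearest-neighbour upsampling along the panoramic width axis."""
    idx = np.arange(W) * x.shape[-1] // W  # integer source indices
    return x[..., idx]

def motion_guided_blend(local_latent, global_latent_lr, weight=0.3):
    """Blend a locally denoised panoramic latent with an upsampled
    low-resolution global estimate so large-scale motion stays
    continuous across windows. The blend weight is an assumption."""
    guide = upsample_width(global_latent_lr, local_latent.shape[-1])
    return (1.0 - weight) * local_latent + weight * guide

local = np.random.randn(16, 32, 256)      # full-resolution panoramic latent
global_lr = np.random.randn(16, 32, 64)   # coarse global-motion estimate
blended = motion_guided_blend(local, global_lr)
print(blended.shape)  # (16, 32, 256)
```

The intuition is that the coarse global pass sees the whole panorama at once and anchors large-scale motion, while the windowed passes contribute fine detail, matching the abstract's claim of "both local detail fidelity and global motion continuity."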