
CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

Rundi Wu, Ruiqi Gao, Ben Poole, Alex Trevithick, Changxi Zheng, Jonathan T. Barron, Aleksander Holynski

2024-11-28

Summary

This paper introduces CAT4D, a method for creating 4D (dynamic 3D) scenes from a single monocular video, that is, footage from one ordinary camera. The resulting scene changes over time and can be rendered from new camera positions.

What's the problem?

Creating realistic 4D scenes from video is challenging because most methods require multiple camera angles or complex capture setups. With only one video, existing techniques often struggle to produce consistent, high-quality images of the scene from viewpoints the camera never recorded.

What's the solution?

The authors developed CAT4D, which is built around a multi-view video diffusion model trained on a diverse combination of datasets. The model can synthesize images of a scene at any specified camera poses and timestamps. Combined with a new sampling approach, it turns a single monocular video into a multi-view video: the same moving scene as seen from several cameras. That multi-view video is then used to reconstruct the 4D scene by optimizing a deformable 3D Gaussian representation.
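One way to picture the "multi-view video" described above is as a grid of images indexed by camera and timestamp, where the input video fills one slice of the grid and the model must generate the rest. The sketch below only illustrates that bookkeeping and is not the authors' code. The names `Query`, `build_queries`, and `split_observed` are hypothetical, and it assumes the input video can be treated as a single fixed camera, which ignores camera motion in real footage.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Query:
    """One requested image: a camera index and a timestamp index (hypothetical type)."""
    camera: int
    time: int


def build_queries(num_cameras: int, num_frames: int) -> list[Query]:
    """Enumerate every (camera, time) cell of the target multi-view video."""
    return [Query(c, t) for c, t in product(range(num_cameras), range(num_frames))]


def split_observed(queries: list[Query], input_camera: int = 0):
    """Separate cells covered by the input video from cells that must be generated.

    Assumption: the input video is treated as camera `input_camera` at every
    timestamp, which ignores camera motion in a real input video.
    """
    observed = [q for q in queries if q.camera == input_camera]
    missing = [q for q in queries if q.camera != input_camera]
    return observed, missing


# A toy grid: 4 cameras, 3 timestamps.
queries = build_queries(num_cameras=4, num_frames=3)
observed, missing = split_observed(queries)
print(len(queries), len(observed), len(missing))
```

In this toy grid the input video supplies one image per timestamp, and every other cell is a novel view that a model like the one described would need to synthesize before 4D reconstruction.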

Why does it matter?

Generating high-quality 4D scenes from a single video removes the need for multi-camera capture. That could make dynamic 3D content easier and more efficient to create in fields such as gaming, virtual reality, and film production. The authors also show that the method works on generated videos as well as real ones.

Abstract

We present CAT4D, a method for creating 4D (dynamic 3D) scenes from monocular video. CAT4D leverages a multi-view video diffusion model trained on a diverse combination of datasets to enable novel view synthesis at any specified camera poses and timestamps. Combined with a novel sampling approach, this model can transform a single monocular video into a multi-view video, enabling robust 4D reconstruction via optimization of a deformable 3D Gaussian representation. We demonstrate competitive performance on novel view synthesis and dynamic scene reconstruction benchmarks, and highlight the creative capabilities for 4D scene generation from real or generated videos. See our project page for results and interactive demos: cat-4d.github.io.