Key Features

Synthesizes high-quality 3D scene videos
Streamlines 3D graphic design workflow
Leverages complementary strengths of image and video diffusion models
Generates high-quality, cross-view consistent anchor views
Faithfully interpolates intermediate frames
Enhanced by flow-based camera control and structural guidance
Operates without a paired dataset of 3D scene models and natural images
Produces high-quality, style-consistent scene videos

VideoFrom3D proposes a generative framework that leverages the complementary strengths of image and video diffusion models. Specifically, the framework consists of a Sparse Anchor-view Generation (SAG) module and a Geometry-guided Generative Inbetweening (GGI) module. The SAG module generates high-quality, cross-view consistent anchor views using an image diffusion model, aided by Sparse Appearance-guided Sampling. Building on these anchor views, the GGI module faithfully interpolates intermediate frames using a video diffusion model, enhanced by flow-based camera control and structural guidance.
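The two-stage flow described above (sparse anchor views first, then inbetweening between consecutive anchors) can be sketched in a few lines of Python. This is only an illustrative skeleton of the control flow, not the VideoFrom3D implementation: the names `pick_anchor_indices`, `blend_inbetween`, and `synthesize_video` are hypothetical, the fixed-stride anchor spacing is an assumption, and plain linear blending stands in for the video diffusion model, which in the real framework is additionally guided by camera flow and structure.

```python
from typing import Callable, List

Frame = List[float]  # toy stand-in for an image: a flat list of pixel values


def pick_anchor_indices(num_frames: int, stride: int) -> List[int]:
    """Choose sparse anchor positions along the camera trajectory.

    Assumption: anchors are spaced by a fixed stride, and the last
    frame is always an anchor so every segment has two endpoints.
    """
    if num_frames < 1 or stride < 1:
        raise ValueError("num_frames and stride must be positive")
    indices = list(range(0, num_frames, stride))
    if indices[-1] != num_frames - 1:
        indices.append(num_frames - 1)
    return indices


def blend_inbetween(start: Frame, end: Frame, count: int) -> List[Frame]:
    """Placeholder for the inbetweening stage: linear blending.

    A real system would call a video diffusion model here; blending
    only keeps this sketch runnable.
    """
    frames = []
    for k in range(1, count + 1):
        t = k / (count + 1)
        frames.append([(1 - t) * a + t * b for a, b in zip(start, end)])
    return frames


def synthesize_video(
    num_frames: int,
    stride: int,
    generate_anchor: Callable[[int], Frame],
    inbetween: Callable[[Frame, Frame, int], List[Frame]] = blend_inbetween,
) -> List[Frame]:
    """Stage 1: generate anchor views. Stage 2: fill frames between them."""
    anchor_ids = pick_anchor_indices(num_frames, stride)
    anchors = {i: generate_anchor(i) for i in anchor_ids}

    video = [anchors[anchor_ids[0]]]
    for left, right in zip(anchor_ids, anchor_ids[1:]):
        # Each segment is conditioned on both of its endpoint anchors.
        video.extend(inbetween(anchors[left], anchors[right], right - left - 1))
        video.append(anchors[right])
    return video


# Toy usage: each "anchor view" is a single pixel whose value is its index.
video = synthesize_video(7, 3, lambda i: [float(i)])
```

Because every intermediate frame is produced from the two anchors that bound its segment, the expensive image model only runs at the sparse anchor positions, which is the division of labor the description attributes to the SAG and GGI modules.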


The synthesized video sequences show consistent, high-quality visuals that reflect the input geometry and reference style, including challenging visual elements such as rising steam. Comprehensive experiments show that VideoFrom3D produces high-quality, style-consistent scene videos under diverse and challenging scenarios, outperforming both simple and extended baselines. The framework operates without any paired dataset of 3D scene models and natural images; such datasets are extremely difficult to obtain.
