WonderZoom: Multi-Scale 3D World Generation
Jin Cao, Hong-Xing Yu, Jiajun Wu
2025-12-11
Summary
This paper introduces WonderZoom, a computer graphics technique that builds a detailed 3D scene from a single 2D image and lets you zoom into that scene to reveal progressively finer detail.
What's the problem?
Current methods for generating 3D worlds from images are limited because they can only create detail at a single spatial scale. Imagine trying to build a landscape with mountains *and* tiny flowers – existing systems struggle to do both convincingly in the same scene. The core issue is that there hasn't been a way to represent 3D information that can handle objects of vastly different sizes and still look realistic.
What's the solution?
WonderZoom solves this with two main ideas. First, it uses 'scale-adaptive Gaussian surfels,' tiny 3D building blocks that can change size to represent different levels of detail. Second, it has a 'progressive detail synthesizer' that adds finer and finer detail as you zoom in on a region, creating the detail on the fly. Together, these let the system automatically fill in detail that wasn't visible in the original image.
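To make the two ideas concrete, here is a minimal, purely illustrative sketch of how a scale-aware surfel representation and a progressive refinement loop could fit together. All names (`GaussianSurfel`, `synthesize_details`, `zoom_into`) and the refinement logic are hypothetical stand-ins, not the paper's actual implementation: the real synthesizer uses a generative model, whereas this sketch just splits coarse surfels into random finer ones.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical sketch; the names and logic below are illustrative only.

@dataclass
class GaussianSurfel:
    center: Tuple[float, float, float]  # 3D position of the surfel
    radius: float                       # spatial extent (shrinks at finer scales)
    scale_level: int                    # zoom level this surfel belongs to

def synthesize_details(parent: GaussianSurfel, n_children: int = 4) -> List[GaussianSurfel]:
    """Stand-in for the 'progressive detail synthesizer': replace one
    coarse surfel with several finer surfels inside its footprint.
    (The paper uses a generative model here, not random sampling.)"""
    children = []
    for _ in range(n_children):
        offset = tuple(random.uniform(-parent.radius, parent.radius) for _ in range(3))
        children.append(GaussianSurfel(
            center=tuple(c + o for c, o in zip(parent.center, offset)),
            radius=parent.radius / 2,            # halve the spatial extent
            scale_level=parent.scale_level + 1,  # one level finer
        ))
    return children

def zoom_into(scene: List[GaussianSurfel],
              region_center: Tuple[float, float, float],
              region_radius: float) -> List[GaussianSurfel]:
    """Auto-regressive zoom step: surfels inside the zoomed region are
    refined into finer-scale surfels; surfels outside are kept as-is."""
    refined = []
    for s in scene:
        dist = sum((a - b) ** 2 for a, b in zip(s.center, region_center)) ** 0.5
        if dist < region_radius:
            refined.extend(synthesize_details(s))
        else:
            refined.append(s)
    return refined

# Usage: start from one coarse surfel and zoom in twice around the origin.
scene = [GaussianSurfel(center=(0.0, 0.0, 0.0), radius=1.0, scale_level=0)]
scene = zoom_into(scene, (0.0, 0.0, 0.0), 0.5)  # 1 surfel -> 4 finer surfels
scene = zoom_into(scene, (0.0, 0.0, 0.0), 2.0)  # 4 surfels -> 16 finer surfels
```

The point of the sketch is the structure, not the synthesis: each zoom only touches surfels inside the selected region, so coarse geometry elsewhere is preserved while the zoomed area gains a new, finer scale level.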
Why does it matter?
This matters because it makes creating complex 3D environments much easier. Instead of needing many images or manual modeling, you can start from a single picture and generate a fully explorable 3D world with detail ranging from large structures down to near-microscopic features. This could be useful for video game development, virtual reality, and scientific visualization.
Abstract
We present WonderZoom, a novel approach to generating 3D scenes with contents across multiple spatial scales from a single image. Existing 3D world generation models remain limited to single-scale synthesis and cannot produce coherent scene contents at varying granularities. The fundamental challenge is the lack of a scale-aware 3D representation capable of generating and rendering content with largely different spatial sizes. WonderZoom addresses this through two key innovations: (1) scale-adaptive Gaussian surfels for generating and real-time rendering of multi-scale 3D scenes, and (2) a progressive detail synthesizer that iteratively generates finer-scale 3D contents. Our approach enables users to "zoom into" a 3D region and auto-regressively synthesize previously non-existent fine details, from landscapes to microscopic features. Experiments demonstrate that WonderZoom significantly outperforms state-of-the-art video and 3D models in both quality and alignment, enabling multi-scale 3D world creation from a single image. We show video results and an interactive viewer of generated multi-scale 3D worlds at https://wonderzoom.github.io/