WorldGrow: Generating Infinite 3D World
Sikuang Li, Chen Yang, Jiemin Fang, Taoran Yi, Jia Lu, Jiazhong Cen, Lingxi Xie, Wei Shen, Qi Tian
2025-10-27
Summary
This paper introduces a new method called WorldGrow for creating large, detailed 3D worlds, like those found in video games or virtual reality, that can be extended indefinitely.
What's the problem?
Currently, making these large 3D worlds is really hard. Some methods try to build them from 2D images, but those often look inconsistent from different angles. Others use complex 3D models, but those can't easily scale up to create huge environments. Existing 3D models are also usually focused on individual objects, not entire scenes, making it difficult to build a cohesive world.
What's the solution?
WorldGrow solves this by cleverly using pre-trained 3D models as a starting point. It breaks down the world into reusable 'scene blocks' and learns how to arrange them in a realistic way. It first carefully curates high-quality blocks for training, then uses a 3D inpainting process to fill in gaps and extend the scene based on the surrounding context. Finally, it builds the world in stages, starting with a rough layout and then adding finer details, so the result looks realistic and stays structurally sound.
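The block-by-block growth described above can be sketched in simplified form. This is an illustrative toy, not the paper's released code: `generate_block` stands in for the pre-trained 3D generator, each "block" is reduced to a single number, and the grid, function names, and parameters are all hypothetical. What it shows is the control flow: each block is synthesized conditioned on its already-generated neighbors (the inpainting context), and multiple passes refine the grid coarse-to-fine.

```python
import random

def generate_block(context, level):
    # Stand-in for a conditional 3D generator (hypothetical): the new block
    # stays consistent with its neighbors, with less noise at finer levels.
    if not context:
        return random.random()
    return sum(context) / len(context) + random.uniform(-0.1, 0.1) / (level + 1)

def grow_world(size, levels=2):
    """Grow a size x size grid of scene blocks, coarse-to-fine.

    Each block is generated conditioned on its already-placed left and top
    neighbors (the 'inpainting context'); each later level revisits every
    block, mimicking a coarse-to-fine refinement pass.
    """
    grid = {}
    for level in range(levels):
        for y in range(size):
            for x in range(size):
                context = [grid[n] for n in ((x - 1, y), (x, y - 1)) if n in grid]
                grid[(x, y)] = generate_block(context, level)
    return grid

world = grow_world(4)
```

Because nothing in the loop depends on a fixed world size, the same scheme extends the scene indefinitely by simply generating blocks at new grid coordinates as the camera moves outward.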
Why it matters?
This research is important because it allows for the creation of much larger and more realistic virtual environments than previously possible. This has big implications for things like video game development, virtual reality experiences, and even building detailed simulations of the real world, potentially leading to better 'world models' for AI.
Abstract
We tackle the challenge of generating the infinitely extendable 3D world -- large, continuous environments with coherent geometry and realistic appearance. Existing methods face key challenges: 2D-lifting approaches suffer from geometric and appearance inconsistencies across views, 3D implicit representations are hard to scale up, and current 3D foundation models are mostly object-centric, limiting their applicability to scene-level generation. Our key insight is leveraging strong generation priors from pre-trained 3D models for structured scene block generation. To this end, we propose WorldGrow, a hierarchical framework for unbounded 3D scene synthesis. Our method features three core components: (1) a data curation pipeline that extracts high-quality scene blocks for training, making the 3D structured latent representations suitable for scene generation; (2) a 3D block inpainting mechanism that enables context-aware scene extension; and (3) a coarse-to-fine generation strategy that ensures both global layout plausibility and local geometric/textural fidelity. Evaluated on the large-scale 3D-FRONT dataset, WorldGrow achieves SOTA performance in geometry reconstruction, while uniquely supporting infinite scene generation with photorealistic and structurally consistent outputs. These results highlight its capability for constructing large-scale virtual environments and potential for building future world models.