WorldGen: From Text to Traversable and Interactive 3D Worlds

Dilin Wang, Hyunyoung Jung, Tom Monnier, Kihyuk Sohn, Chuhang Zou, Xiaoyu Xiang, Yu-Ying Yeh, Di Liu, Zixuan Huang, Thu Nguyen-Phuoc, Yuchen Fan, Sergiu Oprea, Ziyan Wang, Roman Shapovalov, Nikolaos Sarafianos, Thibault Groueix, Antoine Toisoul, Prithviraj Dhar, Xiao Chu, Minghao Chen, Geon Yeong Park, Mahima Gupta

2025-11-24

Summary

This paper introduces WorldGen, a system that automatically builds large, interactive 3D worlds from text prompts.

What's the problem?

Creating detailed 3D worlds for video games or virtual reality is usually slow and difficult, and it requires specialized skills in 3D modeling and design. People without those skills have no easy way to bring their ideas to life in a virtual space.

What's the solution?

WorldGen addresses this by combining several AI techniques. It uses a large language model to reason about the scene layout described in the text, then uses procedural generation and diffusion-based 3D generation to create the environment. It also decomposes the scene into individual objects, and it aims to keep the result geometrically consistent and visually rich without anyone modeling it by hand. The system is modular and supports control over the layout, scale, and style of the world.
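To make the idea of a modular pipeline concrete, here is a minimal sketch in Python of how four interchangeable stages could be chained together. It is not WorldGen's code: the stage functions, the `World` container, and the stage order are all assumptions made for illustration, and each stage only records a placeholder result instead of calling a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class World:
    """Hypothetical container passed from stage to stage (not from the paper)."""
    prompt: str
    layout: Dict[str, object] = field(default_factory=dict)
    objects: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


# A stage is any function that takes a World and returns a World.
Stage = Callable[[World], World]


def plan_layout(world: World) -> World:
    # Placeholder for LLM-driven scene layout reasoning.
    world.layout = {"scale": "small", "style": "default"}
    world.log.append("layout")
    return world


def build_blockout(world: World) -> World:
    # Placeholder for procedural generation of a coarse scene.
    world.layout["blockout"] = True
    world.log.append("procedural")
    return world


def generate_geometry(world: World) -> World:
    # Placeholder for diffusion-based 3D generation.
    world.log.append("diffusion")
    return world


def decompose_scene(world: World) -> World:
    # Placeholder for object-aware scene decomposition.
    world.objects = ["terrain", "building"]
    world.log.append("decomposition")
    return world


def run_pipeline(prompt: str, stages: List[Stage]) -> World:
    """Run each stage in order, threading the same World through all of them."""
    world = World(prompt=prompt)
    for stage in stages:
        world = stage(world)
    return world


# The order below is an assumption; the abstract only lists the components.
PIPELINE: List[Stage] = [plan_layout, build_blockout, generate_geometry, decompose_scene]

result = run_pipeline("a medieval village", PIPELINE)
print(result.log)
```

Because every stage shares the same signature, any one of them can be swapped or re-run without touching the others, which is the practical benefit of a modular design.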

Why does it matter?

This is a step toward making 3D world creation much more accessible. Instead of needing to be a 3D artist, anyone could potentially design a virtual world just by typing a description, which could be useful for game development, simulation, and immersive social environments.

Abstract

We introduce WorldGen, a system that enables the automatic creation of large-scale, interactive 3D worlds directly from text prompts. Our approach transforms natural language descriptions into traversable, fully textured environments that can be immediately explored or edited within standard game engines. By combining LLM-driven scene layout reasoning, procedural generation, diffusion-based 3D generation, and object-aware scene decomposition, WorldGen bridges the gap between creative intent and functional virtual spaces, allowing creators to design coherent, navigable worlds without manual modeling or specialized 3D expertise. The system is fully modular and supports fine-grained control over layout, scale, and style, producing worlds that are geometrically consistent, visually rich, and efficient to render in real time. This work represents a step towards accessible, generative world-building at scale, advancing the frontier of 3D generative AI for applications in gaming, simulation, and immersive social environments.
