Yo'City: Personalized and Boundless 3D Realistic City Scene Generation via Self-Critic Expansion
Keyang Lu, Sifan Zhou, Hongbin Xu, Gang Xu, Zhifei Yang, Yikai Wang, Zhen Xiao, Jieyi Long, Ming Li
2025-11-26
Summary
This paper introduces a new system called Yo'City for creating realistic and very large 3D cities. It aims to go beyond what current methods can do by allowing users to customize the city and make it expand endlessly.
What's the problem?
Existing methods for generating 3D cities usually rely on training a single AI model (typically a diffusion model) on examples. This limits how unique the cities can be and how large they can grow. It's hard to create cities that are both personalized to what a user wants and big enough to feel truly expansive.
What's the solution?
Yo'City works in stages, almost like an architect planning a city. First, it creates a high-level plan dividing the city into districts and grids. Then, it uses powerful AI models to generate detailed 3D scenes for each grid, constantly improving them through a process of creating, refining, and evaluating. Finally, it allows users to interactively expand the city in a way that makes sense, ensuring new areas connect logically to what’s already there.
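The create, refine, and evaluate cycle described above can be pictured as a small control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the score threshold, the round limit, and the toy stand-ins for the image generator and critic are all hypothetical, since the summary does not specify them.

```python
from typing import Callable, Tuple


def produce_refine_evaluate(
    description: str,
    produce: Callable[[str], str],
    refine: Callable[[str, str], str],
    evaluate: Callable[[str, str], Tuple[float, str]],
    threshold: float = 0.8,   # hypothetical acceptance score
    max_rounds: int = 3,      # hypothetical refinement budget
) -> Tuple[str, float]:
    """Generate a candidate, then refine it until the critic is satisfied."""
    image = produce(description)
    score, feedback = evaluate(image, description)
    best_image, best_score = image, score
    rounds = 0
    while best_score < threshold and rounds < max_rounds:
        # Refine the latest candidate using the critic's feedback on it.
        image = refine(image, feedback)
        score, feedback = evaluate(image, description)
        if score > best_score:
            best_image, best_score = image, score
        rounds += 1
    return best_image, best_score


# Toy stand-ins (assumptions): strings play the role of isometric images.
def toy_produce(desc: str) -> str:
    return desc + " [draft]"


def toy_refine(img: str, feedback: str) -> str:
    return img + " +fix"


def toy_evaluate(img: str, desc: str) -> Tuple[float, str]:
    # Score rises by 0.2 with every applied fix, capped at 1.0.
    return min(1.0, 0.5 + 0.2 * img.count("+fix")), "add detail"


image, score = produce_refine_evaluate(
    "harbor market", toy_produce, toy_refine, toy_evaluate
)
```

In a real pipeline the three callables would wrap an image generator, an image editor, and a vision-language critic; the loop itself only keeps the best-scoring candidate and stops early once it passes the threshold.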
Why it matters?
This research is important because realistic 3D cities are needed for things like virtual reality experiences and creating digital copies of real cities (digital twins). Yo'City’s ability to generate large, customizable cities could significantly improve these applications, making them more immersive and useful. The system also demonstrates a new way to use existing AI models to tackle complex generation tasks.
Abstract
Realistic 3D city generation is fundamental to a wide range of applications, including virtual reality and digital twins. However, most existing methods rely on training a single diffusion model, which limits their ability to generate personalized and boundless city-scale scenes. In this paper, we present Yo'City, a novel agentic framework that enables user-customized and infinitely expandable 3D city generation by leveraging the reasoning and compositional capabilities of off-the-shelf large models. Specifically, Yo'City first conceptualizes the city through a top-down planning strategy that defines a hierarchical "City-District-Grid" structure. The Global Planner determines the overall layout and potential functional districts, while the Local Designer further refines each district with detailed grid-level descriptions. Subsequently, the grid-level 3D generation is achieved through a "produce-refine-evaluate" isometric image synthesis loop, followed by image-to-3D generation. To simulate continuous city evolution, Yo'City further introduces a user-interactive, relationship-guided expansion mechanism, which performs scene graph-based distance- and semantics-aware layout optimization, ensuring spatially coherent city growth. To comprehensively evaluate our method, we construct a diverse benchmark dataset and design six multi-dimensional metrics that assess generation quality from the perspectives of semantics, geometry, texture, and layout. Extensive experiments demonstrate that Yo'City consistently outperforms existing state-of-the-art methods across all evaluation aspects.
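To make the idea of distance- and semantics-aware expansion concrete, here is a toy sketch of placing one new grid next to an existing city. It is an illustration under loud assumptions, not the paper's optimization: the abstract does not give the objective, so the signed `affinity` weights (positive for "should be near", negative for "should be far"), the Manhattan-distance cost, and the names `frontier` and `place` are all hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def frontier(city: Dict[Cell, str]) -> List[Cell]:
    """Free cells that share an edge with at least one occupied cell."""
    free = set()
    for (x, y) in city:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            cell = (x + dx, y + dy)
            if cell not in city:
                free.add(cell)
    return sorted(free)  # sorted so that ties are broken deterministically


def place(city: Dict[Cell, str], affinity: Dict[str, float]) -> Cell:
    """Pick the frontier cell with the lowest relationship-weighted distance.

    affinity maps an existing grid label to a signed weight (an assumption):
    positive pulls the new grid closer, negative pushes it away.
    """
    def cost(cell: Cell) -> float:
        return sum(
            affinity.get(label, 0.0) * (abs(cell[0] - x) + abs(cell[1] - y))
            for (x, y), label in city.items()
        )
    return min(frontier(city), key=cost)


# Existing row of grids, and a new "school" grid to be placed.
city = {(0, 0): "residential", (1, 0): "park", (2, 0): "factory"}
school_affinity = {"residential": 1.0, "park": 0.5, "factory": -1.0}
new_cell = place(city, school_affinity)
city[new_cell] = "school"
```

In this example the school lands on the residential side of the row, away from the factory. A scene graph in the sense of the abstract would supply the pairwise relationships that the hand-written `affinity` dictionary stands in for here.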