MajutsuCity: Language-driven Aesthetic-adaptive City Generation with Controllable 3D Assets and Layouts

Zilong Huang, Jun He, Xiaobin Huang, Ziyi Xiong, Yang Luo, Junyan Ye, Weijia Li, Yiping Chen, Ting Han

2025-11-26


Summary

This paper introduces MajutsuCity, a new system for creating realistic and customizable 3D cities using just text descriptions. It aims to make generating detailed urban environments easier and more flexible for things like video games, virtual reality, and building virtual worlds.

What's the problem?

Currently, creating 3D cities is tricky because existing methods either let you easily describe what you want with text but don't allow for precise editing, or they let you edit things directly but are hard to use creatively. It's difficult to get both stylistic variety *and* the ability to make specific changes to individual buildings or objects within the city. There also wasn't a good dataset available with the necessary information to train these kinds of systems effectively.

What's the solution?

The researchers developed MajutsuCity, which works in four steps to build a city. It represents a city as a combination of layouts, buildings, and materials that can all be controlled. They also created MajutsuAgent, a tool that lets you edit the city after it's generated using natural language commands – like 'move that building' or 'change the color of the roof'. To support this, they built a new dataset called MajutsuDataset, filled with detailed information about city layouts, 3D building models, and realistic materials. Finally, they created ways to measure how good the generated cities are, looking at things like how realistic they look and how well the city's structure makes sense.
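As a rough illustration of what a city "represented as controllable layouts, buildings, and materials" could look like in code, here is a minimal sketch. Every name in it (`Building`, `City`, `move`, `set_roof_material`, the grid encoding) is hypothetical and chosen only for this example; it is not the paper's data model or MajutsuAgent's actual interface. The two edit methods mirror the example commands quoted above ('move that building', 'change the color of the roof').

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Building:
    """One editable building asset (hypothetical fields)."""
    name: str
    position: Tuple[float, float]  # (x, y) location on the layout plane
    height: float
    roof_material: str             # label of a material, e.g. "concrete"


@dataclass
class City:
    """A city as a layout grid plus individually addressable buildings."""
    layout: List[List[int]]        # 2D semantic grid; the encoding is an assumption
    buildings: Dict[str, Building] = field(default_factory=dict)

    def move(self, name: str, dx: float, dy: float) -> None:
        # Object-level edit: shift a single building without touching the rest.
        b = self.buildings[name]
        b.position = (b.position[0] + dx, b.position[1] + dy)

    def set_roof_material(self, name: str, material: str) -> None:
        # Object-level edit: swap the material assigned to one building's roof.
        self.buildings[name].roof_material = material


city = City(layout=[[0, 1], [1, 0]])
city.buildings["tower"] = Building("tower", (1.0, 2.0), 30.0, "concrete")
city.move("tower", 2.0, -1.0)
city.set_roof_material("tower", "red_tile")
```

The point of the sketch is that an explicit, structured representation makes each object addressable by name, which is what lets a language agent translate a sentence into a targeted change.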

Why it matters?

MajutsuCity represents a significant improvement in 3D city generation. It outperforms previous methods at creating cities that are visually appealing, structurally sound, and easy to modify. This is important because it opens up possibilities for more immersive and interactive virtual environments, and the planned release of the dataset and code will help other researchers build on this work.

Abstract

Generating realistic 3D cities is fundamental to world models, virtual reality, and game development, where an ideal urban scene must offer stylistic diversity, fine-grained detail, and controllability. However, existing methods struggle to balance the creative flexibility offered by text-based generation with the object-level editability enabled by explicit structural representations. We introduce MajutsuCity, a natural language-driven and aesthetically adaptive framework for synthesizing structurally consistent and stylistically diverse 3D urban scenes. MajutsuCity represents a city as a composition of controllable layouts, assets, and materials, and operates through a four-stage pipeline. To extend controllability beyond initial generation, we further integrate MajutsuAgent, an interactive language-grounded editing agent that supports five object-level operations. To support photorealistic and customizable scene synthesis, we also construct MajutsuDataset, a high-quality multimodal dataset containing 2D semantic layouts and height maps, diverse 3D building assets, and curated PBR materials and skyboxes, each accompanied by detailed annotations. Meanwhile, we develop a practical set of evaluation metrics, covering key dimensions such as structural consistency, scene complexity, material fidelity, and lighting atmosphere. Extensive experiments demonstrate that MajutsuCity reduces layout FID by 83.7% compared with CityDreamer and by 20.1% compared with CityCraft. Our method ranks first across all AQS and RDR scores, outperforming existing methods by a clear margin. These results confirm MajutsuCity as a new state of the art in geometric fidelity, stylistic adaptability, and semantic controllability for 3D city generation. We expect that our framework can inspire new avenues of research in 3D city generation. Our dataset and code will be released at https://github.com/LongHZ140516/MajutsuCity.
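For readers unfamiliar with how a figure such as "reduces layout FID by 83.7%" is usually read: FID is a lower-is-better score, and a relative reduction is commonly computed as the drop divided by the baseline score. The sketch below shows that arithmetic only. The function name is hypothetical, and the two FID values are placeholders picked for illustration, not numbers reported in the paper.

```python
def relative_reduction(baseline: float, ours: float) -> float:
    """Return the percentage by which `ours` is lower than `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - ours) / baseline * 100.0


# Placeholder scores for illustration only (NOT taken from the paper).
baseline_fid = 100.0
ours_fid = 16.3
print(f"{relative_reduction(baseline_fid, ours_fid):.1f}% lower than the baseline")
```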
