Repurposing 3D Generative Model for Autoregressive Layout Generation

Haoran Feng, Yifan Niu, Zehuan Huang, Yang-Tian Sun, Chunchao Guo, Yuxin Peng, Lu Sheng

2026-04-20

Summary

This paper introduces LaviGen, a new system for automatically creating 3D arrangements of objects in a scene, like designing a room with furniture.

What's the problem?

Existing methods for creating 3D layouts usually start from a text description, which is hard to translate accurately into a realistic 3D scene. They often produce arrangements that are physically implausible: objects may float in the air or intersect one another. These methods can also be slow and computationally expensive.

What's the solution?

LaviGen takes a different approach by working directly in native 3D space. It builds the layout autoregressively, one step at a time, explicitly modeling the geometric relations and physical constraints among objects. The authors adapt a 3D diffusion model so that it integrates scene, object, and instruction information, and they use a dual-guidance self-rollout distillation mechanism to improve efficiency and spatial accuracy.

Why does it matter?

On the LayoutVLM benchmark, LaviGen reports 19% higher physical plausibility than the state of the art and 65% faster computation. This matters for applications such as virtual reality, game development, and architectural design, where believable 3D environments are crucial.

Abstract

We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that infer object layouts from textual descriptions, LaviGen operates directly in the native 3D space, formulating layout generation as an autoregressive process that explicitly models geometric relations and physical constraints among objects, producing coherent and physically plausible 3D scenes. To further enhance this process, we propose an adapted 3D diffusion model that integrates scene, object, and instruction information and employs a dual-guidance self-rollout distillation mechanism to improve efficiency and spatial accuracy. Extensive experiments on the LayoutVLM benchmark show LaviGen achieves superior 3D layout generation performance, with 19% higher physical plausibility than the state of the art and 65% faster computation. Our code is publicly available at https://github.com/fenghora/LaviGen.
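
To make the idea of autoregressive layout generation under physical constraints concrete, here is a minimal toy sketch in Python. It is not LaviGen's method: the adapted 3D diffusion model is replaced by a uniform random sampler (the hypothetical `propose` function), objects are reduced to axis-aligned boxes, and the only constraints checked are "rests on the floor", "stays inside the room", and "does not intersect previously placed objects". All names (`Box`, `propose`, `generate_layout`) are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x, y, z) is the minimum corner, (w, d, h) the size."""
    x: float
    y: float
    z: float
    w: float
    d: float
    h: float


def overlaps(a: Box, b: Box) -> bool:
    """Return True if the interiors of the two boxes intersect."""
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.d and b.y < a.y + a.d
            and a.z < b.z + b.h and b.z < a.z + a.h)


def in_room(box: Box, room: tuple) -> bool:
    """Return True if the box footprint lies inside the room's floor rectangle."""
    return (0.0 <= box.x and box.x + box.w <= room[0]
            and 0.0 <= box.y and box.y + box.d <= room[1])


def propose(size: tuple, room: tuple, rng: random.Random) -> Box:
    """Hypothetical stand-in for a learned model: sample a floor position uniformly."""
    w, d, h = size
    # z = 0 keeps every object resting on the floor, so nothing floats.
    return Box(rng.uniform(0.0, room[0] - w), rng.uniform(0.0, room[1] - d), 0.0, w, d, h)


def generate_layout(sizes, room, seed=0, max_tries=1000):
    """Place objects one at a time, each conditioned on those already placed."""
    rng = random.Random(seed)
    placed = []
    for size in sizes:
        for _ in range(max_tries):
            candidate = propose(size, room, rng)
            # Accept only candidates that respect the constraints given the scene so far.
            if in_room(candidate, room) and not any(overlaps(candidate, p) for p in placed):
                placed.append(candidate)
                break
        else:
            raise RuntimeError(f"could not place object of size {size}")
    return placed


if __name__ == "__main__":
    # Three objects (width, depth, height) in a 5 x 4 room; units are arbitrary.
    for box in generate_layout([(2.0, 1.5, 0.5), (1.0, 1.0, 1.0), (0.5, 0.5, 1.5)], (5.0, 4.0)):
        print(box)
```

The sketch only shows the control flow: each object is placed conditioned on the partial scene, so constraints are enforced step by step rather than repaired afterwards. In a learned system, rejection sampling over random positions would be replaced by a model that proposes placements directly.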
