LATTICE: Democratize High-Fidelity 3D Generation at Scale

Zeqiang Lai, Yunfei Zhao, Zibo Zhao, Haolin Liu, Qingxiang Lin, Jingwei Huang, Chunchao Guo, Xiangyu Yue

2025-12-05

Summary

This paper introduces LATTICE, a new system for creating detailed 3D models using artificial intelligence, aiming to make the process both high-quality and efficient.

What's the problem?

Creating 3D models with AI is much harder than creating 2D images. 2D images can be built on a simple grid, but 3D models need to define shape *and* surface detail from nothing. Existing methods struggle because they're computationally expensive and lack a good way to organize and compress the information needed to represent 3D objects effectively.

What's the solution?

The researchers developed a new way to represent 3D objects called VoxSet. Think of it like compressing a 3D model into a smaller set of key pieces of information linked to a basic grid. This makes it easier for the AI to understand where things are in the 3D space. They then built LATTICE, which uses this VoxSet representation in two steps: first, it creates a rough, blocky version of the model, and then it adds all the fine details using a special type of AI called a rectified flow transformer.
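To make the idea concrete, here is a minimal sketch of what a VoxSet-style encoding could look like: a small set of latent vectors, each anchored to an occupied cell of a coarse voxel grid. All names, shapes, and the random placeholder latents are illustrative assumptions, not the paper's actual code; in LATTICE, stage one would generate the sparse anchor grid and stage two would refine the latents with a rectified flow transformer.

```python
import numpy as np

def make_voxset(points, grid_res=16, latent_dim=8, seed=None):
    """Anchor one latent vector to each occupied cell of a coarse grid.

    points: (N, 3) array of surface samples in the unit cube [0, 1]^3.
    Returns (anchors, latents): the integer coordinates of occupied
    cells, and one latent vector per occupied cell (random placeholders
    here, standing in for learned features).
    """
    rng = np.random.default_rng(seed)
    # Quantize each point to a coarse voxel cell.
    cells = np.clip((points * grid_res).astype(int), 0, grid_res - 1)
    # Occupied cells form the sparse "geometry anchor" (stage-one output).
    anchors = np.unique(cells, axis=0)
    # One compact latent per anchor; position is explicit, so positional
    # embeddings can be derived directly from the anchor coordinates.
    latents = rng.standard_normal((len(anchors), latent_dim))
    return anchors, latents

pts = np.random.default_rng(0).random((1000, 3))
anchors, latents = make_voxset(pts, seed=0)
```

The key property this sketch illustrates is the "semi-structured" middle ground: like VecSet, the asset is a compact set of vectors, but each vector now carries an explicit grid position that a generator can condition on.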

Why it matters?

LATTICE is a significant step forward because it allows for the creation of high-quality 3D models more easily and quickly than previous methods. It’s simpler to train, can handle different levels of detail, and ultimately brings us closer to being able to generate complex 3D assets on a large scale, which is important for things like video games, movies, and design.

Abstract

We present LATTICE, a new framework for high-fidelity 3D asset generation that bridges the quality and scalability gap between 3D and 2D generative models. While 2D image synthesis benefits from fixed spatial grids and well-established transformer architectures, 3D generation remains fundamentally more challenging due to the need to predict both spatial structure and detailed geometric surfaces from scratch. These challenges are exacerbated by the computational complexity of existing 3D representations and the lack of structured and scalable 3D asset encoding schemes. To address this, we propose VoxSet, a semi-structured representation that compresses 3D assets into a compact set of latent vectors anchored to a coarse voxel grid, enabling efficient and position-aware generation. VoxSet retains the simplicity and compression advantages of prior VecSet methods while introducing explicit structure into the latent space, allowing positional embeddings to guide generation and enabling strong token-level test-time scaling. Built upon this representation, LATTICE adopts a two-stage pipeline: first generating a sparse voxelized geometry anchor, then producing detailed geometry using a rectified flow transformer. Our method is simple at its core, but supports arbitrary resolution decoding, low-cost training, and flexible inference schemes, achieving state-of-the-art performance on various aspects, and offering a significant step toward scalable, high-quality 3D asset creation.