LRM-Zero: Training Large Reconstruction Models with Synthesized Data

Desai Xie, Sai Bi, Zhixin Shu, Kai Zhang, Zexiang Xu, Yi Zhou, Sören Pirk, Arie Kaufman, Xin Sun, Hao Tan

2024-06-14

Summary

This paper introduces LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on computer-generated 3D data. It reconstructs high-quality 3D objects from only a few input views, using a fully procedural dataset called Zeroverse.

What's the problem?

Most current methods for training 3D reconstruction models rely on real-world data, which is difficult and expensive to gather and may not cover the full variety of shapes and textures found in the real world. This makes it challenging to build models that can accurately reconstruct 3D objects from just a few images. Additionally, existing datasets tend to emphasize realistic appearances, which can limit a model's exposure to more abstract or geometrically complex shapes.

What's the solution?

To solve this problem, the authors developed LRM-Zero, which is trained on a dataset called Zeroverse that is created entirely through procedural generation. Shapes are assembled from simple geometric primitives with random textures and augmentations, producing a wide variety of complex local details without relying on any real-world examples. The authors showed that LRM-Zero can produce high-quality reconstructions of real objects, performing competitively with models trained on traditional human-made datasets like Objaverse.
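To make the idea of procedural generation concrete, here is a minimal, purely illustrative sketch of how a Zeroverse-style sampler might describe a random object. The primitive list, augmentation names, probabilities, and parameter ranges below are invented for illustration and are not the paper's actual settings (the real synthesis code is linked at the project page):

```python
import random

# Hypothetical primitive and augmentation vocabularies, loosely inspired by
# the paper's description (height fields, boolean differences, wireframes).
PRIMITIVES = ["cube", "sphere", "cylinder", "cone", "torus"]
AUGMENTATIONS = ["height_field", "boolean_difference", "wireframe"]

def sample_shape(rng):
    """Sample one randomly placed, scaled, rotated, and textured primitive."""
    return {
        "primitive": rng.choice(PRIMITIVES),
        "position": [rng.uniform(-1.0, 1.0) for _ in range(3)],
        "scale": [rng.uniform(0.1, 0.5) for _ in range(3)],
        "rotation_deg": [rng.uniform(0.0, 360.0) for _ in range(3)],
        "texture_seed": rng.randrange(2**31),  # index into a random-texture bank
        # Each augmentation is applied independently with an assumed probability.
        "augmentations": [a for a in AUGMENTATIONS if rng.random() < 0.3],
    }

def sample_object(rng, min_parts=3, max_parts=9):
    """Sample a composite object as a union of several random primitives."""
    n_parts = rng.randint(min_parts, max_parts)
    return [sample_shape(rng) for _ in range(n_parts)]

rng = random.Random(0)  # seeded so the dataset is reproducible
obj = sample_object(rng)
print(f"object with {len(obj)} parts, first primitive: {obj[0]['primitive']}")
```

Because every parameter is drawn from a seeded random generator, an arbitrarily large dataset can be regenerated on demand, which is exactly the scalability advantage the paper highlights over curated collections.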

Why it matters?

This research is important because it demonstrates that high-quality 3D reconstruction can be achieved without realistic training data. Because synthesized data can be generated automatically and at arbitrary scale, researchers can build larger and more diverse datasets far more easily, which could improve training for applications like gaming, virtual reality, and design. The findings encourage further exploration of synthetic data in 3D vision.

Abstract

We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D datasets (e.g., Objaverse) which are often captured or crafted by humans to approximate real 3D data, Zeroverse completely ignores realistic global semantics but is rich in complex geometric and texture details that are locally similar to or even more intricate than real objects. We demonstrate that our LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse. We also analyze several critical design choices of Zeroverse that contribute to LRM-Zero's capability and training stability. Our work demonstrates that 3D reconstruction, one of the core tasks in 3D vision, can potentially be addressed without the semantics of real-world objects. The Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/.