
TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang, Yan-Pei Cao

2025-02-14


Summary

This paper introduces TripoSG, a new AI system that can create highly detailed 3D models from 2D images. It's like teaching a computer to sculpt realistic 3D objects just by looking at pictures.

What's the problem?

While AI has gotten really good at making 2D images and videos, it's still not great at creating 3D shapes. This is because there isn't enough 3D data to learn from, 3D information is hard to process, and advanced AI techniques haven't been fully adapted to 3D yet. Current methods for making 3D shapes often produce low-quality results that don't match the input images very well.

What's the solution?

The researchers created TripoSG, which does three main things: First, it uses a special AI setup called a 'rectified flow transformer' that's trained on a huge amount of high-quality 3D data. Second, it combines different ways of describing 3D shapes (signed distances to the surface, surface directions called normals, and a smoothness rule called the eikonal constraint) so the AI understands 3D objects better. Third, they built a data pipeline that produced 2 million high-quality 3D samples for the AI to learn from, which is like giving it a massive 3D art class.
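The first ingredient, rectified flow, trains a model to move along a straight line from random noise to a real shape representation, and to predict the constant velocity of that line. Here is a minimal sketch of that training target; the function names, the tiny 3-number "latent", and all shapes are illustrative assumptions, not TripoSG's actual code:

```python
import numpy as np

def rectified_flow_pair(x0, x1, t):
    """Sketch of the rectified-flow training target (not TripoSG's code).

    x0: a sample of pure noise.
    x1: a real data sample (here, a toy 3-number latent).
    t:  a time in [0, 1] along the straight path from x0 to x1.

    Returns the interpolated point x_t and the velocity target v
    that a network would be trained to predict from (x_t, t).
    """
    x_t = (1.0 - t) * x0 + t * x1  # straight-line path from noise to data
    v_target = x1 - x0             # constant velocity along that path
    return x_t, v_target

# Toy example: one training pair at the midpoint t = 0.5
rng = np.random.default_rng(0)
x0 = rng.standard_normal(3)        # noise
x1 = np.array([1.0, 2.0, 3.0])     # "data" latent
x_t, v = rectified_flow_pair(x0, x1, t=0.5)
# In training, a network f(x_t, t) would be regressed toward v
# with a mean-squared-error loss.
```

Because the target path is a straight line, sampling at inference time can take large, simple steps, which is part of why rectified flow is attractive compared to curved diffusion trajectories.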

Why it matters?

This matters because it could revolutionize how we create and use 3D models in many fields. Imagine being able to turn any picture into a detailed 3D model for video games, movies, or virtual reality with just a click. It could make creating 3D content much faster and easier, which could lead to more realistic video games, better special effects in movies, and new ways to design products or plan buildings. By making their model public, the researchers are also helping other scientists improve 3D AI technology even further.

Abstract

Recent advancements in diffusion techniques have propelled image and video generation to unprecedented levels of quality, significantly accelerating the deployment and application of generative AI. However, 3D shape generation technology has so far lagged behind, constrained by limitations in 3D data scale, complexity of 3D data processing, and insufficient exploration of advanced techniques in the 3D domain. Current approaches to 3D shape generation face substantial challenges in terms of output quality, generalization capability, and alignment with input conditions. We present TripoSG, a new streamlined shape diffusion paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images. Specifically, we propose: 1) A large-scale rectified flow transformer for 3D shape generation, achieving state-of-the-art fidelity through training on extensive, high-quality data. 2) A hybrid supervised training strategy combining SDF, normal, and eikonal losses for 3D VAE, achieving high-quality 3D reconstruction performance. 3) A data processing pipeline to generate 2 million high-quality 3D samples, highlighting the crucial rules for data quality and quantity in training 3D generative models. Through comprehensive experiments, we have validated the effectiveness of each component in our new framework. The seamless integration of these parts has enabled TripoSG to achieve state-of-the-art performance in 3D shape generation. The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images. Moreover, TripoSG demonstrates improved versatility in generating 3D models from diverse image styles and contents, showcasing strong generalization capabilities. To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
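The hybrid supervision the abstract mentions (SDF, normal, and eikonal losses for the 3D VAE) can be sketched as a weighted sum of three terms. The weights and the exact formulation below are assumptions chosen for illustration, not the paper's actual values:

```python
import numpy as np

def hybrid_sdf_loss(pred_sdf, true_sdf, pred_grad, true_normal,
                    w_sdf=1.0, w_normal=0.5, w_eik=0.1):
    """Illustrative hybrid loss combining the three signals the paper names.
    Weights w_sdf / w_normal / w_eik are made-up defaults, not TripoSG's.

    - SDF term: predicted signed distance should match the ground truth.
    - Normal term: the gradient of the SDF should point along the true
      surface normal (penalize 1 minus their cosine similarity).
    - Eikonal term: |grad SDF| should equal 1 everywhere, which is what
      makes the field a valid distance function.
    """
    l_sdf = np.mean((pred_sdf - true_sdf) ** 2)
    grad_norm = np.linalg.norm(pred_grad, axis=-1)
    normal_norm = np.linalg.norm(true_normal, axis=-1)
    cos = np.sum(pred_grad * true_normal, axis=-1) / (grad_norm * normal_norm + 1e-8)
    l_normal = np.mean(1.0 - cos)
    l_eik = np.mean((grad_norm - 1.0) ** 2)
    return w_sdf * l_sdf + w_normal * l_normal + w_eik * l_eik
```

When the predicted field matches the ground truth exactly (correct distances, unit-length gradients aligned with the normals), all three terms vanish, so the loss rewards a geometrically consistent signed distance field rather than just pointwise accuracy.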