Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

Jiantao Lin, Xin Yang, Meixi Chen, Yingjie Xu, Dongyu Yan, Leyi Wu, Xinli Xu, Lie Xu, Shunsi Zhang, Ying-Cong Chen

2025-03-04

Summary

This paper introduces Kiss3DGen, a system that repurposes AI models originally trained to create 2D images for generating high-quality 3D objects. By reusing the knowledge already in these 2D models, it simplifies the process of making and editing 3D models.

What's the problem?

Creating realistic and detailed 3D models is hard because existing methods need a lot of training data, which is difficult and expensive to collect. Current systems also struggle to generalize well, meaning they can't easily create new types of 3D objects outside their training data.

What's the solution?

The researchers developed Kiss3DGen, which turns the hard problem of generating a 3D model into the simpler task of generating a single tiled 2D image called a '3D Bundle Image.' This tiled image contains renders of the object from several camera angles along with matching normal maps that capture surface detail. The normal maps are used to reconstruct a 3D mesh, and the multi-view renders are used to texture it, producing a complete 3D model. Because the system is built on a standard image diffusion model, Kiss3DGen also works with existing diffusion-model tools to edit and enhance these models, making them more detailed and versatile.
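To make the tiled representation concrete, here is a minimal sketch of assembling a bundle image from multi-view renders and normal maps. The function name `make_bundle_image` and the specific grid layout (views on the top row, normal maps on the bottom) are illustrative assumptions; the paper only specifies that the bundle is a tiled composition of multi-view images and their normal maps.

```python
import numpy as np

def make_bundle_image(views, normals):
    """Tile multi-view RGB renders and their normal maps into one
    '3D Bundle Image'. Layout (views on top, normals below) is an
    illustrative assumption, not the paper's exact arrangement.

    `views` and `normals` are lists of HxWx3 uint8 arrays, one per
    camera angle, in the same order.
    """
    top = np.concatenate(views, axis=1)       # all views side by side
    bottom = np.concatenate(normals, axis=1)  # matching normal maps
    return np.concatenate([top, bottom], axis=0)

# Four hypothetical 256x256 views and normal maps -> one 512x1024 grid.
views = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(4)]
normals = [np.zeros((256, 256, 3), dtype=np.uint8) for _ in range(4)]
bundle = make_bundle_image(views, normals)
print(bundle.shape)  # (512, 1024, 3)
```

The key design point is that a single 2D image now carries all the geometry and appearance cues, so an off-the-shelf image diffusion model can be fine-tuned to generate it directly.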

Why it matters?

This matters because it makes creating and editing 3D models faster, easier, and more accessible. By repurposing existing AI technology, Kiss3DGen reduces the need for large amounts of training data while still producing high-quality results. This could be useful for industries like gaming, animation, or virtual reality, where realistic 3D models are essential.

Abstract

Diffusion models have achieved great success in generating 2D images. However, the quality and generalizability of 3D content generation remain limited. State-of-the-art methods often require large-scale 3D assets for training, which are challenging to collect. In this work, we introduce Kiss3DGen (Keep It Simple and Straightforward in 3D Generation), an efficient framework for generating, editing, and enhancing 3D objects by repurposing a well-trained 2D image diffusion model for 3D generation. Specifically, we fine-tune a diffusion model to generate a "3D Bundle Image", a tiled representation composed of multi-view images and their corresponding normal maps. The normal maps are then used to reconstruct a 3D mesh, and the multi-view images provide texture mapping, resulting in a complete 3D model. This simple method effectively transforms the 3D generation problem into a 2D image generation task, maximizing the utilization of knowledge in pretrained diffusion models. Furthermore, we demonstrate that our Kiss3DGen model is compatible with various diffusion model techniques, enabling advanced features such as 3D editing, mesh and texture enhancement, etc. Through extensive experiments, we demonstrate the effectiveness of our approach, showcasing its ability to produce high-quality 3D models efficiently.
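The reconstruction stage of the pipeline starts by untiling the generated bundle image back into its per-view components. The sketch below shows that step under an assumed grid layout (views on the top row, normal maps on the bottom); `split_bundle` and the `n_views` count are hypothetical names, since the abstract does not specify the exact tiling.

```python
import numpy as np

def split_bundle(bundle, n_views=4):
    """Split a tiled '3D Bundle Image' back into per-view RGB renders
    and normal maps. The assumed layout (views on top, normals on the
    bottom, equal tile sizes) is illustrative, not from the paper.
    """
    top, bottom = np.split(bundle, 2, axis=0)
    views = np.split(top, n_views, axis=1)
    normals = np.split(bottom, n_views, axis=1)
    return views, normals

# A hypothetical 512x1024 bundle holding four 256x256 tiles per row.
bundle = np.zeros((512, 1024, 3), dtype=np.uint8)
views, normals = split_bundle(bundle)
print(len(views), views[0].shape)  # 4 (256, 256, 3)
```

From here, the recovered normal maps would feed a mesh-reconstruction step and the RGB views a texture-baking step, which is where the 3D-specific machinery takes over from the 2D diffusion model.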