
TEXGen: a Generative Diffusion Model for Mesh Textures

Xin Yu, Ze Yuan, Yuan-Chen Guo, Ying-Tian Liu, JianHui Liu, Yangguang Li, Yan-Pei Cao, Ding Liang, Xiaojuan Qi

2024-11-26


Summary

This paper introduces TEXGen, a generative diffusion model that produces high-quality UV texture maps for 3D meshes directly from text descriptions and single-view images, improving how textures are created for realistic 3D rendering.

What's the problem?

Creating realistic textures for 3D models is challenging because existing methods typically rely on pre-trained 2D image diffusion models and test-time optimization, which limits quality and often fails to capture the detail needed for high-resolution textures. Additionally, there has been little research on learning directly in the texture (UV) space, which is crucial for generating detailed textures at scale.

What's the solution?

TEXGen addresses this by training a large diffusion model specifically to generate UV texture maps, the 2D images that define how a texture wraps onto a 3D surface. Its scalable architecture interleaves convolutions on the UV map with attention layers operating on points sampled from the mesh surface, which lets the 700-million-parameter model learn efficiently in high-resolution UV space. Once trained, it generates textures in a single feed-forward pass, guided by text prompts and single-view images, and naturally supports applications such as text-guided texture inpainting, sparse-view texture completion, and text-driven texture synthesis.
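To make the interleaved design concrete, here is a minimal PyTorch sketch of one hybrid block that applies a convolution in UV space and then attention over features sampled at surface points. Everything here (the class name HybridUVPointBlock, the argument names, the nearest-pixel scatter used to write point features back into the UV map) is an illustrative assumption, not the paper's actual implementation.

```python
# Illustrative sketch only: not TEXGen's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HybridUVPointBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        # Local processing directly in UV texture space.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.GroupNorm(8, channels),  # assumes channels % 8 == 0
            nn.SiLU(),
        )
        # Global reasoning over sparse surface points.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, uv_feat: torch.Tensor, uv_coords: torch.Tensor) -> torch.Tensor:
        """
        uv_feat:   (B, C, H, W) feature map in UV texture space.
        uv_coords: (B, N, 2) UV locations of N points sampled on the mesh
                   surface, normalized to [-1, 1] for grid_sample.
        """
        # 1) Convolution on the UV map captures local texture detail.
        uv_feat = uv_feat + self.conv(uv_feat)

        # 2) Gather per-point features from the UV map at the sampled points.
        grid = uv_coords.unsqueeze(2)                             # (B, N, 1, 2)
        pts = F.grid_sample(uv_feat, grid, align_corners=False)   # (B, C, N, 1)
        pts = pts.squeeze(-1).transpose(1, 2)                     # (B, N, C)

        # 3) Attention among surface points propagates information globally,
        #    across UV islands that are far apart in the 2D atlas.
        pts = self.norm(pts + self.attn(pts, pts, pts, need_weights=False)[0])

        # 4) Scatter the updated point features back onto the UV map
        #    (nearest-pixel splat; a deliberate simplification).
        B, C, H, W = uv_feat.shape
        scale = torch.tensor([W - 1, H - 1], device=uv_feat.device)
        px = ((uv_coords + 1) / 2 * scale).long()                 # (B, N, 2) pixel coords
        idx = px[..., 1] * W + px[..., 0]                         # (B, N) flat indices
        flat = uv_feat.flatten(2)                                 # (B, C, H*W)
        flat = flat.scatter_add(2, idx.unsqueeze(1).expand(-1, C, -1), pts.transpose(1, 2))
        return flat.view(B, C, H, W)
```

The motivation for interleaving is that a UV atlas cuts the surface into islands, so pixels that are adjacent in 3D can be far apart in the 2D map: the convolution handles local texture detail cheaply, while attention over surface points restores global, geometry-aware communication.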

Why it matters?

This research is important because it enhances the process of creating realistic textures for 3D models, which is essential in fields like gaming, animation, and virtual reality. By developing a model that can generate textures directly from descriptions and images, TEXGen opens up new possibilities for content creators and improves the quality of visual experiences.

Abstract

While high-quality texture maps are essential for realistic 3D asset rendering, few studies have explored learning directly in the texture space, especially on large-scale datasets. In this work, we depart from the conventional approach of relying on pre-trained 2D diffusion models for test-time optimization of 3D textures. Instead, we focus on the fundamental problem of learning in the UV texture space itself. For the first time, we train a large diffusion model capable of directly generating high-resolution texture maps in a feed-forward manner. To facilitate efficient learning in high-resolution UV spaces, we propose a scalable network architecture that interleaves convolutions on UV maps with attention layers on point clouds. Leveraging this architectural design, we train a 700 million parameter diffusion model that can generate UV texture maps guided by text prompts and single-view images. Once trained, our model naturally supports various extended applications, including text-guided texture inpainting, sparse-view texture completion, and text-driven texture synthesis. Project page is at http://cvmi-lab.github.io/TEXGen/.
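For contrast with test-time optimization, the sketch below shows what feed-forward generation looks like at inference: a fixed number of denoising steps over a noise-initialized UV map, each conditioned on the text embedding and single-view image features. The denoiser signature, the conditioning inputs, and the generic DDIM-style schedule are all assumptions for illustration; they are not taken from the paper.

```python
# Illustrative sketch only: not the paper's sampler.
import torch


@torch.no_grad()
def sample_texture(denoiser, text_emb, image_feat, uv_coords,
                   size=(1, 3, 1024, 1024), steps=50, device="cuda"):
    # Start from pure Gaussian noise in UV texture space.
    x = torch.randn(size, device=device)
    # A simple alpha-bar schedule: little noise at small t, heavy noise at large t.
    alphas_bar = torch.linspace(0.9999, 0.0001, steps, device=device)
    # Iterate from the noisiest timestep down to the cleanest.
    for t in reversed(range(steps)):
        a_t = alphas_bar[t]
        a_prev = alphas_bar[t - 1] if t > 0 else torch.tensor(1.0, device=device)
        # One forward pass of the trained denoiser, conditioned on the text
        # embedding, the single-view image features, and the mesh's UV layout.
        eps = denoiser(x, t, text_emb, image_feat, uv_coords)
        # Deterministic DDIM-style update (eta = 0).
        x0 = (x - (1.0 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_prev.sqrt() * x0 + (1.0 - a_prev).sqrt() * eps
    return x  # the generated UV texture map
```

Because the number of denoiser evaluations is fixed (50 in this sketch), generation cost does not grow with per-asset optimization iterations, which is the practical payoff of learning directly in UV space rather than optimizing each texture through a pre-trained 2D model.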