Pandora3D: A Comprehensive Framework for High-Quality 3D Shape and Texture Generation
Jiayu Yang, Taizhang Shang, Weixuan Sun, Xibin Song, Ziang Cheng, Senbo Wang, Shenzhou Chen, Weizhe Liu, Hongdong Li, Pan Ji
2025-02-25
Summary
This paper introduces Pandora3D, a system that creates high-quality 3D shapes and textures from several kinds of input: single images, multi-view images of an object, or text descriptions.
What's the problem?
Making 3D models with realistic shapes and textures is usually a difficult and time-consuming process that demands advanced skills and expensive tools. This puts detailed 3D content for video games, movies, and simulations out of reach for many people.
What's the solution?
The researchers designed Pandora3D, which uses advanced AI techniques to automate the creation of 3D models. For shapes, it uses a Variational Autoencoder (VAE) to encode object geometry into a compact latent space and a diffusion network to generate shape latents conditioned on the input prompt. For textures, it follows a step-by-step process: generating a frontal image, extending it to multiple views, converting RGB images to PBR material maps, and refining the result at high resolution. A special 'consistency scheduler' keeps the textures pixel-wise consistent across the different views.
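The shape stage can be pictured as "denoise a random latent toward the prompt, then decode it." Below is a minimal toy sketch of that idea in numpy; the function names and the simple update rule are illustrative stand-ins, not the paper's actual networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(latent, prompt_embedding, t):
    # Stand-in for one diffusion step: a real model predicts noise with a
    # neural network; here we just nudge the latent toward the prompt.
    return latent + 0.1 * (prompt_embedding - latent)

def generate_shape_latent(prompt_embedding, steps=50):
    # Start from pure Gaussian noise, as in standard diffusion sampling,
    # and iteratively denoise conditioned on the prompt embedding.
    latent = rng.standard_normal(prompt_embedding.shape)
    for t in range(steps):
        latent = denoise_step(latent, prompt_embedding, t)
    return latent  # would then be decoded by the VAE into 3D geometry

prompt = np.ones(16)  # pretend prompt embedding
latent = generate_shape_latent(prompt)
```

After 50 steps the toy latent has converged close to the prompt embedding; in the real pipeline the denoised latent is fed to the VAE decoder to recover an implicit 3D shape.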
Why it matters?
This matters because it makes creating high-quality 3D content faster, easier, and more accessible. It could help artists, game developers, and even researchers save time and resources while producing realistic 3D models. This technology has the potential to improve industries like entertainment, virtual reality, and robotics by simplifying how 3D assets are made.
Abstract
This report presents a comprehensive framework for generating high-quality 3D shapes and textures from diverse input prompts, including single images, multi-view images, and text descriptions. The framework consists of 3D shape generation and texture generation. (1) The 3D shape generation pipeline employs a Variational Autoencoder (VAE) to encode implicit 3D geometries into a latent space and a diffusion network to generate latents conditioned on input prompts, with modifications to enhance model capacity. An alternative Artist-Created Mesh (AM) generation approach is also explored, yielding promising results for simpler geometries. (2) Texture generation involves a multi-stage process starting with frontal image generation, followed by multi-view image generation, RGB-to-PBR texture conversion, and high-resolution multi-view texture refinement. A consistency scheduler is plugged into every stage to enforce pixel-wise consistency among multi-view textures during inference, ensuring seamless integration. The pipeline demonstrates effective handling of diverse input formats, leveraging advanced neural architectures and novel methodologies to produce high-quality 3D content. This report details the system architecture, experimental results, and potential future directions to improve and expand the framework. The source code and pretrained weights are released at: https://github.com/Tencent/Tencent-XR-3DGen.
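The core of a pixel-wise consistency constraint is that pixels in different views that map to the same texel should end up with the same value. A minimal toy version, assuming the simplest possible rule (average all observations of a shared texel, then write the mean back into every view), looks like this; the function name and averaging rule are illustrative assumptions, not the paper's scheduler.

```python
import numpy as np

def enforce_consistency(views, texel_ids):
    """Average pixel values that map to the same texel across views.

    views:     (V, P) array of pixel values, one row per view
    texel_ids: (V, P) array giving the shared texel index of each pixel
    """
    num_texels = texel_ids.max() + 1
    sums = np.zeros(num_texels)
    counts = np.zeros(num_texels)
    # Accumulate every observation of each texel (unbuffered scatter-add).
    np.add.at(sums, texel_ids.ravel(), views.ravel())
    np.add.at(counts, texel_ids.ravel(), 1)
    texture = sums / np.maximum(counts, 1)  # mean value per shared texel
    return texture[texel_ids]               # write the agreed value back per view

views = np.array([[0.2, 0.8],   # two views, two pixels each
                  [0.4, 0.6]])
ids = np.array([[0, 1],         # both views observe texels 0 and 1
                [0, 1]])
consistent = enforce_consistency(views, ids)
```

Applying such a step inside every stage of the texture pipeline is what lets the multi-view outputs be baked into a single seamless texture at the end.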