CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation

Hwan Heo, Jangyeong Kim, Seongyeong Lee, Jeong A Wi, Junyoung Choi, Sangjun Ahn

2025-01-17

Summary

This paper introduces CaPa, a method for generating high-quality textured 3D models quickly and efficiently. It works like a fast digital sculptor and painter collaborating to build realistic 3D objects from text descriptions or images.

What's the problem?

Generating good 3D models automatically is hard. Current methods often produce models that look inconsistent from different viewing angles, take a long time to generate, lack realism, or have flawed surfaces. It's like trying to build a detailed sandcastle quickly: it's tough to get every part right.

What's the solution?

The researchers created CaPa, which works in two stages. First, a 3D latent diffusion model builds the object's shape, guided by views of the object from multiple angles so that the geometry stays consistent from every side. Then a second stage paints high-resolution textures (up to 4K) onto that shape, using a model-agnostic technique called Spatially Decoupled Attention to make sure the textures look good together. A 3D-aware occlusion inpainting step then fills in any regions that were left untextured. The whole process takes less than 30 seconds, which is fast for 3D asset generation.

Why does it matter?

This matters because it could change how 3D models are made for things like video games, movies, and virtual reality. Instead of artists spending hours or days making detailed 3D models, they could use CaPa to create high-quality models in seconds. This could make it much faster and cheaper to create 3D content, potentially leading to more detailed and realistic virtual worlds in games and movies, or even helping designers quickly prototype new product designs.

Abstract

The synthesis of high-quality 3D assets from textual or visual inputs has become a central objective in modern generative modeling. Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems. While some studies have addressed some of these issues, a comprehensive solution remains elusive. In this paper, we introduce CaPa, a carve-and-paint framework that generates high-fidelity 3D assets efficiently. CaPa employs a two-stage process, decoupling geometry generation from texture synthesis. Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives. Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry. Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, resulting in cohesive results across the entire model. This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications. Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.
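
The decoupled structure described in the abstract (geometry first, then texture, then occlusion inpainting) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the names carve, paint, inpaint_occlusions, and generate_asset are hypothetical, the stages are placeholders rather than the paper's models, and the rule that each input view "covers" one face is invented purely to show how untextured regions are handed from one stage to the next.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


@dataclass
class Mesh:
    vertices: List[Vertex]
    faces: List[Face]


@dataclass
class TexturedMesh:
    mesh: Mesh
    texture_size: int  # side length of the square texture, in texels
    untextured_faces: List[int] = field(default_factory=list)


def carve(views: List[str]) -> Mesh:
    """Stage 1 placeholder (hypothetical): stands in for multi-view-guided
    geometry generation. Returns a fixed tetrahedron regardless of input."""
    if not views:
        raise ValueError("at least one input view is required")
    vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
    faces = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
    return Mesh(vertices, faces)


def paint(mesh: Mesh, views: List[str], texture_size: int = 4096) -> TexturedMesh:
    """Stage 2 placeholder (hypothetical): stands in for texture synthesis.
    Toy rule: each view covers one face; the rest are left untextured."""
    covered = min(len(views), len(mesh.faces))
    untextured = list(range(covered, len(mesh.faces)))
    return TexturedMesh(mesh, texture_size, untextured)


def inpaint_occlusions(textured: TexturedMesh) -> TexturedMesh:
    """Placeholder (hypothetical) for occlusion inpainting: marks every
    previously untextured face as filled."""
    return TexturedMesh(textured.mesh, textured.texture_size, [])


def generate_asset(views: List[str], texture_size: int = 4096) -> TexturedMesh:
    """Run the stages in order: geometry, then texture, then inpainting."""
    mesh = carve(views)
    textured = paint(mesh, views, texture_size)
    return inpaint_occlusions(textured)


if __name__ == "__main__":
    asset = generate_asset(["front", "back"])
    print(len(asset.mesh.faces), asset.texture_size, asset.untextured_faces)
```

Because the texture stage only receives a finished mesh, either stage could in principle be swapped out without touching the other, which is the practical benefit of decoupling geometry from texture.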
