The core of Ultra3D is Part Attention, a geometry-aware localized attention mechanism that restricts attention computation to semantically consistent part regions. This design preserves structural continuity while avoiding unnecessary global attention, achieving up to a 6.7× speed-up in latent generation. To support this mechanism, Ultra3D constructs a scalable part annotation pipeline that converts raw meshes into part-labeled sparse voxels.
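To make the idea of part-restricted attention concrete, the following is a minimal sketch in plain Python, not the Ultra3D implementation. It assumes each voxel token already carries an integer part label; the names `part_attention`, `attention_pairs`, and `part_ids` are hypothetical and chosen only for illustration. Each token attends solely to tokens that share its part label, and a helper counts how many query-key pairs are evaluated compared with global attention.

```python
import math


def part_attention(q, k, v, part_ids):
    """Self-attention where token i attends only to tokens with the same part id.

    q, k, v: lists of equal-length float vectors, one per voxel token.
    part_ids: list of integer part labels, one per token (assumed input).
    """
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        # Tokens in the same part as token i (always includes i itself).
        peers = [j for j, p in enumerate(part_ids) if p == part_ids[i]]
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in peers]
        # Numerically stable softmax over the peers only.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][d] for w, j in zip(weights, peers)) / total
            for d in range(len(v[0]))
        ])
    return out


def attention_pairs(part_ids):
    """Return (pairs under part attention, pairs under global attention)."""
    counts = {}
    for p in part_ids:
        counts[p] = counts.get(p, 0) + 1
    local = sum(c * c for c in counts.values())
    return local, len(part_ids) ** 2
```

With four tokens split evenly into two parts, `attention_pairs([0, 0, 1, 1])` evaluates 8 pairs instead of 16. This pair count only illustrates why localization reduces work; it is not a model of the reported 6.7× figure, which depends on the actual part sizes and implementation.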
Ultra3D is a two-stage framework: it first generates a sparse voxel layout via VecSet and then refines that layout by generating per-voxel latents. When the input condition is an image, each part group performs cross-attention only with the image tokens onto which its voxel tokens are projected. The framework enables the generation of high-quality 3D meshes with fine-grained geometric detail, making it suitable for applications such as computer-aided design, video games, and virtual reality.
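The image-conditioned case described above, in which each part group cross-attends only to the image tokens its voxels project onto, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: it takes the voxel-to-image projection as a precomputed list `voxel_to_image` (one image-token index per voxel), and the function names are hypothetical. How the projection itself is computed is outside the scope of the sketch.

```python
import math


def part_image_tokens(part_ids, voxel_to_image):
    """Map each part id to the set of image-token indices its voxels project onto."""
    allowed = {}
    for part, token in zip(part_ids, voxel_to_image):
        allowed.setdefault(part, set()).add(token)
    return allowed


def part_cross_attention(q, img_k, img_v, part_ids, voxel_to_image):
    """Cross-attention from voxel queries to image tokens, restricted per part.

    q: one query vector per voxel token.
    img_k, img_v: one key and one value vector per image token.
    voxel_to_image: assumed precomputed projection, one image-token index per voxel.
    """
    allowed = part_image_tokens(part_ids, voxel_to_image)
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi, part in zip(q, part_ids):
        # Only image tokens hit by some voxel of this part are visible.
        keys = sorted(allowed[part])
        scores = [scale * sum(a * b for a, b in zip(qi, img_k[j])) for j in keys]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * img_v[j][d] for w, j in zip(weights, keys)) / total
            for d in range(len(img_v[0]))
        ])
    return out
```

Note that a voxel sees every image token reached by any voxel in its part, not just its own projection, which matches the per-part-group wording of the description.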