The first stage of OmniPart is an autoregressive structure planning module that generates a controllable, variable-length sequence of 3D part bounding boxes. This module is guided by flexible 2D part masks, which allow intuitive control over part decomposition without requiring direct part correspondences or semantic labels. The second stage is a spatially-conditioned rectified flow model that synthesizes all 3D parts simultaneously and consistently within the planned layout. This approach supports user-defined part granularity and precise localization, and enables diverse downstream applications.
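The text does not spell out how the rectified flow model is sampled, but rectified flow models are generally sampled by integrating a learned velocity field along a near-straight path from noise to data with a fixed-step ODE solver. The sketch below illustrates that generic sampling loop, using a toy closed-form velocity field as a stand-in for the learned, spatially-conditioned network (the function and target latent are illustrative assumptions, not part of OmniPart):

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps=100):
    # Fixed-step Euler integration of dx/dt = v(x, t)
    # from t = 0 (noise) to t = 1 (generated sample).
    x = np.asarray(x0, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy stand-in for the learned velocity network: the exact
# straight-line ("rectified") field toward a fixed target latent.
target = np.full(8, 2.0)

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

noise = np.zeros(8)
sample = euler_sample(toy_velocity, noise)
```

Because the toy field is exactly straight, Euler integration recovers the target latent; in practice the learned field is only approximately straight, which is what makes rectified flow samplable in few steps.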
OmniPart generates high-quality part-aware 3D content directly from a single input image and naturally supports a range of downstream applications, including animation, mask-controlled generation, multi-granularity generation, material editing, and geometry processing. The framework is built upon TRELLIS, which provides a spatially structured sparse voxel latent space, and leverages a large-scale shape model pretrained on whole objects. Extensive experiments demonstrate that OmniPart achieves state-of-the-art performance, paving the way for more interpretable, editable, and versatile 3D content.