Particulate: Feed-Forward 3D Object Articulation
Ruining Li, Yuxin Yao, Chuanxia Zheng, Christian Rupprecht, Joan Lasenby, Shangzhe Wu, Andrea Vedaldi
2025-12-15
Summary
This paper introduces a new system called Particulate that can automatically figure out how a 3D model of an object, like a chair or a robot, is put together and how its parts move.
What's the problem?
Traditionally, if you have a static 3D model of something with moving parts, figuring out *how* those parts connect and move is difficult and time-consuming. It usually requires extensive manual work or optimization procedures that must be tuned for each individual object, so existing methods are slow and cope poorly with newer, AI-generated 3D models.
What's the solution?
Particulate uses a transformer network to analyze the 3D shape of an object and directly predict its parts, how they're connected, and how they're allowed to move. It's trained end-to-end on a large collection of articulated 3D objects, so it learns to recognize common patterns. The key is that everything is predicted in a single feed-forward pass, with no per-object fine-tuning, making it much faster than prior approaches. It even works on 3D models generated by AI.
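The pipeline described above — sample a point cloud from the input mesh, run one network pass, read off parts and joint parameters — can be sketched in outline. This is a minimal illustration, not the paper's code: the function names are invented, and a trivial geometric heuristic stands in for the learned transformer; only the *shape* of the output (per-point part labels plus joint attributes such as axis, pivot, and motion limits) mirrors what such a system predicts.

```python
import numpy as np

def sample_point_cloud(vertices, n_points=1024, seed=0):
    """Sample a fixed-size point cloud from mesh vertices.
    (Real systems sample the mesh surface; vertex sampling keeps this toy.)"""
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, len(vertices), size=n_points)
    return vertices[idx]

def predict_articulation(points, n_parts=2):
    """Stand-in for the feed-forward network: assign each point to a part
    and estimate one joint for the non-root part. A toy split along the
    z-axis replaces the learned transformer; the returned structure
    (labels + joint list) illustrates the kind of attributes predicted."""
    labels = (points[:, 2] > np.median(points[:, 2])).astype(int)
    pivot = points[labels == 1].mean(axis=0)    # rough joint location
    joint = {
        "parent": 0,                            # kinematic structure
        "child": 1,
        "type": "revolute",                     # assumed joint type
        "axis": np.array([0.0, 1.0, 0.0]),      # assumed hinge axis
        "limits": (-np.pi / 2, 0.0),            # assumed motion range
        "pivot": pivot,
    }
    return labels, [joint]

# Toy "mesh": random vertices standing in for an everyday object.
rng = np.random.default_rng(1)
vertices = rng.uniform(-1.0, 1.0, size=(5000, 3))
pts = sample_point_cloud(vertices)
labels, joints = predict_articulation(pts)
```

In the actual system, `predict_articulation` would be a single transformer forward pass over the point cloud, which is what makes inference take seconds rather than the minutes or hours of per-object optimization.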
Why it matters?
This is important because it makes it much easier to work with 3D models of complex objects. This could be useful for things like robotics, animation, video games, and even designing new products. The speed and accuracy of Particulate open up possibilities for automatically creating usable 3D models from images or AI-generated content, which was previously a major challenge.
Abstract
We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure, including its 3D parts, kinematic structure, and motion constraints. At its core is a transformer network, Part Articulation Transformer, which processes a point cloud of the input mesh using a flexible and scalable architecture to predict all the aforementioned attributes with native multi-joint support. We train the network end-to-end on a diverse collection of articulated 3D assets from public datasets. During inference, Particulate lifts the network's feed-forward prediction to the input mesh, yielding a fully articulated 3D model in seconds, much faster than prior approaches that require per-object optimization. Particulate can also accurately infer the articulated structure of AI-generated 3D assets, enabling full-fledged extraction of articulated 3D objects from a single (real or synthetic) image when combined with an off-the-shelf image-to-3D generator. We further introduce a new challenging benchmark for 3D articulation estimation curated from high-quality public 3D assets, and redesign the evaluation protocol to be more consistent with human preferences. Quantitative and qualitative results show that Particulate significantly outperforms state-of-the-art approaches.