
Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects

Jiawei Wang, Dingyou Wang, Jiaming Hu, Qixuan Zhang, Jingyi Yu, Lan Xu

2025-11-06


Summary

This paper introduces a new system called Kinematify that automatically creates digital models of objects that can move, like robotic arms or even toys, just from a picture or a text description.

What's the problem?

Currently, building these kinds of models, especially for complex objects with many moving parts, is really hard and time-consuming. Existing methods need a lot of example motions or require someone to manually define how the object is put together, which doesn't work well when you want to model lots of different things quickly.

What's the solution?

Kinematify tackles this with a two-step process. First, it infers the object's kinematic structure, working out how the parts connect and move, using a search technique called Monte Carlo Tree Search (MCTS). Then, it estimates the joint parameters, such as each joint's axis and position, through geometry-driven optimization so the model moves realistically. Together, these steps produce a physically consistent, functional digital representation of the object.
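To make the structural-search step concrete, here is a minimal, self-contained sketch of MCTS over kinematic topologies. Everything in it is hypothetical: the part names, the rule that each new part attaches to an already-placed part, and the toy scoring function (which simply rewards serial chains) stand in for the paper's geometry-based consistency evaluation.

```python
import math
import random

PARTS = ["base", "link1", "link2", "gripper"]  # hypothetical part list

def score(tree):
    # Stand-in "physical consistency" score: rewards attaching each part to
    # its predecessor, so the best topology here is a simple serial chain.
    return sum(1.0 for child, parent in tree.items()
               if PARTS.index(parent) == PARTS.index(child) - 1)

def expand(tree):
    # The next unplaced part may attach to any already-placed part.
    placed = ["base"] + sorted(tree, key=PARTS.index)
    child = PARTS[len(placed)]
    return [dict(tree, **{child: parent}) for parent in placed]

def complete(tree):
    return len(tree) == len(PARTS) - 1  # every non-base part has a parent

def rollout(tree):
    # Finish the partial topology with random choices and score the result.
    while not complete(tree):
        tree = random.choice(expand(tree))
    return score(tree)

def mcts(iterations=400, c=1.4):
    key = lambda t: tuple(sorted(t.items()))
    visits, total = {}, {}
    for _ in range(iterations):
        node, path = {}, [{}]
        # Selection: descend by UCB1 until an unvisited child is found.
        while not complete(node):
            children = expand(node)
            fresh = [ch for ch in children if key(ch) not in visits]
            if fresh:
                node = random.choice(fresh)
                path.append(node)
                break
            n = sum(visits[key(ch)] for ch in children)
            node = max(children,
                       key=lambda ch: total[key(ch)] / visits[key(ch)]
                       + c * math.sqrt(math.log(n) / visits[key(ch)]))
            path.append(node)
        reward = rollout(node)
        # Backpropagation: update statistics along the visited path.
        for t in path:
            visits[key(t)] = visits.get(key(t), 0) + 1
            total[key(t)] = total.get(key(t), 0.0) + reward
    # Extraction: greedily follow the highest-mean child to a full topology.
    node = {}
    while not complete(node):
        seen = [ch for ch in expand(node) if key(ch) in visits]
        node = max(seen, key=lambda ch: total[key(ch)] / visits[key(ch)])
    return node

random.seed(0)
best = mcts()
print(best)  # {'link1': 'base', 'link2': 'link1', 'gripper': 'link2'}
```

The search recovers the serial-chain topology because the toy reward favors it; in the actual system, the reward would instead measure how well a candidate structure explains the observed geometry.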

Why it matters?

This is important because having a way to automatically create these models makes it easier to simulate how robots interact with the world, plan their movements, and even learn new skills. It removes a major bottleneck in robotics and computer graphics, allowing for more complex and realistic simulations and applications.

Abstract

A deep understanding of kinematic structures and movable components is essential for enabling robots to manipulate objects and model their own articulated forms. Such understanding is captured through articulated objects, which are essential for tasks such as physical simulation, motion planning, and policy learning. However, creating these models, particularly for objects with high degrees of freedom (DoF), remains a significant challenge. Existing methods typically rely on motion sequences or strong assumptions from hand-curated datasets, which hinders scalability. In this paper, we introduce Kinematify, an automated framework that synthesizes articulated objects directly from arbitrary RGB images or textual descriptions. Our method addresses two core challenges: (i) inferring kinematic topologies for high-DoF objects and (ii) estimating joint parameters from static geometry. To achieve this, we combine MCTS search for structural inference with geometry-driven optimization for joint reasoning, producing physically consistent and functionally valid descriptions. We evaluate Kinematify on diverse inputs from both synthetic and real-world environments, demonstrating improvements in registration and kinematic topology accuracy over prior work.
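As one illustration of how joint parameters might be recovered from static geometry alone, the sketch below fits a revolute (hinge) axis to points sampled along the seam where two parts meet, taking the axis origin as the points' centroid and the axis direction as their dominant principal direction (found by power iteration). This setup is an assumption for illustration, not the paper's actual geometry-driven optimization.

```python
import random

def centroid(pts):
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(3))

def covariance(pts, c):
    # 3x3 scatter matrix of the points about their centroid.
    cov = [[0.0] * 3 for _ in range(3)]
    for p in pts:
        d = [p[i] - c[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j]
    return cov

def power_iteration(m, iters=100):
    # Dominant eigenvector of a 3x3 symmetric matrix.
    v = [1.0, 0.0, 0.0]
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def estimate_revolute_axis(seam_points):
    c = centroid(seam_points)
    return c, power_iteration(covariance(seam_points, c))

# Synthetic seam: points scattered along the z-axis with small x/y noise,
# mimicking the hinge line between, say, a door and its frame.
random.seed(0)
pts = [(random.gauss(0, 0.01), random.gauss(0, 0.01), i / 20)
       for i in range(21)]
origin, axis = estimate_revolute_axis(pts)
print(axis)  # roughly [0, 0, ±1]: the hinge line runs along z
```

A real pipeline would also have to decide joint type (revolute vs. prismatic), motion limits, and which of the two parts is the child, none of which this toy example attempts.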