
PhysX-Anything: Simulation-Ready Physical 3D Assets from Single Image

Ziang Cao, Fangzhou Hong, Zhaoxi Chen, Liang Pan, Ziwei Liu

2025-11-18


Summary

This paper introduces PhysX-Anything, a new system that turns a single image into a 3D model designed to work directly in physics simulations, like those used in robotics and game development.

What's the problem?

Currently, many methods for creating 3D models focus on how things *look* but don't accurately capture how they behave physically: properties like weight, how parts move relative to each other (articulation), and how they interact with other objects. This makes them difficult to use where realistic physics matters, such as training robots to manipulate objects or building realistic simulations.

What's the solution?

The researchers developed a system that uses a type of artificial intelligence called a VLM (Vision-Language Model) to generate 3D models directly from images. They also created a more efficient way to represent 3D geometry, cutting the number of 'building blocks' (tokens) by 193x so the AI can learn explicit geometric detail within a standard VLM's token budget. To train the AI, they built PhysX-Mobility, a new dataset of more than 2,000 common real-world objects annotated with detailed physical properties. Essentially, they made a system that can 'understand' an image and create a 3D model that behaves realistically in a simulation.
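To make the token-budget idea concrete, here is a minimal toy sketch, not the paper's actual representation: it compares a naive per-vertex tokenization of a shape against a coarse voxel-occupancy encoding. The point cloud, grid resolution, and token scheme are all illustrative assumptions; only the general principle (a compact geometry encoding needs far fewer tokens) comes from the paper.

```python
# Toy comparison: per-vertex tokens vs. coarse voxel-occupancy tokens.
# Everything here (point count, resolution, quantization) is illustrative.
import numpy as np

rng = np.random.default_rng(0)
vertices = rng.uniform(-1.0, 1.0, size=(20_000, 3))  # stand-in for mesh vertices

# Naive scheme: quantize each coordinate to an integer -> 3 tokens per vertex.
naive_tokens = np.round((vertices + 1.0) / 2.0 * 255).astype(np.int32).ravel()

# Compact scheme: rasterize into a low-resolution occupancy grid and keep
# only the indices of occupied cells as tokens.
res = 16
cells = np.floor((vertices + 1.0) / 2.0 * res).clip(0, res - 1).astype(np.int32)
flat = cells[:, 0] * res * res + cells[:, 1] * res + cells[:, 2]
compact_tokens = np.unique(flat)  # one token per occupied voxel

print(f"naive tokens:   {naive_tokens.size}")
print(f"compact tokens: {compact_tokens.size}")
print(f"reduction:      {naive_tokens.size / compact_tokens.size:.0f}x")
```

The exact reduction depends on the shape and resolution; the paper's representation achieves 193x with its own encoding, which this sketch does not reproduce.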

Why it matters?

This work is important because it enables more realistic and useful 3D models for a variety of applications, especially in embodied AI, where robots learn to interact with the real world. Because the generated models are 'simulation-ready', robots can be trained in a virtual environment before being deployed in the real world, saving time and resources.

Abstract

3D modeling is shifting from static visual representations toward physical, articulated assets that can be directly used in simulation and interaction. However, most existing 3D generation methods overlook key physical and articulation properties, thereby limiting their utility in embodied AI. To bridge this gap, we introduce PhysX-Anything, the first simulation-ready physical 3D generative framework that, given a single in-the-wild image, produces high-quality sim-ready 3D assets with explicit geometry, articulation, and physical attributes. Specifically, we propose the first VLM-based physical 3D generative model, along with a new 3D representation that efficiently tokenizes geometry. It reduces the number of tokens by 193x, enabling explicit geometry learning within standard VLM token budgets without introducing any special tokens during fine-tuning and significantly improving generative quality. In addition, to overcome the limited diversity of existing physical 3D datasets, we construct a new dataset, PhysX-Mobility, which expands the object categories in prior physical 3D datasets by over 2x and includes more than 2K common real-world objects with rich physical annotations. Extensive experiments on PhysX-Mobility and in-the-wild images demonstrate that PhysX-Anything delivers strong generative performance and robust generalization. Furthermore, simulation-based experiments in a MuJoCo-style environment validate that our sim-ready assets can be directly used for contact-rich robotic policy learning. We believe PhysX-Anything can substantially empower a broad range of downstream applications, especially in embodied AI and physics-based simulation.
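As a rough illustration of what 'simulation-ready' means in practice, the sketch below loads a hypothetical MJCF export (`asset.xml`) into MuJoCo's Python bindings and steps the physics. The file name, actuator command, and export format are assumptions made for illustration; the paper validates its assets in a MuJoCo-style environment but does not specify this exact workflow.

```python
# Hedged sketch: exercising an articulated, sim-ready asset in MuJoCo.
# "asset.xml" is a hypothetical MJCF file standing in for a generated asset.
import mujoco

model = mujoco.MjModel.from_xml_path("asset.xml")  # joints, masses, geometry
data = mujoco.MjData(model)

# Apply a small control signal and step the physics, confirming the asset
# responds to actuation and contact rather than being a static visual mesh.
for _ in range(1000):
    data.ctrl[:] = 0.1  # hypothetical actuator command
    mujoco.mj_step(model, data)

print("joint positions after 1000 steps:", data.qpos)
```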