PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis

Yu Yang, Zhilu Zhang, Xiang Zhang, Yihan Zeng, Hui Li, Wangmeng Zuo

2025-10-27

Summary

This paper introduces a new method, called PhysWorld, for creating realistic simulations of how objects move and interact, focusing in particular on deformable objects that can bend and change shape, such as cloth or rubber.

What's the problem?

It's really hard to teach computers to understand physics just by showing them videos of the real world, especially when dealing with objects that aren't rigid. These 'deformable' objects are complex, and we usually don't have enough video data to capture all their possible behaviors. Existing methods are often slow and don't generalize well to new situations.

What's the solution?

PhysWorld tackles this by first building a very accurate 'digital twin' of the object inside a physics simulator. They carefully choose the right materials and fine-tune the object's properties to match real-world physics. Then, they virtually 'poke' and 'prod' this digital twin in many different ways, creating a huge amount of training data. Finally, they use this data to train a fast and efficient computer model that can predict how the object will move in the future, and can even be adjusted using real-world video to improve its accuracy.
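The data flow described above can be sketched with a toy stand-in. In this sketch (all names and simplifications are illustrative, not the paper's actual implementation), a 1-D mass-spring chain plays the role of the digital twin, random per-segment stiffness jitter mimics the part-aware property perturbations, and a linear least-squares predictor stands in for the paper's GNN world model:

```python
import numpy as np

def simulate_step(pos, vel, stiffness, rest=1.0, dt=0.05):
    """One explicit-Euler step of a toy 1-D mass-spring chain (the 'digital twin')."""
    n = len(pos)
    force = np.zeros(n)
    for i in range(n - 1):
        ext = (pos[i + 1] - pos[i]) - rest  # spring extension from rest length
        f = stiffness[i] * ext              # Hooke's law with per-segment stiffness
        force[i] += f
        force[i + 1] -= f
    vel = vel + dt * force
    return pos + dt * vel, vel

def synthesize_demos(n_demos=200, n_particles=5, seed=0):
    """'Part-aware perturbation': each demo jitters the per-segment stiffness,
    then rolls the simulator forward to record (state, properties) -> next-state pairs."""
    rng = np.random.default_rng(seed)
    X, Y = [], []
    for _ in range(n_demos):
        stiffness = 5.0 * (1.0 + 0.3 * rng.standard_normal(n_particles - 1))
        pos = np.arange(n_particles) + 0.1 * rng.standard_normal(n_particles)
        vel = np.zeros(n_particles)
        for _ in range(10):
            feat = np.concatenate([pos, vel, stiffness])
            pos_next, vel_next = simulate_step(pos, vel, stiffness)
            X.append(feat)
            Y.append(pos_next)
            pos, vel = pos_next, vel_next
    return np.array(X), np.array(Y)

# Train a lightweight surrogate 'world model' (linear least squares here,
# standing in for the GNN) on the synthetic demonstrations.
X, Y = synthesize_demos()
W, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], Y, rcond=None)

# Use it: predict the next positions from state + physical properties.
stiff = np.full(4, 5.0)
pos, vel = np.arange(5.0), np.zeros(5)
pred_next = np.concatenate([pos, vel, stiff, [1.0]]) @ W
```

In PhysWorld itself the simulator is MPM-based, the physical properties come from global-to-local optimization against real video, and the world model is a GNN with the properties embedded as node features; this sketch only mirrors the overall data flow of simulate, perturb, and learn.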

Why it matters?

This work is important because it allows for more realistic and efficient simulations for things like robotics, virtual reality, and augmented reality. Being able to accurately predict how deformable objects will behave is crucial for creating robots that can manipulate them, and for making VR/AR experiences feel more natural and immersive. Plus, PhysWorld is much faster than previous methods, making it practical for real-time applications.

Abstract

Interactive world models that simulate object dynamics are crucial for robotics, VR, and AR. However, it remains a significant challenge to learn physics-consistent dynamics models from limited real-world video data, especially for deformable objects with spatially-varying physical properties. To overcome the challenge of data scarcity, we propose PhysWorld, a novel framework that utilizes a simulator to synthesize physically plausible and diverse demonstrations to learn efficient world models. Specifically, we first construct a physics-consistent digital twin within an MPM simulator via constitutive model selection and global-to-local optimization of physical properties. Subsequently, we apply part-aware perturbations to the physical properties and generate various motion patterns for the digital twin, synthesizing extensive and diverse demonstrations. Finally, using these demonstrations, we train a lightweight GNN-based world model that is embedded with physical properties. Real videos can be used to further refine the physical properties. PhysWorld achieves accurate and fast future predictions for various deformable objects, and also generalizes well to novel interactions. Experiments show that PhysWorld has competitive performance while enabling inference speeds 47 times faster than the recent state-of-the-art method, PhysTwin.