Pixie: Fast and Generalizable Supervised Learning of 3D Physics from Pixels
Long Le, Ryan Lucas, Chen Wang, Chuhao Chen, Dinesh Jayaraman, Eric Eaton, Lingjie Liu
2025-08-27
Summary
This paper introduces a new method, PIXIE, for inferring how objects in 3D scenes should behave physically, such as how bouncy or squishy they are. The goal is to make virtual worlds more interactive and realistic.
What's the problem?
Making virtual objects behave realistically is currently hard. Existing methods typically run a slow, compute-heavy optimization to fit the physical properties of each scene individually, and the result does not transfer to new, different scenes. Humans can tell at a glance whether something is hard or soft, but computers struggle with this kind of generalization.
What's the solution?
The researchers created PIXIE, a neural network that learns to predict these physical properties directly from the 3D visual information of an object. They trained it using a large dataset of 3D objects with known physical characteristics. Once trained, PIXIE can quickly estimate how an object should behave when forces are applied, and it works well even with scenes it hasn't seen before. They also created a new dataset, PIXIEVERSE, to help with this research.
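To make the recipe concrete, here is a hypothetical sketch (not the authors' released code) of the core idea: a small feed-forward network that maps per-point 3D visual features to material parameters (say, stiffness and density), trained with a plain supervised regression loss on annotated data. All sizes, names, and the synthetic data are made up for illustration.

```python
import numpy as np

# Illustrative only: a tiny MLP regressing material parameters from features,
# with the backward pass written out by hand. Shapes are arbitrary.
rng = np.random.default_rng(0)
feat_dim, hidden, n_params = 64, 32, 2

W1 = rng.normal(0.0, 0.1, (feat_dim, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(0.0, 0.1, (hidden, n_params)); b2 = np.zeros(n_params)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)      # ReLU hidden layer
    return h, h @ W2 + b2                 # per-point material prediction

# Stand-in for (3D feature, material annotation) pairs, a la PIXIEVERSE.
X = rng.normal(size=(1024, feat_dim))
Y = X @ rng.normal(size=(feat_dim, n_params))

losses, lr = [], 1e-2
for _ in range(200):
    h, pred = forward(X)
    err = pred - Y
    losses.append(float((err ** 2).mean()))
    g_pred = 2.0 * err / err.size         # gradient of the MSE loss
    gW2, gb2 = h.T @ g_pred, g_pred.sum(0)
    g_h = g_pred @ W2.T
    g_h[h <= 0.0] = 0.0                   # ReLU gradient mask
    gW1, gb1 = X.T @ g_h, g_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# Once trained, inference on unseen points is a single feed-forward pass,
# with no per-scene optimization loop.
_, material_field = forward(rng.normal(size=(100, feat_dim)))
```

The key contrast with prior work is in the last two lines: at test time the material field comes from one forward pass rather than an optimization run per scene.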
Why it matters?
This work is important because it makes creating realistic physics simulations much faster and more efficient. Instead of calculating everything from scratch for each scene, PIXIE provides a quick and accurate estimate of material properties. This allows for more interactive and believable virtual environments, and it even works on real-world scenes despite being trained on computer-generated data.
Abstract
Inferring the physical properties of 3D scenes from visual information is a critical yet challenging task for creating interactive and realistic virtual worlds. While humans intuitively grasp material characteristics such as elasticity or stiffness, existing methods often rely on slow, per-scene optimization, limiting their generalizability and application. To address this problem, we introduce PIXIE, a novel method that trains a generalizable neural network to predict physical properties across multiple scenes from 3D visual features purely using supervised losses. Once trained, our feed-forward network can perform fast inference of plausible material fields, which, coupled with a learned static scene representation like Gaussian Splatting, enables realistic physics simulation under external forces. To facilitate this research, we also collected PIXIEVERSE, one of the largest known datasets of paired 3D assets and physics material annotations. Extensive evaluations demonstrate that PIXIE is about 1.46-4.39x better and orders of magnitude faster than test-time optimization methods. By leveraging pretrained visual features like CLIP, our method can also zero-shot generalize to real-world scenes despite only ever being trained on synthetic data. https://pixie-3d.github.io/
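The pipeline in the abstract ends by coupling the predicted material field with the scene representation and simulating under external forces. The toy sketch below illustrates only that coupling step: per-particle stiffness stands in for a predicted material parameter, and a damped-spring integrator stands in for the paper's actual simulator. Everything here (the spring model, the constants) is an assumption for illustration.

```python
import numpy as np

# Toy stand-in for "simulate the scene under an external force, given a
# per-particle material field": damped springs pulling particles back to
# their rest positions, integrated with semi-implicit Euler.
def simulate(rest_pos, stiffness, ext_force, dt=0.01, steps=100, damping=2.0):
    pos = rest_pos.copy()
    vel = np.zeros_like(pos)
    for _ in range(steps):
        acc = -stiffness[:, None] * (pos - rest_pos) - damping * vel + ext_force
        vel += dt * acc                   # update velocity first (stable)
        pos += dt * vel                   # then advance positions
    return pos

rng = np.random.default_rng(0)
rest = rng.uniform(-1.0, 1.0, size=(256, 3))   # stand-in particle positions
k = rng.uniform(5.0, 50.0, size=256)           # stand-in predicted stiffness
push = np.array([0.0, 0.0, 1.0])               # uniform external force
final = simulate(rest, k, push)
```

Under the same push, particles the material field marks as stiffer end up displaced less from their rest positions than softer ones, which is the qualitative behavior the predicted material field is meant to produce.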