Seeing the Wind from a Falling Leaf
Zhiyuan Gao, Jiageng Mao, Hong-Xing Yu, Haozhe Lou, Emily Yue-Ting Jia, Jernej Barbic, Jiajun Wu, Yue Wang
2025-12-02
Summary
This paper focuses on figuring out the hidden forces that cause things to move in videos, going beyond just tracking the movement itself.
What's the problem?
Currently, computer vision can *see* motion in videos, but it doesn't understand *why* things are moving. We don't have a good way to determine the underlying physical forces – like wind or gravity – that are causing objects to behave the way they do. It's like watching a leaf fall and knowing it's going down, but not being able to 'see' the wind pushing it.
What's the solution?
The researchers created a new system that uses a technique called 'inverse graphics'. Basically, they built a computer model that can work backwards from the observed motion in a video to figure out what forces must be acting on the objects. The model represents the shape of the objects and their physical properties, then uses backpropagation, a gradient-based optimization method common in machine learning, to adjust the estimated forces until the model's simulated motion matches the video.
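The optimize-through-simulation idea can be illustrated with a toy example: recover an unknown constant wind force acting on a falling point mass by descending the gradient of a trajectory-matching loss through a simple Euler-integrated simulator. This is a minimal sketch of the general principle, not the paper's actual framework; all variable names, the integrator, and the closed-form gradient (valid here because the toy dynamics are linear in the wind) are our own simplifications.

```python
import numpy as np

def simulate(wind, steps=50, dt=0.02):
    """Semi-implicit Euler integration of a unit point mass under
    gravity plus a constant (unknown) wind acceleration."""
    g = np.array([0.0, -9.8])
    pos, vel = np.array([0.0, 2.0]), np.zeros(2)
    traj = []
    for _ in range(steps):
        vel = vel + (g + wind) * dt
        pos = pos + vel * dt
        traj.append(pos.copy())
    return np.array(traj)

# "Observed" motion, generated with a hidden ground-truth wind.
true_wind = np.array([1.5, 0.3])
observed = simulate(true_wind)

def loss_and_grad(wind, observed, dt=0.02):
    """Trajectory-matching loss and its gradient w.r.t. the wind.
    Because these toy dynamics are linear in the wind, backprop
    through the Euler steps has a closed form:
    d pos_t / d wind = dt^2 * t * (t + 1) / 2 per axis."""
    traj = simulate(wind, steps=len(observed), dt=dt)
    resid = traj - observed
    t = np.arange(1, len(observed) + 1)
    sens = dt ** 2 * t * (t + 1) / 2.0          # per-step sensitivity
    grad = 2.0 * (resid * sens[:, None]).sum(axis=0)
    return np.sum(resid ** 2), grad

# Gradient descent recovers the hidden wind from the observed motion.
wind = np.zeros(2)
for _ in range(200):
    loss, grad = loss_and_grad(wind, observed)
    wind -= 0.2 * grad   # step size tuned for this toy problem

print(wind)   # ≈ [1.5, 0.3], the hidden wind
```

In the paper's setting, the simulator is far richer (object geometry, deformation, a spatially varying force field) and the gradient is obtained by automatic differentiation rather than a closed form, but the loop is the same: simulate, compare to the video, and backpropagate into the force estimate.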
Why does it matter?
This work is important because it helps bridge the gap between what computers 'see' and how the physical world actually works. Understanding these forces opens up possibilities for things like creating more realistic videos, editing videos in a physically accurate way, and even potentially predicting how objects will move in the future. It's a step towards computers having a more complete understanding of the world around them.
Abstract
A longstanding goal in computer vision is to model motions from videos, while the representations behind motions, i.e. the invisible physical interactions that cause objects to deform and move, remain largely unexplored. In this paper, we study how to recover the invisible forces from visual observations, e.g., estimating the wind field by observing a leaf falling to the ground. Our key innovation is an end-to-end differentiable inverse graphics framework, which jointly models object geometry, physical properties, and interactions directly from videos. Through backpropagation, our approach enables the recovery of force representations from object motions. We validate our method on both synthetic and real-world scenarios, and the results demonstrate its ability to infer plausible force fields from videos. Furthermore, we show the potential applications of our approach, including physics-based video generation and editing. We hope our approach sheds light on understanding and modeling the physical process behind pixels, bridging the gap between vision and physics. Please see more video results on our project page: https://chaoren2357.github.io/seeingthewind/