Mimicking the Physicist's Eye: A VLM-centric Approach for Physics Formula Discovery
Jiaqi Liu, Songning Lai, Pengze Li, Di Yu, Wenjie Zhou, Yiyang Zhou, Peng Xia, Zijun Wang, Xi Chen, Shixiang Tang, Lei Bai, Wanli Ouyang, Mingyu Ding, Huaxiu Yao, Aoran Wang
2025-09-01
Summary
This paper introduces VIPER-R1, a new AI model designed to automatically figure out the underlying physics equations that govern how things move, just like a scientist would. It's a big step towards AI that can truly understand the physical world around us.
What's the problem?
Current AI methods for discovering physical laws are limited because they usually only look at numbers or simple data. They miss out on the important visual information that humans – and physicists – use to understand motion. Imagine trying to understand a bouncing ball without actually *seeing* it bounce; it's much harder! This lack of visual understanding makes it difficult for AI to accurately identify patterns and create correct equations.
What's the solution?
The researchers created VIPER-R1, which combines vision, trajectory data (records of how objects move over time), and the ability to reason with symbols like mathematical equations. It learns in stages: first it gets a feel for how things move visually, then it forms initial guesses for the equations, and finally it refines the structure of those guesses using a reward-guided process similar to how scientists check their work. A key step happens at inference time: the model calls an external symbolic-regression tool to fine-tune its proposed equations against real-world observations, making sure they accurately describe what's happening. The researchers also built a new dataset, PhysSymbol, with 5,000 multimodal examples to train and test the model.
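To make the final refinement step concrete, here is a minimal sketch of the idea behind residual realignment: the model proposes a symbolic ansatz, and a regression step then fits the leftover error between that ansatz and the observed data. Everything here is illustrative; the `ansatz` law, the basis terms, and the greedy least-squares fit are stand-ins for the paper's actual model and external symbolic-regression tool.

```python
def ansatz(x):
    # Hypothetical VLM-proposed force law: F = -2.0 * x (a spring-like term).
    return -2.0 * x

def fit_residual(xs, ys, basis):
    """Greedy least-squares fit of the residual onto a small symbolic basis.

    Stands in for a call to a full symbolic-regression tool: each basis
    function gets the coefficient that best explains the remaining error.
    """
    residuals = [y - ansatz(x) for x, y in zip(xs, ys)]
    coeffs = []
    for b in basis:
        num = sum(b(x) * r for x, r in zip(xs, residuals))
        den = sum(b(x) ** 2 for x in xs)
        c = num / den if den else 0.0
        coeffs.append(c)
        # Subtract the fitted term before fitting the next basis function.
        residuals = [r - c * b(x) for x, r in zip(xs, residuals)]
    return coeffs

# Synthetic observations from a "true" law F = -2*x - 0.5*x**3:
# the ansatz captures the linear part, the cubic term is missing.
xs = [0.1 * i for i in range(1, 21)]
ys = [-2.0 * x - 0.5 * x ** 3 for x in xs]

# Fitting the residual onto a cubic basis recovers the missing coefficient.
coeffs = fit_residual(xs, ys, [lambda x: x ** 3])
print(round(coeffs[0], 3))  # ≈ -0.5
```

The point of the sketch is the division of labor: the learned model supplies the high-confidence structure, while cheap numerical fitting closes the gap with the data.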
Why it matters?
This work is important because it allows AI to understand the physical world in a more complete way, moving beyond just analyzing numbers. This could lead to breakthroughs in areas like robotics, where robots need to understand how things move to interact with their environment, and in scientific discovery, where AI could help us uncover new physical laws.
Abstract
Automated discovery of physical laws from observational data in the real world is a grand challenge in AI. Current methods, relying on symbolic regression or LLMs, are limited to uni-modal data and overlook the rich, visual phenomenological representations of motion that are indispensable to physicists. This "sensory deprivation" severely weakens their ability to interpret the inherent spatio-temporal patterns within dynamic phenomena. To address this gap, we propose VIPER-R1, a multimodal model that performs Visual Induction for Physics-based Equation Reasoning to discover fundamental symbolic formulas. It integrates visual perception, trajectory data, and symbolic reasoning to emulate the scientific discovery process. The model is trained via a curriculum of Motion Structure Induction (MSI), using supervised fine-tuning to interpret kinematic phase portraits and to construct hypotheses guided by a Causal Chain of Thought (C-CoT), followed by Reward-Guided Symbolic Calibration (RGSC) to refine the formula structure with reinforcement learning. During inference, the trained VIPER-R1 acts as an agent: it first posits a high-confidence symbolic ansatz, then proactively invokes an external symbolic regression tool to perform Symbolic Residual Realignment (SR^2). This final step, analogous to a physicist's perturbation analysis, reconciles the theoretical model with empirical data. To support this research, we introduce PhysSymbol, a new 5,000-instance multimodal corpus. Experiments show that VIPER-R1 consistently outperforms state-of-the-art VLM baselines in accuracy and interpretability, enabling more precise discovery of physical laws. Project page: https://jiaaqiliu.github.io/VIPER-R1/
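The abstract's Reward-Guided Symbolic Calibration step rewards the model for getting the *structure* of a formula right before worrying about exact coefficients. One hypothetical way to score structure, assuming formulas are reduced to sets of their symbolic terms (the paper's actual reward design is not specified here), is a simple set-overlap measure:

```python
def structure_reward(candidate: set, reference: set) -> float:
    """Jaccard similarity between the term sets of two formula skeletons.

    1.0 means the candidate proposes exactly the right terms; coefficients
    are ignored, so this rewards structural correctness only. Illustrative,
    not the paper's actual reward function.
    """
    if not candidate and not reference:
        return 1.0
    return len(candidate & reference) / len(candidate | reference)

# Candidate ansatz F = -k*x + c*v vs. reference law F = -k*x - b*v**2:
# the spring term matches, but linear damping was proposed where the true
# law has quadratic drag.
cand = {"x", "v"}
ref = {"x", "v**2"}
print(round(structure_reward(cand, ref), 3))  # 1/3 ≈ 0.333
```

A structural reward like this lets reinforcement learning shape the skeleton of the hypothesis, leaving precise coefficients to the later calibration against data.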