LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference

Jianhao Yuan, Fabio Pizzati, Francesco Pinto, Lars Kunze, Ivan Laptev, Paul Newman, Philip Torr, Daniele De Martini

2025-10-14

Summary

This paper focuses on how well AI models that create videos understand basic physics, like how objects should move and interact. It introduces a new way to test these models without needing to train anything specifically for the test itself.

What's the problem?

It's really hard to tell if a video generated by AI looks realistic because it understands physics, or just because it *looks* good. Existing methods struggle to separate genuine physics understanding from simply creating visually appealing content. We need a way to accurately measure if an AI truly 'gets' how the physical world works when making videos.

What's the solution?

The researchers created a training-free method called LikePhys that checks whether a video generation model prefers physically possible videos over impossible ones. It works by comparing the model's denoising (clean-up) error on matched pairs of valid and impossible videos – the denoising error acts as a stand-in for how likely the model finds each video, so if the model treats an impossible video as more likely, that suggests it doesn't understand physics well. They tested this on a benchmark of twelve scenarios spanning four physics domains and summarized performance with a score called Plausibility Preference Error (PPE), finding that it aligns well with human judgments of realism.
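The pairwise comparison can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the function names, the uniform noise-level sampling, and the exact PPE formula (fraction of pairs where the impossible video gets the lower denoising loss) are all assumptions made for the sketch.

```python
import numpy as np

def elbo_surrogate_loss(model_eps, video, rng, n_samples=8):
    """Approximate a (negative-ELBO-style) loss for `video` by averaging
    the model's noise-prediction error over random noise levels.
    Lower loss ~ the model finds the video more likely.
    Sketch only: real diffusion models use a discrete timestep schedule."""
    losses = []
    for _ in range(n_samples):
        t = rng.uniform(0.0, 1.0)                  # noise level in [0, 1]
        noise = rng.normal(size=video.shape)
        noisy = np.sqrt(1.0 - t) * video + np.sqrt(t) * noise
        pred = model_eps(noisy, t)                 # model's noise estimate
        losses.append(np.mean((pred - noise) ** 2))
    return float(np.mean(losses))

def plausibility_preference_error(model_eps, pairs, rng):
    """PPE (assumed form): fraction of (valid, invalid) video pairs where
    the model gives the physically impossible video the LOWER loss,
    i.e. prefers the implausible clip. 0.0 = always prefers valid physics."""
    wrong = sum(
        elbo_surrogate_loss(model_eps, invalid, rng)
        < elbo_surrogate_loss(model_eps, valid, rng)
        for valid, invalid in pairs
    )
    return wrong / len(pairs)
```

In practice `model_eps` would be a pretrained video diffusion model's noise predictor; no training or fine-tuning is needed, which is what makes the evaluation training-free.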

Why it matters?

This work is important because building AI that can simulate the real world accurately requires it to understand physics. This new evaluation method allows researchers to better measure and improve the physics understanding of these AI models, paving the way for more realistic and useful simulations and potentially more intelligent AI systems.

Abstract

Intuitive physics understanding in video diffusion models plays an essential role in building general-purpose physically plausible world simulators, yet accurately evaluating such capacity remains a challenging task due to the difficulty in disentangling physics correctness from visual appearance in generation. To this end, we introduce LikePhys, a training-free method that evaluates intuitive physics in video diffusion models by distinguishing physically valid and impossible videos using the denoising objective as an ELBO-based likelihood surrogate on a curated dataset of valid-invalid pairs. By testing on our constructed benchmark of twelve scenarios spanning four physics domains, we show that our evaluation metric, Plausibility Preference Error (PPE), demonstrates strong alignment with human preference, outperforming state-of-the-art evaluator baselines. We then systematically benchmark intuitive physics understanding in current video diffusion models. Our study further analyses how model design and inference settings affect intuitive physics understanding and highlights domain-specific capacity variations across physical laws. Empirical results show that, despite current models struggling with complex and chaotic dynamics, there is a clear trend of improvement in physics understanding as model capacity and inference settings scale.