Do generative video models learn physical principles from watching videos?
Saman Motamed, Laura Culp, Kevin Swersky, Priyank Jaini, Robert Geirhos
2025-01-17

Summary
This paper introduces a new benchmark called Physics-IQ that checks whether AI video generators actually understand the physics behind what they're creating, or whether they're just really good at making videos look realistic without truly grasping how things work in the real world.
What's the problem?
As AI gets better at making videos that look real, there's a big debate about whether these models are actually learning how the world works or just predicting which pixels should come next to look realistic. It's hard to tell whether the AI understands things like gravity or how fluids move, or whether it's simply copying what it has seen before.
What's the solution?
The researchers created Physics-IQ, which is like a super-tough physics test for AI. The test includes videos covering a wide range of physical phenomena: fluid dynamics (how liquids flow), optics (how light reflects off surfaces), solid mechanics, magnetism, and thermodynamics. The researchers showed each model the beginning of a real video and asked it to generate what happens next, then compared the generated continuation against the real footage. The test is designed so that the AI can't just fake it: to score well, the model has to continue the scene the way physics actually plays out.
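To give a rough idea of how such a comparison can work, here is a minimal illustrative sketch, not the paper's actual scoring code: it compares a generated continuation against real ground-truth footage using frame-wise pixel error and an intersection-over-union of where motion occurs. The specific thresholds and function names below are assumptions for illustration only.

```python
import numpy as np

def mse(generated: np.ndarray, ground_truth: np.ndarray) -> float:
    """Mean squared pixel error between two videos of shape (T, H, W, C)."""
    return float(np.mean((generated.astype(np.float64) -
                          ground_truth.astype(np.float64)) ** 2))

def motion_mask(video: np.ndarray, threshold: float = 10.0) -> np.ndarray:
    """Binary mask of pixels that change between consecutive frames:
    a crude proxy for *where* motion happens in the scene.
    The threshold value here is an arbitrary illustrative choice."""
    diff = np.abs(np.diff(video.astype(np.float64), axis=0)).max(axis=-1)
    return diff > threshold  # shape (T-1, H, W)

def spatial_iou(generated: np.ndarray, ground_truth: np.ndarray) -> float:
    """Intersection-over-union of the regions where motion occurred,
    ignoring timing: rewards models that move the right things in the
    right places, even if pixel values differ."""
    gen_m = motion_mask(generated).any(axis=0)   # collapse time -> (H, W)
    gt_m = motion_mask(ground_truth).any(axis=0)
    inter = np.logical_and(gen_m, gt_m).sum()
    union = np.logical_or(gen_m, gt_m).sum()
    return float(inter / union) if union > 0 else 1.0

# Hypothetical usage: videos as uint8 arrays of shape (frames, H, W, 3).
rng = np.random.default_rng(0)
real = rng.integers(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)
fake = rng.integers(0, 256, size=(16, 64, 64, 3), dtype=np.uint8)
print(f"MSE: {mse(fake, real):.1f}, spatial IoU: {spatial_iou(fake, real):.3f}")
```

The actual benchmark reportedly combines several such metrics and calibrates scores against how much two real recordings of the same event naturally differ; the sketch above only conveys the general idea of scoring physical plausibility rather than pixel-perfect realism.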
Why does it matter?
This matters because as AI gets more advanced, we need to know whether it actually understands the world or is just mimicking it. If AI can truly learn physics from watching videos, that could lead to huge breakthroughs in science and technology. But the study shows we're not there yet: even though AI can make videos that look real, it doesn't really understand the physics behind what it's showing. Knowing where AI is strong and where it falls short is crucial for developing systems that can truly understand and interact with the physical world.
Abstract
AI video generation is undergoing a revolution, with quality and realism advancing rapidly. These advances have led to a passionate scientific debate: Do video models learn "world models" that discover laws of physics, or are they merely sophisticated pixel predictors that achieve visual realism without understanding the physical principles of reality? We address this question by developing Physics-IQ, a comprehensive benchmark dataset that can only be solved by acquiring a deep understanding of various physical principles, like fluid dynamics, optics, solid mechanics, magnetism and thermodynamics. We find that across a range of current models (Sora, Runway, Pika, Lumiere, Stable Video Diffusion, and VideoPoet), physical understanding is severely limited, and unrelated to visual realism. At the same time, some test cases can already be successfully solved. This indicates that acquiring certain physical principles from observation alone may be possible, but significant challenges remain. While we expect rapid advances ahead, our work demonstrates that visual realism does not imply physical understanding. Our project page is at https://physics-iq.github.io; code at https://github.com/google-deepmind/physics-IQ-benchmark.