What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
Minh-Quan Le, Yuanzhi Zhu, Vicky Kalogeiton, Dimitris Samaras
2025-12-02
Summary
This paper introduces a new method, NewtonRewards, to make videos created by AI more realistic by ensuring they follow the basic laws of physics, like how objects move and interact.
What's the problem?
Current AI models can generate videos that *look* good, but often show things that aren't physically possible – objects might float, speeds might change randomly, or collisions might not make sense. In other words, there's a gap between a video looking real and actually behaving the way things do in the real world.
What's the solution?
The researchers developed NewtonRewards, which doesn't rely on people telling the AI what looks good. Instead, it uses existing, frozen AI tools to *measure* how physically realistic a generated video is: optical flow serves as a proxy for how fast objects are moving, and high-level appearance features serve as a proxy for their mass. The video generation AI then receives a 'reward' for following physics rules – specifically for maintaining constant acceleration and conserving mass – encouraging it to produce more believable motion.
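The two rewards described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function names, the use of mean flow magnitude as the velocity proxy, and the squared-difference penalties are all assumptions chosen for clarity.

```python
import numpy as np

def newtonian_kinematic_reward(flow_speed):
    """Hypothetical sketch of the kinematic reward.

    flow_speed: per-frame mean optical-flow magnitude (velocity proxy),
    shape (T,). Under constant acceleration, first differences of the
    velocity should be constant, so second differences should vanish;
    we penalize their squared magnitude.
    """
    accel = np.diff(flow_speed)   # per-frame acceleration proxy
    jerk = np.diff(accel)         # deviation from constant acceleration
    return -float(np.mean(jerk ** 2))

def mass_conservation_reward(features):
    """Hypothetical sketch of the mass-conservation reward.

    features: per-frame appearance features (mass proxy), shape (T, D).
    Penalizing drift of the features over time discourages degenerate
    solutions where the object shrinks or vanishes to satisfy kinematics.
    """
    drift = np.diff(features, axis=0)
    return -float(np.mean(np.sum(drift ** 2, axis=1)))

# Toy usage: a perfect constant-acceleration velocity ramp incurs
# zero kinematic penalty, while an erratic one is penalized.
v_good = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
v_bad = np.array([0.0, 2.0, 1.0, 4.0, 2.0])
assert newtonian_kinematic_reward(v_good) == 0.0
assert newtonian_kinematic_reward(v_bad) < 0.0
```

In a post-training loop, a (hypothetical) combined reward such as `r = kinematic + λ * mass_conservation` would then steer the video model's updates, with the frozen utility models supplying `flow_speed` and `features` from each generated clip.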
Why it matters?
This work is important because it provides a way to automatically improve the physical accuracy of AI-generated videos. By using physics-based rewards, the AI can learn to create videos that aren't just visually appealing, but also behave in a way that makes sense in the real world, opening the door for more realistic and useful AI-created content.
Abstract
Recent video diffusion models can synthesize visually compelling clips, yet often violate basic physical laws: objects float, accelerations drift, and collisions behave inconsistently, revealing a persistent gap between visual realism and physical realism. We propose NewtonRewards, the first physics-grounded post-training framework for video generation based on verifiable rewards. Instead of relying on human or VLM feedback, NewtonRewards extracts measurable proxies from generated videos using frozen utility models: optical flow serves as a proxy for velocity, while high-level appearance features serve as a proxy for mass. These proxies enable explicit enforcement of Newtonian structure through two complementary rewards: a Newtonian kinematic constraint enforcing constant-acceleration dynamics, and a mass conservation reward preventing trivial, degenerate solutions. We evaluate NewtonRewards on five Newtonian Motion Primitives (free fall, horizontal/parabolic throw, and ramp sliding down/up) using our newly constructed large-scale benchmark, NewtonBench-60K. Across all primitives, in both visual and physics metrics, NewtonRewards consistently improves physical plausibility, motion smoothness, and temporal coherence over prior post-training methods. It further maintains strong performance under out-of-distribution shifts in height, speed, and friction. Our results show that physics-grounded verifiable rewards offer a scalable path toward physics-aware video generation.