The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
Xiangbo Gao, Mingyang Wu, Siyuan Yang, Jiongze Yu, Pardis Taghavi, Fangzhou Lin, Zhengzhong Tu
2026-03-26
Summary
This paper focuses on a problem with how AI creates videos: even though the videos *look* realistic, the timing and speed of actions within the video often don't make physical sense.
What's the problem?
Current AI video generators are trained on videos captured at all sorts of real-world speeds and are then forced to output videos at a standard frame rate. Imagine watching a time-lapse and then a regular-speed video: the AI never learns the difference in actual speed, only that both play back at, say, 30 frames per second. This causes 'chronometric hallucination': the AI generates movements with inconsistent and unrealistic speeds, making videos feel unnatural even when they look good visually.
What's the solution?
The researchers developed a system called Visual Chronometer. It analyzes the *motion* within a video to *predict* the true 'Physical Frames Per Second' (PhyFPS) – essentially, how fast things are actually moving in the real world. It learns this through controlled temporal resampling: during training, clips are sped up or slowed down by known factors, so the model learns which motion patterns correspond to which real-world speeds, ignoring any potentially incorrect timing metadata attached to the original video. The researchers also created benchmarks to measure how well different AI models handle time.
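To make the training signal concrete, here is a minimal sketch of how controlled temporal resampling can produce (clip, PhyFPS label) pairs. The function names and shapes are hypothetical illustrations, not the authors' implementation; the key idea is that subsampling a clip by a known stride changes its true physical frame rate by that same factor, independent of any stored metadata.

```python
import numpy as np

def resample_clip(frames, stride):
    # Keep every `stride`-th frame; played back at a fixed display rate,
    # the subsampled clip shows motion `stride` times faster.
    return frames[::stride]

def make_training_pair(frames, base_fps, stride):
    """Build one (clip, PhyFPS label) pair by controlled temporal resampling.

    After subsampling, each surviving frame is `stride / base_fps` seconds
    apart in real time, so the clip's true Physical FPS is
    `base_fps / stride` -- regardless of the file's metadata.
    """
    clip = resample_clip(frames, stride)
    phy_fps = base_fps / stride
    return clip, phy_fps

# Toy example: a 60-frame clip captured at 30 fps, subsampled 3x.
frames = np.zeros((60, 64, 64, 3))  # (T, H, W, C) dummy video tensor
clip, label = make_training_pair(frames, base_fps=30.0, stride=3)
print(clip.shape[0], label)  # 20 frames, PhyFPS label 10.0
```

A predictor trained on many such pairs must infer the label from the motion alone, which is exactly the ability the paper calls recovering PhyFPS from visual dynamics.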
Why does it matter?
This work matters because truly realistic AI-generated videos must simulate physics accurately, and that includes getting the timing right. Fixing these timing issues makes videos more believable and opens the door to using these models as more accurate 'world models' – AI systems that can understand and predict how the physical world works.
Abstract
While recent generative video models have achieved remarkable visual realism and are being explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground these motions in a consistent, real-world time scale. This temporal ambiguity stems from the common practice of indiscriminately training on videos with vastly different real-world speeds, forcing them into standardized frame rates. This leads to what we term chronometric hallucination: generated sequences exhibit ambiguous, unstable, and uncontrollable physical motion speeds. To address this, we propose Visual Chronometer, a predictor that recovers the Physical Frames Per Second (PhyFPS) directly from the visual dynamics of an input video. Trained via controlled temporal resampling, our method estimates the true temporal scale implied by the motion itself, bypassing unreliable metadata. To systematically quantify this issue, we establish two benchmarks, PhyFPS-Bench-Real and PhyFPS-Bench-Gen. Our evaluations reveal a harsh reality: state-of-the-art video generators suffer from severe PhyFPS misalignment and temporal instability. Finally, we demonstrate that applying PhyFPS corrections significantly improves the human-perceived naturalness of AI-generated videos. Our project page is https://xiangbogaobarry.github.io/Visual_Chronometer/.