Video Analysis and Generation via a Semantic Progress Function
Gal Metzer, Sagi Polaczek, Ali Mahdavi-Amiri, Raja Giryes, Daniel Cohen-Or
2026-04-27
Summary
This paper addresses the problem that AI-generated images and videos often change in a jerky, unpredictable way rather than evolving smoothly. The authors introduce a way to measure and correct this behavior, making AI-generated content more natural and consistent.
What's the problem?
When AI creates videos or a series of images, the content doesn't always change gradually. Sometimes it stays the same for a while, then suddenly jumps to something completely different. This creates a choppy, unrealistic effect. The core issue is that the *meaning* of the content isn't progressing at a steady pace, leading to these abrupt shifts.
What's the solution?
The researchers developed something called a 'Semantic Progress Function'. Think of it as a tool that tracks how the meaning of each frame in a video changes compared to the frames around it. From these per-frame changes they build a curve showing the cumulative semantic shift across the sequence. If the curve is a straight line, the meaning is changing at a steady rate; if it has flat stretches and sudden rises, the pacing is uneven. Building on this, they propose a way to 'linearize' the video, essentially re-timing it so the meaning changes at a constant rate, which yields smoother transitions (see the sketch below).
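To make the idea concrete, here is a minimal sketch of how such a progress curve could be computed. It assumes per-frame embeddings from some semantic image encoder such as CLIP and uses cosine distance between consecutive frames; the specific encoder, the distance metric, and the function name are illustrative assumptions, not details confirmed by the paper.

```python
import numpy as np

def semantic_progress(frame_embeddings):
    """Build a cumulative semantic-progress curve from per-frame embeddings.

    frame_embeddings: array of shape (num_frames, dim), e.g. CLIP image
    embeddings of each frame (the choice of encoder is an assumption).
    Returns a curve in [0, 1] whose value at frame t is the fraction of the
    total semantic change accumulated up to t.
    """
    emb = np.asarray(frame_embeddings, dtype=np.float64)
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize

    # Cosine distance between consecutive frames measures local semantic change.
    step = 1.0 - np.sum(emb[1:] * emb[:-1], axis=1)

    # Cumulative sum -> monotone progress curve; prepend 0 for the first frame.
    # (The paper additionally fits a smooth curve to this signal; omitted here.)
    progress = np.concatenate([[0.0], np.cumsum(step)])
    return progress / max(progress[-1], 1e-12)  # normalize to [0, 1]
```

A perfectly even video would trace the diagonal from 0 to 1; flat segments in the returned curve correspond to stretches where nothing changes semantically, and steep segments to abrupt jumps.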
Why does it matter?
This work is important because it provides a way to control and improve the quality of AI-generated content. It's not limited to one specific AI model, meaning it can be applied broadly. It also opens up possibilities for comparing different AI generators and even manipulating real-world videos to change their pacing, potentially for creative or analytical purposes.
Abstract
Transformations produced by image and video generation models often evolve in a highly non-linear manner: long stretches where the content barely changes are followed by sudden, abrupt semantic jumps. To analyze and correct this behavior, we introduce a Semantic Progress Function, a one-dimensional representation that captures how the meaning of a given sequence evolves over time. For each frame, we compute distances between semantic embeddings and fit a smooth curve that reflects the cumulative semantic shift across the sequence. Departures of this curve from a straight line reveal uneven semantic pacing. Building on this insight, we propose a semantic linearization procedure that reparameterizes (or retimes) the sequence so that semantic change unfolds at a constant rate, yielding smoother and more coherent transitions. Beyond linearization, our framework provides a model-agnostic foundation for identifying temporal irregularities, comparing semantic pacing across different generators, and steering both generated and real-world video sequences toward arbitrary target pacing.
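As a companion to the sketch above, the following shows one way the retiming (linearization) step could look: invert the monotone progress curve so that output frames are sampled at uniform semantic progress. This inverse-interpolation approach and the helper name are illustrative assumptions; the paper's actual reparameterization may differ (for instance, it could act in the generator's latent or timestep space rather than on rendered frames).

```python
import numpy as np

def linearize_timing(progress, num_output_frames=None):
    """Map uniform semantic progress back to (fractional) source-frame indices.

    progress: monotone curve in [0, 1], e.g. from semantic_progress() above.
    Returns indices at which to resample the sequence so that semantic change
    unfolds at an approximately constant rate.
    """
    prog = np.asarray(progress, dtype=np.float64)
    n = len(prog)
    # Break ties from flat (no-change) stretches so the curve is strictly
    # increasing and can be inverted by interpolation.
    prog = prog + np.arange(n) * 1e-9
    if num_output_frames is None:
        num_output_frames = n
    targets = np.linspace(prog[0], prog[-1], num_output_frames)  # uniform pacing
    return np.interp(targets, prog, np.arange(n, dtype=np.float64))
```

Sampling (or frame-interpolating) the original sequence at the returned fractional indices would, under these assumptions, yield a version of the video whose semantic content advances at a roughly constant rate; substituting a non-uniform target curve would steer the sequence toward an arbitrary target pacing instead.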