Generalizable Implicit Motion Modeling for Video Frame Interpolation

Zujin Guo, Wei Li, Chen Change Loy

2024-07-13

Summary

This paper introduces Generalizable Implicit Motion Modeling (GIMM), a new method for improving video frame interpolation (VFI). VFI is the process of synthesizing intermediate frames between two existing frames so that the video's motion looks smoother and more fluid.

What's the problem?

Current VFI methods often struggle because they either rely on simple linear combinations of bidirectional flows or directly predict flows for a given timestep without exploiting useful motion priors, so they fail to capture the complex motion found in real-world videos. This leads to lower-quality interpolated frames and can make videos look choppy or unnatural.

What's the solution?

GIMM addresses these issues with a more principled approach to motion modeling. It first runs a motion encoding pipeline that distills the bidirectional optical flows produced by a pre-trained flow estimator into a motion latent capturing the input-specific spatiotemporal dynamics of the video. It then uses an adaptive coordinate-based neural network that takes spatiotemporal coordinates and this motion latent as input and predicts the optical flow at any timestep between the two input frames, so it is not tied to fixed time intervals. Best of all, GIMM can be plugged into existing flow-based VFI methods without major changes.
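
To make the core idea concrete, here is a minimal, hypothetical PyTorch sketch of a coordinate-based flow predictor in the spirit of GIMM: a small MLP that maps a normalized spatiotemporal coordinate (x, y, t) plus a per-pixel motion latent to a 2-D flow vector. The class name, layer sizes, and latent dimension are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class CoordFlowNet(nn.Module):
    """Toy coordinate-based flow predictor (illustrative, not the paper's code):
    maps a spatiotemporal coordinate (x, y, t) and a motion latent to a flow vector."""

    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # (u, v) flow components at this coordinate
        )

    def forward(self, coords, latent):
        # coords: (N, 3) normalized (x, y, t); latent: (N, latent_dim)
        return self.mlp(torch.cat([coords, latent], dim=-1))

# Query flows at an arbitrary timestep t in (0, 1) for a batch of pixel locations.
net = CoordFlowNet()
coords = torch.rand(1024, 3)    # x, y in [0, 1]; t is the interpolation time
latent = torch.randn(1024, 64)  # motion latent sampled at those pixels
flow = net(coords, latent)      # (1024, 2) predicted flow vectors
```

Because the timestep t is just another input coordinate, the same network can be queried at any t between 0 and 1, which is what enables arbitrary-timestep interpolation.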

Why it matters?

This research is important because it significantly improves the quality of video frame interpolation, making videos smoother and more visually appealing. By enhancing how we model motion in videos, GIMM can be applied in various fields like film production, gaming, and any area where high-quality video playback is essential.

Abstract

Motion modeling is critical in flow-based Video Frame Interpolation (VFI). Existing paradigms either consider linear combinations of bidirectional flows or directly predict bilateral flows for given timestamps without exploring favorable motion priors, thus lacking the capability of effectively modeling spatiotemporal dynamics in real-world videos. To address this limitation, in this study, we introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI. Specifically, to enable GIMM as an effective motion modeling paradigm, we design a motion encoding pipeline to model spatiotemporal motion latent from bidirectional flows extracted from pre-trained flow estimators, effectively representing input-specific motion priors. Then, we implicitly predict arbitrary-timestep optical flows within two adjacent input frames via an adaptive coordinate-based neural network, with spatiotemporal coordinates and motion latent as inputs. Our GIMM can be smoothly integrated with existing flow-based VFI works without further modifications. We show that GIMM performs better than the current state of the art on the VFI benchmarks.
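
The motion encoding pipeline mentioned in the abstract can likewise be pictured with a small, hypothetical sketch: a convolutional encoder that compresses the forward and backward flows (obtained from a pre-trained optical flow estimator) into a spatial motion-latent map, which a coordinate-based network like the one above would then sample per pixel. The class name and layer sizes are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

class MotionEncoder(nn.Module):
    """Illustrative motion-encoding step (not the paper's implementation):
    compress bidirectional optical flows into a spatial motion-latent map."""

    def __init__(self, latent_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=3, padding=1),  # forward + backward flow (2 + 2 channels)
            nn.ReLU(),
            nn.Conv2d(32, latent_dim, kernel_size=3, padding=1),
        )

    def forward(self, flow_fwd, flow_bwd):
        # flow_fwd, flow_bwd: (B, 2, H, W) flows between the two input frames
        return self.encoder(torch.cat([flow_fwd, flow_bwd], dim=1))  # (B, latent_dim, H, W)

# Example: encode the flows of one 256x256 frame pair.
enc = MotionEncoder()
f01 = torch.randn(1, 2, 256, 256)  # forward flow, frame 0 -> 1
f10 = torch.randn(1, 2, 256, 256)  # backward flow, frame 1 -> 0
motion_latent = enc(f01, f10)      # (1, 64, 256, 256) motion-latent map
```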