Audio Match Cutting: Finding and Creating Matching Audio Transitions in Movies and Videos
Dennis Fedorishin, Lie Lu, Srirangaraj Setlur, Venu Govindaraju
2024-08-21

Summary
This paper explores a technique called audio match cutting, which focuses on automatically finding and creating smooth audio transitions between different shots in videos and movies.
What's the problem?
In video editing, seamless transitions matter not just visually but also in the audio. Creating a matching audio transition is challenging because sounds from different shots must be carefully blended so that they flow together naturally, and in practice this is largely a manual, time-consuming process.
What's the solution?
The authors developed a system that automatically finds and creates audio match cuts. They trained a self-supervised model to produce audio representations suited to finding matching segments, and built a coarse-to-fine pipeline that recommends matching shots and blends the audio at the cut point. They also annotated a dataset specifically for this task and used it to compare how well multiple audio representations surface good match-cut candidates.
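The coarse-to-fine idea can be sketched as a two-stage search: a fast embedding-similarity pass narrows the pool of candidate segments, then a finer waveform-level score picks the best match. Everything below is an illustrative placeholder, not the paper's actual model or scoring functions; in particular, the embeddings would come from the authors' self-supervised model, not from normalized vectors supplied by hand.

```python
import numpy as np

def coarse_candidates(query_emb, cand_embs, k=3):
    """Coarse stage: rank candidate segments by embedding cosine similarity.
    Embeddings are assumed to be L2-normalized, so the dot product is the
    cosine similarity."""
    sims = cand_embs @ query_emb
    return np.argsort(sims)[::-1][:k]  # indices of the k most similar

def fine_score(query_tail, cand_head):
    """Fine stage: re-score the surviving candidates with a waveform-level
    correlation around the intended cut point (a stand-in for whatever
    finer matching criterion is used)."""
    a = query_tail - query_tail.mean()
    b = cand_head - cand_head.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def best_match(query_emb, query_tail, cand_embs, cand_heads, k=3):
    """Coarse retrieval, then pick the candidate with the best fine score."""
    topk = coarse_candidates(query_emb, cand_embs, k)
    return max(topk, key=lambda i: fine_score(query_tail, cand_heads[i]))
```

The design point is that the cheap embedding comparison scales to every shot in a video, while the more expensive fine score only runs on a handful of shortlisted candidates.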
Why it matters?
This research is significant because it automates a complex part of video editing, making it easier and faster to produce high-quality videos with smooth audio transitions. This can greatly benefit filmmakers and video creators by improving the overall quality of their work without requiring extensive manual effort.
Abstract
A "match cut" is a common video editing technique where a pair of shots that have a similar composition transition fluidly from one to another. Although match cuts are often visual, certain match cuts involve the fluid transition of audio, where sounds from different sources merge into one indistinguishable transition between two shots. In this paper, we explore the ability to automatically find and create "audio match cuts" within videos and movies. We create a self-supervised audio representation for audio match cutting and develop a coarse-to-fine audio match pipeline that recommends matching shots and creates the blended audio. We further annotate a dataset for the proposed audio match cut task and compare the ability of multiple audio representations to find audio match cut candidates. Finally, we evaluate multiple methods to blend two matching audio candidates with the goal of creating a smooth transition. Project page and examples are available at: https://denfed.github.io/audiomatchcut/
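The abstract's final step, blending two matched clips into one smooth transition, can be contrasted with a hard cut. Two standard blending baselines are a linear crossfade and an equal-power crossfade; these are common audio-editing techniques offered here as a sketch, not necessarily the specific blending methods the paper evaluates.

```python
import numpy as np

def hard_cut(a, b):
    """No blending: clip b starts the sample after clip a ends."""
    return np.concatenate([a, b])

def linear_xfade(a, b, n):
    """Linear crossfade: ramp a's gain down and b's gain up over n samples."""
    t = np.linspace(0.0, 1.0, n)
    mid = a[-n:] * (1.0 - t) + b[:n] * t
    return np.concatenate([a[:-n], mid, b[n:]])

def equal_power_xfade(a, b, n):
    """Equal-power crossfade: cos/sin gain curves keep the summed power
    roughly constant through the overlap, avoiding the perceived dip in
    loudness that a linear crossfade can introduce."""
    t = np.linspace(0.0, 1.0, n)
    mid = a[-n:] * np.cos(t * np.pi / 2) + b[:n] * np.sin(t * np.pi / 2)
    return np.concatenate([a[:-n], mid, b[n:]])
```

Note the length bookkeeping: the crossfaded result overlaps the two clips, so it is n samples shorter than the hard cut.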