Improving Inference-Time Optimisation for Vocal Effects Style Transfer with a Gaussian Prior
Chin-Yun Yu, Marco A. Martínez-Ramírez, Junghyun Koo, Wei-Hsiang Liao, Yuki Mitsufuji, György Fazekas
2025-05-19
Summary
This paper proposes a way to improve vocal effects style transfer, in which the audio effects heard on a reference vocal recording are reproduced on another recording.
What's the problem?
When vocal effects are transferred from one recording to another, the estimated effect settings are sometimes inaccurate, so the processed audio does not match the reference style as closely as it should.
What's the solution?
The researchers derive a Gaussian prior from a dataset of vocal presets (saved effect settings) and use it to guide the choice of effect parameters during inference-time optimisation. The prior steers the search toward settings that resemble those in the dataset, which helps the processed audio sound more like the reference style.
Why does it matter?
More accurate effect settings can help musicians, producers, and hobbyists create more realistic, higher-quality vocal mixes, and make it easier to experiment with different sounds and styles in music production.
Abstract
Incorporating Gaussian prior knowledge derived from a vocal preset dataset enhances audio effects transfer by improving parameter accuracy and matching reference styles more effectively than existing methods.
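Example (illustrative sketch)
The general idea of optimising effect parameters under a Gaussian prior can be sketched as a small maximum a posteriori (MAP) problem. The code below is not the authors' implementation. It assumes a toy two-parameter effect, a hand-written preset table, and a stand-in matching loss that only observes the first parameter. The names fit_gaussian_prior, map_estimate, and grad_match are hypothetical, and plain gradient descent stands in for whatever optimiser the paper actually uses.

```python
import numpy as np

def fit_gaussian_prior(presets, jitter=1e-6):
    """Fit a Gaussian to preset parameter vectors (one preset per row).

    Returns the mean and the precision (inverse covariance) matrix.
    A small jitter keeps the covariance invertible.
    """
    mu = presets.mean(axis=0)
    cov = np.cov(presets, rowvar=False) + jitter * np.eye(presets.shape[1])
    return mu, np.linalg.inv(cov)

def map_estimate(grad_match, mu, precision, weight=1.0, lr=0.01, steps=2000):
    """Gradient descent on: match_loss(theta) + weight * Gaussian penalty.

    The penalty is 0.5 * (theta - mu)^T precision (theta - mu), i.e. the
    negative log-density of the prior up to a constant.
    """
    theta = mu.copy()  # start the search at the prior mean
    for _ in range(steps):
        grad = grad_match(theta) + weight * (precision @ (theta - mu))
        theta = theta - lr * grad
    return theta

# Toy preset table (assumed data): two strongly correlated parameters.
presets = np.array([[0.0, 0.2], [1.0, 0.8], [2.0, 2.2], [3.0, 2.8], [4.0, 4.0]])
mu, precision = fit_gaussian_prior(presets)

# Stand-in matching loss 0.5 * (theta[0] - 3.0)**2: the reference only
# constrains the first parameter, so the second is unidentifiable alone.
def grad_match(theta):
    return np.array([theta[0] - 3.0, 0.0])

theta_plain = map_estimate(grad_match, mu, precision, weight=0.0)  # no prior
theta_map = map_estimate(grad_match, mu, precision, weight=1.0)    # with prior

print("without prior:", theta_plain)
print("with prior:   ", theta_map)
```

In this toy setting the run without the prior leaves the second parameter at its starting value, because the matching loss carries no information about it. With the prior, the correlation learned from the presets moves the second parameter along with the first, which is the kind of guidance the summary describes.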