MuCodec: Ultra Low-Bitrate Music Codec

Yaoxun Xu, Hangting Chen, Jianwei Yu, Wei Tan, Rongzhi Gu, Shun Lei, Zhiwei Lin, Zhiyong Wu

2024-09-23

Summary

This paper introduces MuCodec, a music codec that compresses and reconstructs music at extremely low bitrates while preserving high sound quality. The goal is to shrink the encoded representation of a song without losing the richness of the audio.

What's the problem?

Compressing music is important for efficient transmission and storage, but existing codecs struggle to preserve sound quality at very low bitrates, especially for complex music that combines vocals with background accompaniment. They either lose too much detail or need higher bitrates, which is a poor fit when bandwidth is limited.

What's the solution?

To solve this problem, the researchers developed MuCodec. Its encoder, MuEncoder, extracts both acoustic (sound-related) and semantic (meaning-related) features from music, and these features are discretized with residual vector quantization (RVQ). A flow-matching model then recovers Mel-VAE features from the discrete tokens, and a pre-trained Mel-VAE decoder followed by HiFi-GAN reconstructs the audio. MuCodec reconstructs high-fidelity music at both an ultra-low bitrate (0.35 kbps) and a higher one (1.35 kbps), and the authors report better subjective and objective results than previous codecs.
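To give a feel for what the bitrates quoted above mean, here is a minimal sketch of the standard bitrate arithmetic for a token-based codec. The frame rate, number of codebooks, and codebook size below are hypothetical values chosen only because they land on 0.35 kbps; they are not taken from the paper, and the function names are illustrative.

```python
import math

def rvq_bitrate_kbps(frame_rate_hz, num_codebooks, codebook_size):
    """Bitrate of a quantized token stream in kbps (1 kbps = 1000 bits/s).

    Each frame emits one index per codebook, and each index costs
    log2(codebook_size) bits.
    """
    bits_per_frame = num_codebooks * math.log2(codebook_size)
    return frame_rate_hz * bits_per_frame / 1000.0

def stream_size_kilobytes(bitrate_kbps, seconds):
    """Size of the encoded stream in kilobytes (1 kB = 1000 bytes)."""
    return bitrate_kbps * 1000.0 * seconds / 8.0 / 1000.0

# Hypothetical configuration (an assumption, not the paper's settings):
# 25 frames per second, 1 codebook, 2**14 entries -> 14 bits per frame.
low = rvq_bitrate_kbps(frame_rate_hz=25, num_codebooks=1, codebook_size=2**14)
print(f"{low:.2f} kbps")

# Encoded size of a 3-minute track at the two bitrates the summary quotes.
for kbps in (0.35, 1.35):
    print(f"{kbps} kbps for 180 s -> {stream_size_kilobytes(kbps, 180):.3f} kB")
```

At 0.35 kbps a three-minute track occupies under 8 kB of tokens, which is why reconstruction has to lean on a strong generative decoder rather than on the bitstream alone.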

Why does it matter?

This research is significant because it enables better music compression for streaming services and applications where data usage is a concern. By allowing high-quality music to be transmitted at lower bitrates, MuCodec can improve user experiences in areas like online music streaming, mobile applications, and any situation where bandwidth is limited.

Abstract

Music codecs are a vital aspect of audio codec research, and ultra low-bitrate compression holds significant importance for music transmission and generation. Due to the complexity of music backgrounds and the richness of vocals, solely relying on modeling semantic or acoustic information cannot effectively reconstruct music with both vocals and backgrounds. To address this issue, we propose MuCodec, specifically targeting music compression and reconstruction tasks at ultra low bitrates. MuCodec employs MuEncoder to extract both acoustic and semantic features, discretizes them with RVQ, and obtains Mel-VAE features via flow-matching. The music is then reconstructed using a pre-trained Mel-VAE decoder and HiFi-GAN. MuCodec can reconstruct high-fidelity music at ultra low (0.35 kbps) or high bitrates (1.35 kbps), achieving the best results to date in both subjective and objective metrics. Code and Demo: https://xuyaoxun.github.io/MuCodec_demo/.
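The RVQ step mentioned in the abstract can be illustrated with a small, generic sketch of residual vector quantization. This is not the MuCodec implementation: the codebooks here are random toy values rather than learned ones, the dimensions are arbitrary, and all function names are hypothetical. A zero vector is added to each codebook purely so that, in this toy setup, later stages can never increase the error.

```python
import random

def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def rvq_encode(vec, codebooks):
    """Quantize stage by stage: each stage codes the residual left by the previous one."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected entry from each stage that has an index."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def sq_error(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Toy setup (all values are assumptions for illustration only).
rng = random.Random(0)
dim, size, stages = 4, 16, 3
codebooks = []
for s in range(stages):
    scale = 1.0 / (2 ** s)  # later stages model finer residuals
    entries = [[0.0] * dim]  # zero entry: a stage may leave the residual unchanged
    entries += [[rng.gauss(0, scale) for _ in range(dim)] for _ in range(size - 1)]
    codebooks.append(entries)

frame = [rng.gauss(0, 1) for _ in range(dim)]  # stands in for one feature frame
codes = rvq_encode(frame, codebooks)
coarse = rvq_decode(codes[:1], codebooks)  # first stage only
full = rvq_decode(codes, codebooks)        # all stages
print("codes:", codes)
print("error, 1 stage :", sq_error(frame, coarse))
print("error, 3 stages:", sq_error(frame, full))
```

Only the integer indices in `codes` would need to be transmitted. Using fewer or more stages trades bitrate against reconstruction error, which is the general mechanism that lets an RVQ-based codec operate at more than one bitrate.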
