
OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder

Shikhar Bharadwaj, Samuele Cornell, Kwanghee Choi, Satoru Fukayama, Hye-jin Shim, Soham Deshmukh, Shinji Watanabe

2025-07-21

Summary

This paper introduces OpenBEATs, a fully open-source audio encoder that improves on previous models by training on multiple types of sound, including music, environmental noise, and animal sounds, so that it understands audio more generally.

What's the problem?

Earlier audio encoders such as BEATs were trained only on specific datasets, which limited their effectiveness across different types of audio tasks and made them harder to adopt widely.

What's the solution?

The authors extended BEATs into OpenBEATs, which uses multi-domain audio pre-training so the model learns from a wide variety of sounds. Evaluated across many tasks and datasets, it outperforms much larger models while remaining smaller and more efficient.
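BEATs-style pre-training is self-supervised: patches of the input audio are masked, and the encoder is trained to predict discrete tokens (assigned by a tokenizer) at the masked positions. Below is a toy NumPy sketch of that general masked-prediction objective. Everything here is illustrative (random "patches", a random codebook, a single linear layer as the "encoder") and is not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "audio" represented as a sequence of patch embeddings (T patches, D dims).
T, D, V = 16, 8, 32  # number of patches, embedding dim, codebook size
patches = rng.normal(size=(T, D))

# A tokenizer assigns each patch a discrete target token; here we use the
# nearest entry of a random codebook (in BEATs the tokenizer is learned).
codebook = rng.normal(size=(V, D))
targets = np.argmin(((patches[:, None] - codebook[None]) ** 2).sum(-1), axis=1)

# Mask half of the patches; the model must predict tokens only at masked spots.
mask = np.zeros(T, dtype=bool)
mask[rng.choice(T, size=T // 2, replace=False)] = True
masked_input = np.where(mask[:, None], 0.0, patches)

# Stand-in "encoder": a single linear projection to token logits.
W = rng.normal(size=(D, V)) * 0.1
logits = masked_input @ W

# Cross-entropy over masked positions only -- the pre-training objective.
log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
loss = -log_probs[mask, targets[mask]].mean()
print(f"masked-prediction loss: {loss:.3f}")
```

The key idea OpenBEATs builds on is that this objective needs no labels, so the "patches" can come from any audio domain (music, environmental sound, bioacoustics), which is what makes multi-domain pre-training possible.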

Why it matters?

This matters because OpenBEATs provides a powerful, openly accessible tool for audio understanding across many applications, such as speech recognition, sound classification, and audio captioning, helping advance audio AI research and technology.

Abstract

OpenBEATs, an open-source framework extending BEATs with multi-domain audio pre-training, achieves state-of-the-art performance on diverse audio tasks while using far fewer parameters than larger models.