AudioBERT: Audio Knowledge Augmented Language Model
Hyunjong Ok, Suho Yoo, Jaeho Lee
2024-09-17

Summary
This paper introduces AudioBERT, a method that equips language models with auditory knowledge, helping them reason about how things sound.
What's the problem?
Language models like BERT are great at processing text but often lack basic knowledge about sounds, such as what different animals sound like. This limits their ability to understand and generate content that involves sound.
What's the solution?
To tackle this problem, the researchers first built AuditoryBench, a new dataset with two tasks for testing how well language models understand sounds. They then developed AudioBERT, which augments BERT with auditory knowledge through retrieval: it detects the span of a prompt that calls for sound knowledge, retrieves a matching audio embedding, and injects that knowledge into the model (a minimal sketch of this flow appears below). In their experiments, AudioBERT performed significantly better than previous models on auditory knowledge tasks.
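To make the pipeline concrete, here is a minimal Python sketch of the detect-retrieve-inject flow. The span detector, text encoder, and audio index below are toy stand-ins (keyword matching and random vectors), not the paper's actual components; it only illustrates the shape of the approach.

```python
# Toy sketch of the detect-retrieve-inject flow described above.
# All components here are illustrative stand-ins, not the paper's models.
from typing import Optional

import torch
import torch.nn.functional as F

EMB_DIM = 64

# Toy "audio index": precomputed embeddings for a few sound concepts.
audio_index = {
    "dog barking": torch.randn(EMB_DIM),
    "cat meowing": torch.randn(EMB_DIM),
    "ambulance siren": torch.randn(EMB_DIM),
}

def detect_audio_span(prompt: str) -> Optional[str]:
    """Stand-in detector: find a phrase in the prompt that needs sound knowledge."""
    for concept in audio_index:
        if concept.split()[0] in prompt.lower():
            return concept
    return None

def embed_text(span: str) -> torch.Tensor:
    """Stand-in text encoder; a real system would use a trained audio-text model."""
    return torch.randn(EMB_DIM)

def retrieve_audio_embedding(span: str) -> torch.Tensor:
    """Return the indexed audio embedding most similar to the query span."""
    query = embed_text(span)
    best = max(
        audio_index,
        key=lambda k: F.cosine_similarity(query, audio_index[k], dim=0).item(),
    )
    return audio_index[best]

prompt = "A dog makes a [MASK] sound."
span = detect_audio_span(prompt)
if span is not None:
    audio_emb = retrieve_audio_embedding(span)
    # The retrieved embedding would then be injected into the language model's
    # input (e.g., as an extra token embedding) before the masked prediction.
    print(f"detected span: {span!r}, embedding shape: {tuple(audio_emb.shape)}")
```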
Why it matters?
This research is important because it helps bridge the gap between text understanding and auditory knowledge in AI. By improving how language models process sounds, AudioBERT can enhance applications in fields like education, accessibility, and entertainment, making AI more effective in real-world scenarios where sound plays a crucial role.
Abstract
Recent studies have identified that language models, pretrained on text-only datasets, often lack elementary visual knowledge, e.g., colors of everyday objects. Motivated by this observation, we ask whether a similar shortcoming exists in terms of auditory knowledge. To answer this question, we construct a new dataset called AuditoryBench, which consists of two novel tasks for evaluating auditory knowledge. Based on our analysis using the benchmark, we find that language models also suffer from a severe lack of auditory knowledge. To address this limitation, we propose AudioBERT, a novel method to augment the auditory knowledge of BERT through a retrieval-based approach. First, we detect auditory knowledge spans in prompts to query our retrieval model efficiently. Then, we inject audio knowledge into BERT and switch on low-rank adaptation for effective adaptation when audio knowledge is required. Our experiments demonstrate that AudioBERT is quite effective, achieving superior performance on the AuditoryBench. The dataset and code are available at https://github.com/HJ-Ok/AudioBERT.
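As a rough illustration of the "switch on low-rank adaptation" step mentioned in the abstract, the sketch below wraps a frozen linear layer with a LoRA branch that can be toggled per input. The class name, dimensions, and hyperparameters are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch, assuming a standard LoRA parameterization: a frozen base
# weight plus a toggleable low-rank update W + (alpha / rank) * B @ A.
import torch
import torch.nn as nn

class SwitchableLoRALinear(nn.Module):
    """A frozen linear layer plus a low-rank branch that can be switched on/off."""

    def __init__(self, in_features: int, out_features: int,
                 rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # keep pretrained weights frozen
        self.base.bias.requires_grad_(False)
        # Low-rank factors; B starts at zero so the branch initially adds nothing.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank
        self.lora_enabled = False  # switched on only for audio-knowledge inputs

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.base(x)
        if self.lora_enabled:
            out = out + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling
        return out

layer = SwitchableLoRALinear(768, 768)
x = torch.randn(2, 768)
layer.lora_enabled = False
print(torch.allclose(layer(x), layer.base(x)))  # True: LoRA branch is off
layer.lora_enabled = True
_ = layer(x)  # the low-rank update now contributes to the output
```

Because the base weights stay frozen and the low-rank branch is off by default, the model's text-only behavior is untouched whenever no audio knowledge is needed, which is the point of making the adaptation switchable.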