AAD-LLM: Neural Attention-Driven Auditory Scene Understanding
Xilin Jiang, Sukru Samet Dindar, Vishal Choudhari, Stephan Bickel, Ashesh Mehta, Guy M McKhann, Adeen Flinker, Daniel Friedman, Nima Mesgarani
2025-02-26
Summary
This paper introduces a new AI system called AAD-LLM that can identify which speaker a person is paying attention to in a noisy environment, using brain signals to improve how it processes and responds to audio.
What's the problem?
Current AI models that work with audio treat all sounds equally, but humans naturally focus on specific speakers and ignore others. Because of this mismatch, AI can't accurately understand what a person is really listening to in complex sound environments.
What's the solution?
The researchers created AAD-LLM, which uses recordings of brain activity to figure out which speaker a person is focusing on, then uses this information to guide how it processes the audio and generates responses. They tested AAD-LLM on tasks like describing speakers, transcribing speech, and answering questions about conversations with multiple speakers.
Why it matters?
This matters because it's a step towards AI that can understand and respond to audio more like humans do. It could lead to better hearing aids, smarter virtual assistants, and other technologies that need to know what people are actually paying attention to in noisy environments. This approach could make AI more helpful and natural to interact with in real-world situations where there's lots of background noise or multiple people talking.
Abstract
Auditory foundation models, including auditory large language models (LLMs), process all sound inputs equally, independent of listener perception. However, human auditory perception is inherently selective: listeners focus on specific speakers while ignoring others in complex auditory scenes. Existing models do not incorporate this selectivity, limiting their ability to generate perception-aligned responses. To address this, we introduce Intention-Informed Auditory Scene Understanding (II-ASU) and present Auditory Attention-Driven LLM (AAD-LLM), a prototype system that integrates brain signals to infer listener attention. AAD-LLM extends an auditory LLM by incorporating intracranial electroencephalography (iEEG) recordings to decode which speaker a listener is attending to and refine responses accordingly. The model first predicts the attended speaker from neural activity, then conditions response generation on this inferred attentional state. We evaluate AAD-LLM on speaker description, speech transcription and extraction, and question answering in multitalker scenarios, with both objective and subjective ratings showing improved alignment with listener intention. By taking a first step toward intention-aware auditory AI, this work explores a new paradigm where listener perception informs machine listening, paving the way for future listener-centered auditory systems. Demo and code available: https://aad-llm.github.io.
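The abstract describes a two-stage flow: first decode the attended speaker from neural activity, then condition the model's response on that inferred attentional state. The sketch below illustrates that flow in miniature; all names are hypothetical, and the "decoder" is a toy stand-in (envelope correlation) rather than the paper's actual iEEG-based model.

```python
# Hedged sketch of AAD-LLM's two-stage flow (all names hypothetical):
# 1) decode which speaker the listener attends to from a neural feature,
# 2) condition the auditory LLM's input on that inferred attentional state.

def decode_attended_speaker(neural_feature, speaker_envelopes):
    """Toy attention decoder: pick the speaker whose speech envelope
    correlates best with the neural feature (a stand-in for the paper's
    iEEG-based decoding)."""
    def correlation(a, b):
        n = len(a)
        mean_a, mean_b = sum(a) / n, sum(b) / n
        cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
        std_a = sum((x - mean_a) ** 2 for x in a) ** 0.5
        std_b = sum((y - mean_b) ** 2 for y in b) ** 0.5
        return cov / (std_a * std_b) if std_a and std_b else 0.0

    scores = {spk: correlation(neural_feature, env)
              for spk, env in speaker_envelopes.items()}
    return max(scores, key=scores.get)

def build_conditioned_prompt(question, transcripts, attended):
    """Condition response generation on the decoded attentional state by
    foregrounding the attended speaker's speech in the prompt."""
    return (f"Attended speaker: {attended}\n"
            f"Attended speech: {transcripts[attended]}\n"
            f"Question: {question}")

# Toy multitalker scene: speaker A's envelope tracks the neural feature.
neural = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
envelopes = {"A": [0.2, 1.0, 0.1, 0.9, 0.2, 0.8],
             "B": [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]}
attended = decode_attended_speaker(neural, envelopes)
prompt = build_conditioned_prompt(
    "What did the speaker say?",
    {"A": "the meeting is at noon", "B": "buy two tickets"},
    attended)
print(attended)  # A
```

In the real system, both stages are learned from iEEG recordings and the conditioning happens inside the auditory LLM rather than via a text prompt, but the decode-then-condition structure is the same.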