Proactive Hearing Assistants that Isolate Egocentric Conversations
Guilin Hu, Malek Itani, Tuochao Chen, Shyamnath Gollakota
2025-11-19
Summary
This research introduces a new type of hearing aid that automatically figures out who the wearer is talking to, without needing the wearer to tell it. It's designed to help people focus on the person they're having a conversation with, even in noisy environments with multiple speakers.
What's the problem?
Traditional hearing aids often struggle in situations with multiple people talking at once. They usually require the user to manually select which speaker they want to focus on, which can be disruptive and difficult. The core issue is that current systems aren't good at proactively identifying and isolating the person the wearer is actively engaged with in a conversation.
What's the solution?
The researchers developed a system that uses two models working together. A fast, lightweight model constantly analyzes the sound to quickly identify potential conversation partners based on who is speaking when, using the wearer's own voice as an anchor. A slower, more detailed model then steps in to understand the overall flow of the conversation and confirm who the wearer is actually talking to. This dual approach lets the system react quickly while still capturing the bigger picture of the conversation. They tested the system on real-world two- and three-speaker conversations recorded with binaural egocentric hardware from 11 participants, totaling 6.8 hours of audio.
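The dual-model scheduling can be sketched roughly as follows. This is a simplified illustration, not the paper's implementation: audio frames are reduced to lists of active-speaker labels, the class names (`FastExtractor`, `SlowReasoner`) are hypothetical, and the slow model's turn-taking heuristic (speakers who take the turn right after the wearer speaks) is a stand-in for the paper's learned model of dialogue dynamics.

```python
FRAME_MS = 12.5  # the fast model runs once per 12.5 ms audio frame


class FastExtractor:
    """Lightweight streaming model (placeholder): given one frame and the
    current partner estimate, passes through only the partners' speech."""

    def extract(self, frame, partner_state):
        return [speaker for speaker in frame if speaker in partner_state]


class SlowReasoner:
    """Heavier, less frequent model (placeholder): scans a longer window of
    turn-taking behavior, anchored on the wearer's self-speech, and refreshes
    the estimate of who the wearer's conversation partners are."""

    def infer_partners(self, context, wearer_id):
        partners = set()
        # Crude stand-in heuristic: anyone who speaks immediately after a
        # frame containing the wearer is treated as a conversation partner.
        for prev_frame, cur_frame in zip(context, context[1:]):
            if wearer_id in prev_frame:
                partners.update(s for s in cur_frame if s != wearer_id)
        return partners


def run_stream(frames, wearer_id="wearer", slow_every=80):
    """Run the fast model on every frame; rerun the slow model every
    `slow_every` frames (an assumed interval) to update the partner set."""
    fast, slow = FastExtractor(), SlowReasoner()
    partner_state, context, outputs = set(), [], []
    for i, frame in enumerate(frames):
        context.append(frame)
        if i % slow_every == 0:  # slow path: longer-range dynamics
            partner_state = slow.infer_partners(context, wearer_id)
        outputs.append(fast.extract(frame, partner_state))  # fast path
    return outputs
```

For example, if the wearer alternates turns with "alice" while "carol" and "dan" hold a separate background conversation, the slow model's estimate converges to `{"alice"}`, and the fast model then suppresses the background speakers frame by frame.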
Why it matters?
This work is a significant step towards creating truly intelligent hearing aids that can adapt to the user's environment and social interactions. By automatically identifying and focusing on the correct speaker, these 'proactive' hearing assistants could dramatically improve the listening experience for people with hearing loss, making conversations easier and more natural.
Abstract
We introduce proactive hearing assistants that automatically identify and separate the wearer's conversation partners, without requiring explicit prompts. Our system operates on egocentric binaural audio and uses the wearer's self-speech as an anchor, leveraging turn-taking behavior and dialogue dynamics to infer conversational partners and suppress others. To enable real-time, on-device operation, we propose a dual-model architecture: a lightweight streaming model runs every 12.5 ms for low-latency extraction of the conversation partners, while a slower model runs less frequently to capture longer-range conversational dynamics. Results on real-world 2- and 3-speaker conversation test sets, collected with binaural egocentric hardware from 11 participants totaling 6.8 hours, show generalization in identifying and isolating conversational partners in multi-conversation settings. Our work marks a step toward hearing assistants that adapt proactively to conversational dynamics and engagement. More information can be found on our website: https://proactivehearing.cs.washington.edu/