Beyond Transcription: Mechanistic Interpretability in ASR
Neta Glazer, Yael Segal-Feldman, Hilit Segev, Aviv Shamsian, Asaf Buchnick, Gill Hetz, Ethan Fetaya, Joseph Keshet, Aviv Navon
2025-08-28
Summary
This paper explores what happens *inside* automatic speech recognition (ASR) systems – the programs that convert speech to text – by applying tools designed to make 'black box' AI models more understandable.
What's the problem?
While there's a lot of work being done to understand how large language models (like those powering chatbots) make decisions, these same techniques haven't been widely used in speech recognition. This is a problem because it makes it hard to figure out *why* ASR systems make mistakes, like repeating themselves or misinterpreting what's said, and how to fix them.
What's the solution?
The researchers took existing methods for understanding AI models – 'logit lens', which reads out the model's token predictions at intermediate layers; 'linear probing', which tests what information each part of the model holds; and 'activation patching', which swaps internal activations to see how the output changes – and applied them to ASR systems. They then analyzed how information about sound and meaning evolves as it moves through the layers of the ASR system.
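To make the first of these concrete, here is a minimal sketch of the logit-lens idea: project an intermediate hidden state through the model's final output (unembedding) matrix to see what token the model "would predict" at that layer. The shapes, random states, and the `logit_lens` helper are all illustrative assumptions, not the paper's actual code; in a real ASR model the hidden states would come from the decoder's residual stream.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, n_layers = 50, 16, 4

# Hypothetical per-layer decoder hidden states for one token position.
hidden_states = [rng.normal(size=d_model) for _ in range(n_layers)]

# The model's output (unembedding) projection, reused at every layer.
W_unembed = rng.normal(size=(d_model, vocab_size))

def logit_lens(h, W):
    """Project an intermediate hidden state through the final
    unembedding and softmax to get a per-layer token distribution."""
    logits = h @ W
    probs = np.exp(logits - logits.max())  # subtract max for stability
    return probs / probs.sum()

# Inspect how the layer-wise "prediction" evolves toward the final output.
for layer, h in enumerate(hidden_states):
    p = logit_lens(h, W_unembed)
    print(f"layer {layer}: top token id = {p.argmax()}, prob = {p.max():.3f}")
```

Tracking when the top token stabilizes across layers is what lets this kind of analysis localize where in the network a transcription decision is actually made.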
Why does it matter?
By using these techniques, the researchers uncovered new details about how ASR systems work internally: they identified specific encoder-decoder interactions that cause repetition errors, and showed how semantic biases get built into the way the system represents sounds. This demonstrates that 'interpretability' tools can help us build better, more reliable, and more transparent speech recognition systems.
Abstract
Interpretability methods have recently gained significant attention, particularly in the context of large language models, enabling insights into linguistic representations, error detection, and model behaviors such as hallucinations and repetitions. However, these techniques remain underexplored in automatic speech recognition (ASR), despite their potential to advance both the performance and interpretability of ASR systems. In this work, we adapt and systematically apply established interpretability methods such as logit lens, linear probing, and activation patching, to examine how acoustic and semantic information evolves across layers in ASR systems. Our experiments reveal previously unknown internal dynamics, including specific encoder-decoder interactions responsible for repetition hallucinations and semantic biases encoded deep within acoustic representations. These insights demonstrate the benefits of extending and applying interpretability techniques to speech recognition, opening promising directions for future research on improving model transparency and robustness.