MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification
Xabier de Zuazo, Ibon Saratxaga, Eva Navas
2025-12-02
Summary
This paper details a system built for a competition on decoding speech-related brain activity recorded with magnetoencephalography (MEG). The researchers aimed to detect when someone is speaking and to identify the specific sounds, or phonemes, being made.
What's the problem?
Analyzing MEG data to understand speech is challenging because the signals are complex and can vary between recordings. The competition tasks – speech detection and phoneme classification – required accurate models that generalize well to unseen data. A key issue was handling differences in data distributions between the training and testing sets, which can degrade a model's performance.
What's the solution?
The researchers used a type of neural network called a Conformer, which is well suited to sequential data like speech. They adapted this network to work directly on the raw 306-channel MEG signals, adding a lightweight convolutional projection layer to keep the model compact. For speech detection, they designed a data augmentation method tailored to MEG signals, artificially increasing the amount of training data. For phoneme classification, they weighted the loss so the model paid more attention to less common phonemes, and they used a dynamic loader that groups averaged examples together. Finally, they found that normalizing the data at the level of each individual recording was crucial for making the model work well on the final test data.
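The instance-level normalization can be sketched as a per-recording z-score. This is a minimal illustration, not the paper's exact implementation: whether statistics are computed per channel or pooled across channels is an assumption here.

```python
import numpy as np

def instance_normalize(meg, eps=1e-8):
    """Z-score one MEG instance independently of all others.

    meg: array of shape (channels, time). Statistics are computed
    per channel over time; this choice is an assumption of this
    sketch, not a detail confirmed by the paper.
    """
    mean = meg.mean(axis=-1, keepdims=True)
    std = meg.std(axis=-1, keepdims=True)
    return (meg - mean) / (std + eps)

# A fake 306-channel segment with an arbitrary offset and scale
x = 5.0 * np.random.randn(306, 1000) + 2.0
x_norm = instance_normalize(x)
```

Because each instance is standardized using only its own statistics, shifts in offset or scale between recordings (and between the training and holdout splits) are removed before the data reaches the model.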
Why it matters?
This work is important because it pushes the boundaries of what's possible in decoding speech directly from brain signals. Achieving high accuracy in speech detection and phoneme classification using MEG has implications for helping people who have lost the ability to speak, or for developing brain-computer interfaces that can translate thoughts into words. The techniques they developed, like the MEG-specific data augmentation and instance-level normalization, could be valuable for other researchers working with MEG data.
Abstract
We present Conformer-based decoders for the LibriBrain 2025 PNPL competition, targeting two foundational MEG tasks: Speech Detection and Phoneme Classification. Our approach adapts a compact Conformer to raw 306-channel MEG signals, with a lightweight convolutional projection layer and task-specific heads. For Speech Detection, a MEG-oriented SpecAugment provided a first exploration of MEG-specific augmentation. For Phoneme Classification, we used inverse-square-root class weighting and a dynamic grouping loader to handle 100-sample averaged examples. In addition, a simple instance-level normalization proved critical to mitigate distribution shifts on the holdout split. Using the official Standard track splits and F1-macro for model selection, our best systems achieved 88.9% (Speech) and 65.8% (Phoneme) on the leaderboard, surpassing the competition baselines and ranking within the top-10 in both tasks. For further implementation details, the technical documentation, source code, and checkpoints are available at https://github.com/neural2speech/libribrain-experiments.
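The inverse-square-root class weighting used for phoneme classification can be sketched as follows. The phoneme counts and the rescaling to mean 1 are illustrative assumptions, not values from the paper.

```python
import numpy as np

def inverse_sqrt_class_weights(class_counts):
    """Per-class loss weights proportional to 1 / sqrt(count).

    Rarer phonemes receive larger weights, but less aggressively
    than plain inverse-frequency (1 / count) weighting. Rescaling
    so the mean weight is 1 is an illustrative choice.
    """
    counts = np.asarray(class_counts, dtype=np.float64)
    weights = 1.0 / np.sqrt(counts)
    return weights / weights.mean()

# Hypothetical phoneme frequencies, from common to rare
w = inverse_sqrt_class_weights([10000, 2500, 400, 100])
```

A class that is 4x rarer gets only 2x the weight, which tempers the risk of over-fitting to very rare phonemes while still counteracting class imbalance; the resulting vector can be passed as per-class weights to a standard cross-entropy loss.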