RIR-Mega: a large-scale simulated room impulse response dataset for machine learning and room acoustics modeling

Mandip Goswami

2025-10-23


Summary

This paper introduces RIR-Mega, a large dataset of simulated room impulse responses (RIRs). An RIR describes how sound from a source reaches a listener in a room, including the reflections and reverberation along the way.

What's the problem?

Analyzing and manipulating sound in different environments, like removing echoes or figuring out where a sound came from, requires a lot of data about how sound behaves in those environments. Getting real-world recordings of these 'room impulse responses' is difficult and time-consuming, and existing datasets are often small or hard to use with modern machine learning tools.

What's the solution?

The researchers generated 50,000 simulated room impulse responses and packaged them for easy use. A compact, machine-readable metadata schema describes each response, scripts check the metadata and file checksums, and a simple machine learning baseline gives others a starting point. A smaller subset is hosted on Hugging Face for quick tests, and the full archive is preserved on Zenodo for more in-depth research.

Why does it matter?

This dataset gives researchers a standardized, readily available resource for work such as improving speech recognition in reverberant rooms, developing better hearing aids, or creating more realistic audio effects. Because the data is simulated, the acoustic conditions behind every response are known and labeled, which is a real advantage when training and testing new algorithms.

Abstract

Room impulse responses are a core resource for dereverberation, robust speech recognition, source localization, and room acoustics estimation. We present RIR-Mega, a large collection of simulated RIRs described by a compact, machine-friendly metadata schema and distributed with simple tools for validation and reuse. The dataset ships with a Hugging Face Datasets loader, scripts for metadata checks and checksums, and a reference regression baseline that predicts RT60-like targets from waveforms. On a train and validation split of 36,000 and 4,000 examples, a small Random Forest on lightweight time and spectral features reaches a mean absolute error near 0.013 s and a root mean square error near 0.022 s. We host a subset with 1,000 linear array RIRs and 3,000 circular array RIRs on Hugging Face for streaming and quick tests, and preserve the complete 50,000 RIR archive on Zenodo. The dataset and code are public to support reproducible studies.
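To make the RT60 target and the two error metrics in the abstract concrete, here is a minimal sketch. It is not the paper's baseline, which is a Random Forest on waveform features, and it does not use RIR-Mega data. It assumes a toy RIR made of exponentially decaying noise and estimates RT60 with the standard Schroeder backward integration, then scores the estimates with MAE and RMSE. The function names `synth_rir` and `estimate_rt60`, the sample rate, and the fit range are illustrative choices, not part of the dataset tooling.

```python
import numpy as np


def synth_rir(rt60, fs=16000, duration=1.5, seed=0):
    """Toy RIR (assumption, not RIR-Mega data): white noise under an
    exponential envelope whose level drops by 60 dB after rt60 seconds."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * duration)) / fs
    # An amplitude factor of 10^-3 corresponds to a 60 dB drop.
    envelope = 10.0 ** (-3.0 * t / rt60)
    return rng.standard_normal(t.size) * envelope


def estimate_rt60(rir, fs, db_start=-5.0, db_end=-35.0):
    """Estimate RT60 from the energy decay curve (Schroeder integration).

    A line is fitted to the decay curve between db_start and db_end,
    and its slope is extrapolated to a 60 dB drop.
    """
    energy = rir ** 2
    # Backward integration: energy remaining from each sample to the end.
    edc = np.cumsum(energy[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    t = np.arange(rir.size) / fs
    mask = (edc_db <= db_start) & (edc_db >= db_end)
    slope, _ = np.polyfit(t[mask], edc_db[mask], 1)  # slope in dB per second
    return -60.0 / slope


# Score the estimator on a few toy RIRs with known decay times.
true_rt60 = np.array([0.2, 0.4, 0.6, 0.8])
pred_rt60 = np.array(
    [estimate_rt60(synth_rir(r, seed=i), fs=16000) for i, r in enumerate(true_rt60)]
)

errors = pred_rt60 - true_rt60
mae = np.mean(np.abs(errors))          # mean absolute error, in seconds
rmse = np.sqrt(np.mean(errors ** 2))   # root mean square error, in seconds
print(f"MAE = {mae:.4f} s, RMSE = {rmse:.4f} s")
```

MAE and RMSE are both in seconds, and RMSE is never smaller than MAE because it weights large errors more heavily. That is consistent with the reported values of roughly 0.013 s and 0.022 s.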
