FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

Sara Papi, Marco Gaido, Luisa Bentivogli, Alessio Brutti, Mauro Cettolo, Roberto Gretter, Marco Matassoni, Mohamed Nabih, Matteo Negri

2025-05-30

Summary

This paper introduces FAMA, the first large-scale open-science family of speech foundation models for English and Italian. It was built entirely from open-source data and code, which makes it transparent and easy for anyone to use or study.

What's the problem?

Most of the powerful speech models used for speech recognition and translation are closed: their training data and code are not shared. That makes it hard for other researchers to verify results, improve the models, or reuse them fairly in new projects.

What's the solution?

The researchers created FAMA by collecting over 150,000 hours of open-source speech data and building models that are both high-performing and fully open. They released everything (models, code, and datasets) under open licenses, so anyone can use or improve them. FAMA also comes with a new, cleaned dataset of English and Italian speech.

Why does it matter?

This matters because it makes speech technology fairer, more transparent, and more accessible. With FAMA, researchers and developers can inspect how the models were built, compare results on equal terms, and build new applications without running into undisclosed data or code.

Abstract

FAMA, an open science family of speech foundation models, provides transparency and competitive performance by leveraging open-source training data and code.
