Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems

Marco Gaido, Sara Papi, Mauro Cettolo, Matteo Negri, Luisa Bentivogli

2025-12-24

Summary

This paper introduces simulstream, a tool that helps researchers build and test systems that translate spoken language into text in real time, while the speech is still happening.

What's the problem?

Evaluating these "streaming speech-to-text translation" systems is currently difficult because the tool that research has relied on, SimulEval, is no longer maintained and cannot handle systems that revise their earlier output. It was also designed for short segments rather than long recordings, and it offers no easy way to demonstrate a system publicly.

What's the solution?

The researchers created simulstream, an open-source framework that addresses these issues. It is built to handle long audio streams, supports both systems that decode incrementally and systems that re-translate and revise their output, and includes a web interface for showcasing them. This lets the two kinds of system be compared within the same framework on both translation quality and latency.

Why it matters?

This new tool is important because it provides a standardized and modern way to develop and assess real-time speech translation. By making it easier to build, test, and demonstrate these systems, simulstream can help accelerate progress in this field, leading to better and more accessible translation technology.

Abstract

Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech, imposing strict latency constraints and demanding models that balance partial-information decision-making with high translation quality. Research efforts on the topic have so far relied on the SimulEval repository, which is no longer maintained and does not support systems that revise their outputs. In addition, it was designed for simulating the processing of short segments rather than long-form audio streams, and it does not provide an easy method to showcase systems in a demo. As a solution, we introduce simulstream, the first open-source framework dedicated to unified evaluation and demonstration of StreamST systems. Designed for long-form speech processing, it supports not only incremental decoding approaches but also re-translation methods, enabling their comparison within the same framework in terms of both quality and latency. It also offers an interactive web interface to demo any system built within the tool.
