FireRedTTS2

Free Speech Text-to-Speech


Key Features

Long-form streaming TTS system for multi-speaker dialogue generation

Supports multiple languages including English, Chinese, Japanese, Korean, French, German, and Russian

Ultra-low latency with 12.5Hz streaming speech tokenizer

Dual-transformer architecture for flexible sentence-by-sentence generation

High similarity and low WER/CER in both monologue and dialogue tests

Random timbre generation for creating ASR/speech interaction data

Zero-shot voice cloning for cross-lingual and code-switching scenarios

Web UI tool for easy dialogue generation
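As a rough illustration of what the 12.5 Hz tokenizer rate means, the sketch below converts between audio duration and token frames. It counts frames only, since how many codebook tokens each frame carries is not stated here. The 50 Hz comparison is an arbitrary illustrative rate, not a FireRedTTS2 figure, and the helper names are hypothetical.

```python
import math

TOKENIZER_RATE_HZ = 12.5  # frames per second of audio, as stated in the feature list

def frame_duration_ms(rate_hz: float = TOKENIZER_RATE_HZ) -> float:
    """Milliseconds of audio covered by a single token frame."""
    return 1000.0 / rate_hz

def frames_for_audio(seconds: float, rate_hz: float = TOKENIZER_RATE_HZ) -> int:
    """Token frames needed to cover `seconds` of audio, rounded up."""
    return math.ceil(seconds * rate_hz)

print(frame_duration_ms())          # 80.0 ms per frame
print(frames_for_audio(180))        # 2250 frames for 3 minutes of audio
print(frames_for_audio(180, 50.0))  # 9000 frames at an illustrative 50 Hz
```

Fewer frames per second means shorter sequences for the same amount of audio, which is one general reason low-frame-rate tokenizers suit long-form generation.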

FireRedTTS2 is built for ultra-low latency. It uses a new 12.5 Hz streaming speech tokenizer and a dual-transformer architecture that operates on an interleaved text–speech sequence, which enables flexible sentence-by-sentence generation and reduces first-packet latency. On an L20 GPU, first-packet latency is as low as 140 ms while audio quality remains high. In both monologue and dialogue tests, the system also achieves high speaker similarity and low WER/CER.
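The source gives no code or API for this, so the following is only a minimal Python sketch of the general idea of sentence-by-sentence generation over one interleaved text–speech sequence, assuming a single model call per sentence. The speaker tags, the `<speech>` markers, `stream_dialogue`, and `toy_model` are all hypothetical. The sketch does not model the dual-transformer split, the tokenizer, or audio decoding.

```python
from typing import Callable, Iterator, List, Sequence, Tuple, Union

Item = Union[str, int]  # text-side items are strings, speech tokens are ints

def stream_dialogue(
    turns: Sequence[Tuple[str, str]],
    generate_speech: Callable[[List[Item]], List[int]],
) -> Iterator[Tuple[str, List[int]]]:
    """Generate speech turn by turn over one interleaved text-speech sequence.

    `turns` is a list of (speaker, sentence) pairs. `generate_speech` is a
    hypothetical stand-in for the model: given the full history so far, it
    returns speech tokens for the sentence that was just appended.
    """
    history: List[Item] = []
    for speaker, sentence in turns:
        # Append this sentence's text (whitespace split stands in for a real tokenizer).
        history += [f"[{speaker}]", *sentence.split(), "<speech>"]
        tokens = generate_speech(history)
        # Speech tokens follow their own text, so later sentences are
        # conditioned on both the earlier text and the earlier speech.
        history += [*tokens, "</speech>"]
        # Yield right away: this sentence can be decoded and played
        # before any later sentence has been generated.
        yield speaker, tokens

def toy_model(history: List[Item]) -> List[int]:
    # Hypothetical placeholder model: emits two fake tokens based on context length.
    return [len(history), len(history) + 1]

for speaker, tokens in stream_dialogue([("S1", "Hello there"), ("S2", "Hi")], toy_model):
    print(speaker, tokens)
# S1 [4, 5]
# S2 [10, 11]
```

Because each sentence's speech is yielded as soon as it is produced, the first audio does not have to wait for the rest of the dialogue, which is the intuition behind the low first-packet latency described above.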

FireRedTTS-2 supports random timbre generation, which makes it useful for creating ASR and speech-interaction data. It also supports zero-shot voice cloning in cross-lingual and code-switching scenarios, and it can be applied to tasks such as podcast generation, chatbot development, and language learning. A web UI tool makes dialogue generation easy and supports both voice cloning and randomized voices.
