Fish Audio S2

Freemium TTS Open-Source

Key Features

Open-source expressive TTS release with public inference code and model weights.

Uses the S2 Pro model for realistic multilingual speech synthesis.

Supports 80+ languages, with highest-quality tiers for Japanese, English, and Chinese.

Provides fine-grained inline control through natural-language bracket tags.

Supports more than 15,000 expressive tags, including pauses, whispers, laughter, singing, and emphasis.

Supports multi-speaker generation with speaker control tokens.

Uses a Dual-AR (dual-autoregressive) architecture with a 4B-parameter Slow AR component and a 400M-parameter Fast AR component.

Provides SGLang-based streaming inference with a reported time to first audio (TTFA) of about 100 ms on H200-class serving hardware.
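
As a rough illustration of the inline bracket tags and speaker control listed above, the following Python sketch assembles a tagged multi-speaker script as a plain string. The square-bracket spelling follows the feature description, but the specific tag words, the `<|speaker:N|>` token shape, and the helper names `tag` and `build_script` are assumptions made for illustration, not confirmed Fish Audio syntax. Check the official documentation for the exact forms.

```python
from typing import List, Tuple

# Hypothetical helpers: the tag vocabulary and the speaker-token format
# below are assumptions, not confirmed Fish Audio S2 syntax.

def tag(name: str) -> str:
    """Wrap a natural-language instruction in square brackets."""
    cleaned = name.strip()
    if not cleaned or "[" in cleaned or "]" in cleaned:
        raise ValueError(f"invalid tag text: {name!r}")
    return f"[{cleaned}]"

def build_script(turns: List[Tuple[int, str]]) -> str:
    """Join (speaker_id, text) turns, prefixing each with a speaker token."""
    lines = []
    for speaker_id, text in turns:
        # Assumed token shape; replace with the documented form.
        lines.append(f"<|speaker:{speaker_id}|>{text}")
    return "\n".join(lines)

script = build_script([
    (0, f"{tag('whisper')} I think someone is listening."),
    (1, f"{tag('laughing')} Nobody is listening. {tag('pause')} Relax."),
])
print(script)
```

Keeping tag construction in one small function makes it easy to swap in the documented syntax later without touching the script text itself.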

S2 Pro uses a Dual-Autoregressive architecture with a 4B-parameter Slow AR component for semantic prediction and a 400M-parameter Fast AR component for acoustic detail. Fish Audio reports training on more than 10 million hours of audio, support for 80+ languages, over 15,000 natural-language control tags, and an SGLang-based streaming inference engine.
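
The reported streaming latency is a time-to-first-audio figure, which can be checked against any streaming deployment by timing the arrival of the first audio chunk. The sketch below is a minimal, client-agnostic way to do that. `measure_ttfa` and `fake_stream` are hypothetical names, and `fake_stream` is a stand-in generator rather than a real Fish Audio or SGLang client; in practice it would be replaced by whatever iterator of audio bytes the serving stack returns.

```python
import time
from typing import Iterable, Iterator, Tuple

def measure_ttfa(chunks: Iterable[bytes]) -> Tuple[float, int]:
    """Return (seconds until first non-empty chunk, total bytes received)."""
    start = time.perf_counter()
    ttfa = None
    total = 0
    for chunk in chunks:
        if chunk and ttfa is None:
            ttfa = time.perf_counter() - start
        total += len(chunk)
    if ttfa is None:
        raise RuntimeError("stream ended without producing audio")
    return ttfa, total

def fake_stream(delay_s: float = 0.05, n_chunks: int = 3) -> Iterator[bytes]:
    """Stand-in for a streaming TTS response; not a real client."""
    for _ in range(n_chunks):
        time.sleep(delay_s)
        yield b"\x00" * 1024

ttfa, total = measure_ttfa(fake_stream())
print(f"TTFA: {ttfa * 1000:.0f} ms, received {total} bytes")
```

Measured this way, the number includes network and client overhead, so it will generally be higher than a server-side figure.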

Fish Audio S2 suits researchers, developers, and creative voice teams that want more control than a fixed library of voice presets allows. The release includes inference code, model weights, fine-tuning support, and self-hosting paths for teams that can operate GPU infrastructure. Commercial use requires a separate Fish Audio license.
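
For teams sizing GPU infrastructure, a back-of-the-envelope estimate of weight memory can be derived from the stated parameter counts. This sketch assumes the 4B and 400M figures are approximate totals and that weights are stored at 2 bytes per parameter (16-bit precision). It ignores the KV cache, activations, any audio codec, and framework overhead, so real requirements will be higher.

```python
SLOW_AR_PARAMS = 4_000_000_000  # "4B" as stated; nominal figure
FAST_AR_PARAMS = 400_000_000    # "400M" as stated; nominal figure

def weight_memory_gib(n_params: int, bytes_per_param: int = 2) -> float:
    """Weights-only memory in GiB for a given storage precision."""
    return n_params * bytes_per_param / 2**30

total_params = SLOW_AR_PARAMS + FAST_AR_PARAMS
print(f"total parameters: {total_params / 1e9:.1f}B")
print(f"16-bit weights:   {weight_memory_gib(total_params):.1f} GiB")
print(f"32-bit weights:   {weight_memory_gib(total_params, 4):.1f} GiB")
```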
