Key Features

Multi-speaker conversations
Long-form audio generation
High-fidelity audio
Resource-efficient inference
Leading performance in generating lifelike and emotionally competent voice
Open source
Trained on over 10M hours of audio data
Adopts an innovative Dual-FFN architecture

Higgs Audio V2 represents a significant leap forward in audio AI capabilities. It supports multi-speaker conversations, long-form audio generation, and high-fidelity audio. The model is trained on a self-annotated corpus of over 10M hours of audio data, annotated using BosonAI's ASR and LLM models. Higgs Audio V2 adopts a Dual-FFN architecture that handles text and audio tokens jointly. In addition, the tokenizer has dedicated representations for both the semantic and acoustic aspects of the audio.
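The description above does not spell out how the Dual-FFN layer works internally. As a rough illustration only, the sketch below assumes that "Dual-FFN" means each token in a mixed sequence is routed to one of two feed-forward networks depending on whether it is a text token or an audio token. The attention step is omitted, and every name here (make_ffn, ffn_forward, dual_ffn, is_audio) is hypothetical and not taken from the Higgs Audio V2 codebase.

```python
import random

def make_ffn(d_model, d_hidden, rng):
    """Build one feed-forward network as two weight matrices (no biases)."""
    w_in = [[rng.uniform(-0.1, 0.1) for _ in range(d_hidden)] for _ in range(d_model)]
    w_out = [[rng.uniform(-0.1, 0.1) for _ in range(d_model)] for _ in range(d_hidden)]
    return w_in, w_out

def matvec(vec, mat):
    """Multiply a row vector by a matrix stored as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def ffn_forward(x, ffn):
    """Apply linear -> ReLU -> linear to a single token vector."""
    w_in, w_out = ffn
    hidden = [max(0.0, h) for h in matvec(x, w_in)]
    return matvec(hidden, w_out)

def dual_ffn(hidden_states, is_audio, text_ffn, audio_ffn):
    """Assumed routing: audio tokens use audio_ffn, text tokens use text_ffn.

    A residual connection adds each token's input back to its FFN output.
    """
    out = []
    for x, audio in zip(hidden_states, is_audio):
        y = ffn_forward(x, audio_ffn if audio else text_ffn)
        out.append([xi + yi for xi, yi in zip(x, y)])
    return out

# Toy usage: five tokens of width 4, two of them marked as audio.
rng = random.Random(0)
text_ffn = make_ffn(4, 8, rng)
audio_ffn = make_ffn(4, 8, rng)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
is_audio = [False, False, True, True, False]
mixed = dual_ffn(tokens, is_audio, text_ffn, audio_ffn)
```

Under this assumption, both modalities share one sequence, while the feed-forward weights are specialized per modality. The real model may differ in which layers are split and in how routing is done.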


Higgs Audio V2 is now open source, making it the first open-source, large-scale audio model that excels at multi-speaker, lifelike, and emotionally competent voice generation. It lets developers, creatives, and researchers build conversational agents, audiobooks, podcasts, and more with lifelike performance. Higgs Audio V2 achieves state-of-the-art performance, beating gpt-4o-mini-tts with a 75.7% win rate on Emotions and a 55.7% win rate on Questions in EmergentTTS-Eval. The model can be cloned from GitHub and tried out through the online demo or the HuggingFace Space.
