Higgs Audio V2 represents a significant leap forward in audio AI capabilities. It supports multi-speaker conversations, long-form audio generation, and high-fidelity output. The model is trained on a massive self-annotated corpus of over 10M hours of audio, annotated using BosonAI's ASR and LLM models. Higgs Audio V2 adopts an innovative Dual-FFN architecture capable of handling text and audio tokens jointly. Moreover, its tokenizer has dedicated representations for both the semantic and acoustic aspects of audio.
Higgs Audio V2 is now open source, making it the first open-source, large-scale audio model that excels at multi-speaker, lifelike, and emotionally expressive voice generation. It opens doors for developers, creatives, and researchers to build conversational agents, audiobooks, podcasts, and more with lifelike performance. Higgs Audio V2 has achieved state-of-the-art performance, beating gpt-4o-mini-tts with a 75.7% win rate on Emotions and 55.7% on Questions in EmergentTTS-Eval. The model is available for cloning on GitHub, and can also be tried out through the online demo or Hugging Face Space.
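As a minimal illustration of how multi-speaker generation is typically driven, a dialogue script can be flattened into a single tagged transcript before being passed to the model. The sketch below is hypothetical: the `[SPEAKER0]`-style tag format and the `format_transcript` helper are illustrative assumptions, not the documented Higgs Audio V2 API; consult the GitHub repository for the actual input format.

```python
# Hypothetical sketch: flatten a multi-speaker dialogue into one tagged
# transcript string. The tag convention shown here is an assumption, not
# the confirmed Higgs Audio V2 input format.

def format_transcript(turns: list[tuple[str, str]]) -> str:
    """Map each (speaker_name, text) turn to a numbered speaker tag.

    Speakers are numbered in order of first appearance, so the same
    name always maps to the same [SPEAKERn] tag.
    """
    speaker_ids: dict[str, int] = {}
    lines = []
    for speaker, text in turns:
        if speaker not in speaker_ids:
            speaker_ids[speaker] = len(speaker_ids)
        lines.append(f"[SPEAKER{speaker_ids[speaker]}] {text}")
    return "\n".join(lines)


dialogue = [
    ("Alice", "Did you hear about the new release?"),
    ("Bob", "Yes, the demo sounded remarkably natural."),
    ("Alice", "The emotional range surprised me most."),
]
print(format_transcript(dialogue))
```

A preprocessing step like this keeps the dialogue structure explicit while letting the model condition every turn on a consistent per-speaker identity.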