Stable Audio Open | Best AI for Audio | Find AI Tools & Apps

Stable Audio Open is an open-weights text-to-audio model developed by Stability AI that generates stereo audio at 44.1 kHz from text prompts. It is trained on Creative Commons data and is intended for both academic and artistic use. The model combines three components: an autoencoder, a T5-based text embedding used for conditioning, and a transformer-based diffusion model. Together they allow it to produce realistic sounds and field recordings.

The model weights are available on Hugging Face under the Stability AI Community License, which permits non-commercial use, as well as commercial use by individuals or organizations with up to $1 million in annual revenue.

Key Features
<ul>
<li>High-Quality Audio Generation: Produces stereo audio at 44.1 kHz, up to 47 seconds in length.</li>
<li>Open-Weights Model: Weights are available on Hugging Face for community use.</li>
<li>Architecture: Uses an autoencoder, a T5-based text embedding, and a transformer-based diffusion model.</li>
<li>Creative Commons Data: Trained on nearly 500,000 recordings from Freesound and the Free Music Archive.</li>
<li>Flexible Use Cases: Suitable for sound design, ambient sounds, sample creation, audio branding, and academic projects.</li>
<li>Consumer-Grade Hardware: Runs on consumer-grade GPUs; local training is feasible on hardware such as A6000 GPUs.</li>
<li>Customizable: Can be fine-tuned for specific needs in various industries and creative projects.</li>
</ul>
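As a rough illustration of how the weights on Hugging Face might be used, the sketch below generates a short clip with the `StableAudioPipeline` class from the `diffusers` library. Several things here are assumptions rather than details from the text above: the repository id `stabilityai/stable-audio-open-1.0`, the need to accept the license on Hugging Face and log in before downloading, and the helper names `clamp_duration`, `num_samples`, and `generate`, which are hypothetical. The 44.1 kHz rate and the 47-second limit come from the feature list.

```python
SAMPLE_RATE_HZ = 44_100  # 44.1 kHz, as stated in the feature list
MAX_SECONDS = 47.0       # maximum clip length, as stated in the feature list


def clamp_duration(seconds: float) -> float:
    """Clip a requested duration to the 0-47 second range."""
    return max(0.0, min(float(seconds), MAX_SECONDS))


def num_samples(seconds: float) -> int:
    """Samples per channel for a clip of the (clamped) duration."""
    return int(round(clamp_duration(seconds) * SAMPLE_RATE_HZ))


def generate(prompt: str, seconds: float = 10.0, steps: int = 100, seed: int = 0):
    """Return (waveform, sample_rate) for a text prompt.

    Heavy imports stay local so the helpers above work without them.
    """
    import torch
    from diffusers import StableAudioPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # Assumed repository id; downloading may require accepting the
    # license on Hugging Face and being logged in.
    pipe = StableAudioPipeline.from_pretrained(
        "stabilityai/stable-audio-open-1.0", torch_dtype=dtype
    ).to(device)

    generator = torch.Generator(device).manual_seed(seed)
    audio = pipe(
        prompt,
        num_inference_steps=steps,
        audio_end_in_s=clamp_duration(seconds),
        generator=generator,
    ).audios

    # audios has shape (batch, channels, samples); transpose the first
    # clip to (samples, channels), the layout most WAV writers expect.
    return audio[0].T.float().cpu().numpy(), pipe.vae.sampling_rate


# Example usage (downloads several GB of weights on first run):
#   import soundfile as sf
#   wave, rate = generate("Rain falling on a tin roof", seconds=8)
#   sf.write("rain.wav", wave, rate)
```

The duration helpers are kept separate from the model call so that a request for more than 47 seconds is clipped before it reaches the pipeline. The step count and seed are arbitrary starting values, not recommended settings.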
