Key Features
- Produces stereo audio at 44.1 kHz, up to 47 seconds in length.
- Accessible on Hugging Face for community use.
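- Combines three components: an autoencoder that compresses waveforms into a latent representation, T5-based text embeddings for prompt conditioning, and a transformer-based diffusion model that operates in that latent space.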
- Trained on nearly 500,000 recordings from Freesound and the Free Music Archive.
- Suitable for sound design, ambient sounds, sample creation, audio branding, and academic projects.
- Runs on consumer-grade GPUs, with local training feasible on workstation cards such as the A6000.
- Can be fine-tuned to meet specific needs in various industries and creative projects.
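
To make the output format in the first bullet concrete, here is a minimal sketch of the arithmetic for a maximum-length clip. It only uses the figures stated above (stereo, 44.1 kHz, 47 seconds). The helper name `clip_dimensions` and the 16-bit PCM assumption are illustrative and not part of the model or any library; the model's actual internal buffer length may differ.

```python
# Figures taken from the feature list above.
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2          # stereo
MAX_SECONDS = 47

# Assumption for illustration only: audio saved as 16-bit PCM (2 bytes per sample).
BYTES_PER_SAMPLE = 2


def clip_dimensions(seconds=MAX_SECONDS, sample_rate=SAMPLE_RATE_HZ,
                    channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Return the (channels, frames) shape and raw PCM size of a clip."""
    frames = int(seconds * sample_rate)  # sample frames per channel
    return {
        "shape": (channels, frames),
        "pcm_bytes": frames * channels * bytes_per_sample,
    }


dims = clip_dimensions()
print(dims["shape"])      # (channels, frames) for a 47-second clip
print(dims["pcm_bytes"])  # uncompressed size in bytes under the 16-bit assumption
```

A full-length clip therefore holds roughly two million sample frames per channel, which is why the model generates in a compressed latent space rather than directly on raw samples.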