Key Features

Generates expressive TTS with prompt-controlled emotion and delivery.
Supports optional 10-second voice references for voice cloning.
Controls laughs, sighs, breaths, pauses, transitions, and performance style.
Built as an IC-LoRA fine-tune of the LTX-2.3 3.3B audio-only model.
Uses a diffusion transformer with flow matching for audio generation.
Conditions generation on Gemma 3 12B text embeddings.
Provides Hugging Face model, demo Space, and GitHub code resources.
Targets audio drama, games, animation, character speech, and expressive assistants.

The model is an IC-LoRA fine-tune of the LTX-2.3 3.3B audio-only model. It uses a diffusion transformer with flow matching and conditions generation on Gemma 3 12B text embeddings. This design lets it go beyond conventional neutral TTS and produce expressive performance cues and dramatic shifts in delivery. It is built on LTX-2, is released under the LTX-2 Community License, and comes with links to the model, a demo Space, and the code.
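As a rough illustration of what flow-matching generation means in general, the sketch below integrates a velocity field from noise (t = 0) toward a sample (t = 1) with fixed Euler steps. It is not Dramabox or LTX-2 code: the names toy_velocity and euler_sample are hypothetical, and the closed-form toy velocity stands in for the learned diffusion transformer, which in the real model would also receive text embeddings as conditioning.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for the learned network.

    Returns the velocity of the straight-line path that reaches `target`
    at t = 1. A real model would predict this from x, t, and the text
    conditioning instead of knowing the target.
    """
    return [(xt - xi) / (1.0 - t) for xi, xt in zip(x, target)]


def euler_sample(velocity_fn, x0, steps=8):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # largest t used is 1 - dt, so toy_velocity never divides by zero
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # starting point: Gaussian noise
target = [0.5, -0.25, 0.0, 1.0]                  # toy "data" sample (assumed values)

sample = euler_sample(lambda x, t: toy_velocity(x, t, target), noise)
```

With this toy field the integration ends at the target values; in an actual audio model the same loop would run over latent audio features, and the step count would trade generation speed against quality.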


Dramabox is useful for audio drama, games, animation, expressive voice assistants, character prototyping, and synthetic dialogue datasets. Its main value is controllability: a prompt can specify not only what is said but also how it is performed. Because the Hugging Face page exposes model resources and code links, this listing marks it as a free open-source audio model.
