Text to Speech

Discover and compare the best AI models for text-to-speech generation. Note: This is my personal, non-scientific leaderboard. Models are ranked by their completion rate on a series of diverse prompts designed to thoroughly assess performance.

Rank  Company         Model          Score
1     Microsoft                      88.68
2     OpenBMB                        88.3
3     Bilibili Index                 87.5
4     MiniMax         Speech-02-HD   87
5     Fish Audio                     85
6     SWivid                         83.95
7     RedNote                        82.7
8     ElevenLabs                     82.65
9     Resemble AI     Chatterbox     79.4
10    Boson AI                       79.33
11    Zyphra                         74
12    Kokoro          Kokoro 82M     71
13    Coqui           XTTS-v2        69.42

Methodology

Models are ranked using a series of prompts that cover a diverse range of challenging tasks. The evaluation considers:

  • Naturalness and human-like quality
  • Pronunciation accuracy
  • Emotions and expressions
  • Open-source vs closed-source
  • Different accents and languages
  • Voice cloning consistency

To prevent manipulation, the prompts are kept confidential and are regularly updated to increase difficulty as models improve. Here is a subset of prompts for your reference:

The record producer refused to record the band’s new single.
The wind was too strong to wind the kite string around the spool.
Are you serious? No, I’m joking—seriously, I’m not serious!
She sells seashells by the seashore, but the shells she sells aren’t cheap.
The Dr. who lives at 1234 St. Dr. prescribed 2 tsp. of medicine for Feb. 2, 2023.
The fiesta was très magnifique, with 你好 greetings and English pop music.
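To show why a prompt like the abbreviation one is hard, here is a small sketch of a deliberately naive text-normalization rule applied to it. The function `expand_dr` and its rule are hypothetical and do not come from any of the models listed. The rule assumes "Dr." is a title when followed by a capitalized word and a street suffix otherwise. On this prompt the first "Dr." presumably means "Doctor", yet it is followed by the lowercase "who", so the rule gets it wrong.

```python
import re


def expand_dr(text: str) -> str:
    """Naive rule (an assumption for illustration): 'Dr.' before a
    capitalized word is read as 'Doctor', otherwise as 'Drive'."""
    def replace(match: re.Match) -> str:
        following = text[match.end():].lstrip()
        return "Doctor" if following[:1].isupper() else "Drive"

    return re.sub(r"\bDr\.", replace, text)


prompt = (
    "The Dr. who lives at 1234 St. Dr. prescribed 2 tsp. "
    "of medicine for Feb. 2, 2023."
)
naive = expand_dr(prompt)
print(naive)
```

The second "Dr." comes out as "Drive", which fits the street-address reading, but the first one becomes "The Drive who", which is the kind of error this prompt is designed to expose. Resolving it needs sentence-level context rather than a one-word lookahead.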