EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi, Mu Li, Alex Smola
2025-06-02
Summary
This paper introduces EmergentTTS-Eval, a benchmark for testing how well text-to-speech (TTS) systems handle tricky and expressive speech, using AI models to automatically create the test cases and judge the results.
What's the problem?
Current TTS systems often struggle to produce natural-sounding speech when the text is complex, emotional, or requires special emphasis, and existing benchmarks make it hard to measure how well a system handles these challenges.
What's the solution?
The researchers built EmergentTTS-Eval, which uses a large language model (LLM) to automatically generate challenging test sentences and then uses an audio-capable model (LALM) to listen to the synthesized speech and judge how well the TTS system performed. This setup allows for a more detailed and fair evaluation of TTS systems on qualities like expressiveness, prosody, and complex language. A rough sketch of what such a loop could look like is shown below.
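To make the setup more concrete, here is a minimal Python sketch of a generate-synthesize-judge loop of this kind. The function names (`generate_test_case`, `synthesize`, `judge_speech`), the categories, and the scoring are illustrative placeholders, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class TestCase:
    category: str
    text: str

def generate_test_case(category: str, difficulty: int) -> TestCase:
    # Placeholder: in the benchmark, an LLM would produce a challenging
    # sentence for the given category and difficulty level.
    return TestCase(category, f"[{category} sentence at difficulty {difficulty}]")

def synthesize(tts_model, text: str) -> bytes:
    # Placeholder: the TTS system under test renders the text to audio.
    return tts_model(text)

def judge_speech(audio: bytes, text: str, category: str) -> float:
    # Placeholder: an audio-capable judge model (LALM) scores how well the
    # audio realizes the intended prosody/expressiveness for this category.
    return 0.0

def evaluate(tts_model, categories, difficulties=range(1, 4)):
    """Average the judge's score per category across difficulty levels."""
    scores = {}
    for category in categories:
        per_case = []
        for level in difficulties:
            case = generate_test_case(category, level)
            audio = synthesize(tts_model, case.text)
            per_case.append(judge_speech(audio, case.text, case.category))
        scores[category] = sum(per_case) / len(per_case)
    return scores

# Example usage with a dummy "TTS model" that returns empty audio.
if __name__ == "__main__":
    print(evaluate(lambda text: b"", ["expressive emphasis", "complex syntax"]))
```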
Why does it matter?
This is important because it helps improve TTS technology, making computer-generated voices sound more human and expressive, which is useful for virtual assistants, audiobooks, accessibility tools, and more.
Abstract
EmergentTTS-Eval is a comprehensive TTS benchmark that automates test-case generation and evaluation using LLMs and a LALM to assess how well TTS systems render nuanced and semantically complex text in speech.