Optimizing Multilingual Text-To-Speech with Accents & Emotions

Pranav Pawar, Akshansh Dwivedi, Jenish Boricha, Himanshu Gohil, Aditya Dubey

2025-06-23

Summary

This paper presents a text-to-speech (TTS) system that produces more accurate accents and emotions in Hindi and Indian English. It does so by aligning phonemes precisely, representing emotion in a culturally sensitive way, and switching accents smoothly.

What's the problem?

Most current TTS systems struggle to produce natural-sounding speech with the right accent and emotion when switching between languages such as Hindi and Indian English, which makes it hard for users to connect with the voice.

What's the solution?

The researchers built a system that aligns phonemes precisely, uses emotion embeddings that reflect cultural differences, and lets the voice change accents smoothly within a single sentence, so it can mix languages and emotions naturally.
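
To make the idea of switching accents within one sentence concrete, here is a minimal sketch of one possible preprocessing step: splitting a code-mixed sentence into runs of Hindi and English words so that each run could be given its own accent label. This is an illustration only, not the authors' method. The function names `script_of` and `accent_segments` are hypothetical, and the sketch assumes Hindi is written in Devanagari script (romanized Hindi would be labeled as English).

```python
from itertools import groupby


def script_of(word: str) -> str:
    """Label a word 'hi' if it contains any Devanagari character, else 'en'.

    Assumption: Hindi text is written in Devanagari (U+0900 to U+097F).
    """
    return "hi" if any("\u0900" <= ch <= "\u097F" for ch in word) else "en"


def accent_segments(sentence: str) -> list[tuple[str, str]]:
    """Group consecutive words that share a script into (label, text) runs."""
    words = sentence.split()
    return [(tag, " ".join(run)) for tag, run in groupby(words, key=script_of)]


if __name__ == "__main__":
    # Each run could then be passed to a synthesizer with its own accent label.
    for tag, text in accent_segments("मुझे यह movie really पसंद आई"):
        print(tag, text)
```

A real system would need more than script detection, since Indian English and Hindi are often mixed in Latin script and accent also depends on the speaker, but the sketch shows the kind of per-segment labeling that in-sentence accent switching relies on.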

Why does it matter?

More realistic and expressive AI voices that respect cultural nuances make tools like virtual assistants and educational software more effective and relatable for speakers of different languages.

Abstract

A new TTS architecture improves accent accuracy and emotion recognition for Hindi and Indian English by integrating phoneme alignment, culture-sensitive emotion embeddings, and dynamic accent code switching.
