ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs

Ahmed Heakl, Youssef Zaghloul, Mennatullah Ali, Rania Hossam, Walid Gomaa

2024-06-28


Summary

This paper presents ArzEn-LLM, which applies large language models to machine translation and speech recognition of code-switched Egyptian Arabic-English, that is, text and speech in which the two languages are mixed. It focuses on handling this mixing in both written and spoken input.

What's the problem?

Code-switching, which is when speakers mix two languages in conversation, is common among people who speak both Egyptian Arabic and English. However, existing translation and speech recognition systems often struggle to accurately process this type of language use. This can lead to misunderstandings and ineffective communication, especially in important settings like business or education.

What's the solution?

To solve this problem, the authors developed ArzEn-LLM, which combines machine translation (MT) and automatic speech recognition (ASR). They fine-tuned large language models such as Llama and Gemma for translation and used the Whisper model to recognize spoken code-switched Egyptian Arabic. The two components are chained into a consecutive speech-to-text translation system: Whisper first transcribes the spoken input, and the language model then translates the transcript into English or Egyptian Arabic. In their experiments, this approach improved on the state of the art by 56% for translation into English and by 9.3% for translation into Egyptian Arabic.
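The consecutive (cascaded) design described above can be sketched as two stages composed in order. This is a minimal illustration, not the authors' code: the class `CascadedSpeechTranslator` and the stub functions `fake_asr` and `fake_mt` are hypothetical stand-ins for Whisper and a fine-tuned Llama or Gemma model, and the canned sentence is invented for the example. The language code `arz` is the ISO 639-3 code for Egyptian Arabic.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures: in ArzEn-LLM these roles are filled by Whisper (ASR)
# and an LLM such as Llama or Gemma (MT); here they are plain callables.
AsrFn = Callable[[bytes], str]       # audio bytes -> code-switched transcript
MtFn = Callable[[str, str], str]     # (transcript, target language) -> translation


@dataclass
class CascadedSpeechTranslator:
    """Consecutive speech-to-text translation: run ASR first, then MT on its output."""
    asr: AsrFn
    mt: MtFn

    def translate(self, audio: bytes, target_lang: str) -> dict:
        # The paper translates into English ("en") or Egyptian Arabic ("arz").
        if target_lang not in ("en", "arz"):
            raise ValueError(f"unsupported target language: {target_lang}")
        transcript = self.asr(audio)                      # stage 1: speech -> text
        translation = self.mt(transcript, target_lang)    # stage 2: text -> text
        return {"transcript": transcript, "translation": translation}


# Hypothetical stubs so the sketch runs without any model weights.
def fake_asr(audio: bytes) -> str:
    # A code-switched Egyptian Arabic-English utterance ("I want to set up a meeting tomorrow").
    return "أنا عايز أعمل meeting بكرة"


def fake_mt(text: str, target_lang: str) -> str:
    canned = {("أنا عايز أعمل meeting بكرة", "en"): "I want to set up a meeting tomorrow"}
    return canned.get((text, target_lang), text)


translator = CascadedSpeechTranslator(asr=fake_asr, mt=fake_mt)
result = translator.translate(b"\x00\x01", "en")
print(result["translation"])  # I want to set up a meeting tomorrow
```

One consequence of the cascaded design is that recognition errors from the first stage flow directly into the translation stage, which is why ASR quality on code-switched speech matters for the end-to-end result.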

Why it matters?

This research is important because it addresses a real-world communication challenge faced by many bilingual speakers. By improving how technology understands and translates code-switched language, ArzEn-LLM can enhance interactions in various fields such as business negotiations, cultural exchanges, and academic discussions. This capability not only makes communication more effective but also supports the preservation of linguistic diversity.

Abstract

Motivated by the widespread increase in the phenomenon of code-switching between Egyptian Arabic and English in recent times, this paper explores the intricacies of machine translation (MT) and automatic speech recognition (ASR) systems, focusing on translating code-switched Egyptian Arabic-English to either English or Egyptian Arabic. Our goal is to present the methodologies employed in developing these systems, utilizing large language models such as Llama and Gemma. In the field of ASR, we explore the utilization of the Whisper model for code-switched Egyptian Arabic recognition, detailing our experimental procedures including data preprocessing and training techniques. Through the implementation of a consecutive speech-to-text translation system that integrates ASR with MT, we aim to overcome challenges posed by limited resources and the unique characteristics of the Egyptian Arabic dialect. Evaluation against established metrics showcases promising results, with our methodologies yielding a significant improvement of 56% in English translation over the state-of-the-art and 9.3% in Arabic translation. Since code-switching is deeply inherent in spoken languages, it is crucial that ASR systems can effectively handle this phenomenon. This capability is essential for enabling seamless interaction in various domains, including business negotiations, cultural exchanges, and academic discourse. Our models and code are available as open-source resources. Code: http://github.com/ahmedheakl/arazn-llm, Models: http://huggingface.co/collections/ahmedheakl/arazn-llm-662ceaf12777656607b9524e.
