Nile-Chat: Egyptian Language Models for Arabic and Latin Scripts

Guokan Shang, Hadi Abdine, Ahmad Chamma, Amr Mohamed, Mohamed Anwar, Abdelaziz Bounhar, Omar El Herraoui, Preslav Nakov, Michalis Vazirgiannis, Eric Xing

2025-07-09

Summary

This paper introduces Nile-Chat, a family of large language models built for the Egyptian Arabic dialect. The models can understand and generate text in both Arabic script and Latin script. Latin script (often called Arabizi or Franco-Arabic) is widely used by Egyptians in digital communication.

What's the problem?

The problem is that most existing language models handle Egyptian Arabic poorly, particularly because it is written in both Arabic and Latin scripts. As a result, AI systems struggle to understand Egyptian speakers and to communicate with them naturally.

What's the solution?

The researchers built Nile-Chat using the Branch-Train-MiX (BTX) strategy: separate expert models are trained, each specialized in one script, and then merged into a single mixture-of-experts model. They trained the models on a large corpus of Egyptian Arabic text and evaluated them on Egyptian dialect benchmarks covering understanding and generation in both scripts, where Nile-Chat outperformed existing multilingual and Arabic models.
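The merge step of Branch-Train-MiX can be illustrated with a toy sketch. It follows the general BTX recipe, not Nile-Chat's actual code. It assumes each script-specialized expert is a dense model stored as a dict of flat parameter lists. The function names, the `ffn.` naming convention, and the top-k router are hypothetical illustrations. Each expert's feed-forward weights become a separate MoE expert, all other weights are averaged, and a router picks which experts process each token.

```python
import math

def merge_btx(experts, ffn_prefix="ffn."):
    """Merge dense expert models into one MoE parameter set (toy sketch).

    Assumption: parameters whose names start with `ffn_prefix` are
    feed-forward weights; this naming scheme is hypothetical.
    """
    shared = {}   # averaged parameters (attention, embeddings, norms)
    moe_ffn = {}  # one feed-forward copy per expert
    for name in experts[0]:
        tensors = [e[name] for e in experts]
        if name.startswith(ffn_prefix):
            # Keep every expert's FFN weights side by side as MoE experts.
            moe_ffn[name] = tensors
        else:
            # Element-wise average of the remaining weights across experts.
            shared[name] = [sum(vals) / len(vals) for vals in zip(*tensors)]
    return shared, moe_ffn

def top_k_route(router_logits, k=2):
    """Select the k highest-scoring experts and softmax-normalize their scores."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    return {i: w / total for i, w in zip(top, exps)}

def moe_output(expert_outputs, routing):
    """Combine the selected experts' FFN outputs as a weighted sum."""
    dim = len(expert_outputs[0])
    return [sum(w * expert_outputs[i][d] for i, w in routing.items())
            for d in range(dim)]

# Usage with two hypothetical script experts.
arabic_expert = {"attn.w": [1.0, 3.0], "ffn.w": [0.5, 0.5]}
latin_expert = {"attn.w": [3.0, 5.0], "ffn.w": [2.0, 2.0]}
shared, ffn = merge_btx([arabic_expert, latin_expert])
print(shared["attn.w"])  # [2.0, 4.0]
```

In the full BTX recipe, the merged model is then fine-tuned so the router learns when to send a token to each expert. This sketch omits that training step and only shows how the weights are combined.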

Why does it matter?

This matters because Egyptian Arabic is spoken by over 100 million people, and supporting both scripts makes AI more useful and accessible to this large community. It also advances research on language models that handle multiple scripts and dialects.

Abstract

Nile-Chat models, using a Branch-Train-MiX strategy, outperform existing multilingual and Arabic LLMs on Egyptian dialect benchmarks, particularly in dual-script understanding and generation.