Sadeed: Advancing Arabic Diacritization Through Small Language Model
Zeina Aldallal, Sara Chrouf, Khalil Hennara, Mohamed Motaism Hamed, Muhammad Hreden, Safwan AlModhayan
2025-05-01
Summary
This paper introduces Sadeed, a compact AI model that adds the correct vowel marks (diacritics) to Arabic text. These marks are important for understanding and pronouncing words correctly.
What's the problem?
Arabic text is usually written without vowel marks, which makes words harder to read and pronounce, especially for learners or when the same spelling can have more than one meaning. Existing AI tools for this task often need substantial resources and are not always accurate.
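To make the ambiguity concrete, here is a minimal Python sketch, not taken from the paper. It relies only on the fact that the core Arabic diacritics occupy Unicode code points U+064B through U+0652. The helper name `strip_diacritics` and the two example words are illustrative choices. The sketch shows two differently vowelled words collapsing to the same bare spelling, which is the information a diacritization model has to restore from context.

```python
# Core Arabic diacritics (tashkeel): U+064B (fathatan) through U+0652 (sukun).
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving the base letters untouched."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


# Two different fully vowelled words (illustrative examples):
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "kataba", "he wrote"
kutub = "\u0643\u064F\u062A\u064F\u0628"         # "kutub", "books"

# Without diacritics, both are written with the same three letters.
bare = "\u0643\u062A\u0628"
print(strip_diacritics(kataba) == bare)  # True
print(strip_diacritics(kutub) == bare)   # True
```

Stripping the marks is trivial and deterministic. The reverse direction is not, because the correct marks depend on the surrounding sentence.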
What's the solution?
The researchers created Sadeed, a lightweight language model fine-tuned specifically to add these marks accurately, even when little training data is available. They also introduced a new test set, SadeedDiac-25, to measure more reliably how well diacritization models work.
Why does it matter?
This matters because it makes reading and learning Arabic easier, and it shows that important language problems can be solved without huge, expensive AI systems.
Abstract
Sadeed, a fine-tuned decoder-only language model, enhances Arabic text diacritization using limited resources and addresses benchmarking limitations with SadeedDiac-25.