NeoBabel: A Multilingual Open Tower for Visual Generation
Mohammad Mahdi Derakhshani, Dheeraj Varghese, Marzieh Fadaee, Cees G. M. Snoek
2025-07-09
Summary
This paper introduces NeoBabel, a new AI system that generates images from text prompts in six languages: English, Chinese, Dutch, French, Hindi, and Persian. It is designed to perform well across all six while remaining efficient and smaller than many other models.
What's the problem?
Most text-to-image models focus mainly on English, which makes them hard to use effectively for people who speak other languages. Routing prompts through translation tools can introduce mistakes and lose cultural details in the generated images.
What's the solution?
The researchers trained NeoBabel directly on a large and diverse collection of text-image pairs in multiple languages, so it can generate images from descriptions written in any of the supported languages. They also created new evaluation methods that measure the model’s performance and how well it handles prompts that mix languages.
Why does it matter?
NeoBabel helps make AI image generation more inclusive and culturally accurate. It lets more people worldwide use these tools in their own languages, which helps reduce digital inequality.
Abstract
NeoBabel, a multilingual image generation framework, achieves state-of-the-art performance across six languages while maintaining efficiency and robustness, outperforming existing multilingual models.