Behind Maya: Building a Multilingual Vision Language Model
Nahid Alam, Karthik Reddy Kanjula, Surya Guthikonda, Timothy Chung, Bala Krishna S Vegesna, Abhipsha Das, Anthony Susevski, Ryan Sze-Yin Chan, S M Iftekhar Uddin, Shayekh Bin Islam, Roshan Santhosh, Snegha A, Drishti Sharma, Chen Liu, Isha Chaturvedi, Genta Indra Winata, Ashvanth. S, Snehanshu Mukherjee, Alham Fikri Aji
2025-05-15
Summary
This paper introduces Maya, a new AI model that can understand both images and text in many different languages, making it better at handling tasks in languages that have little training data or few resources.
What's the problem?
Most AI models that connect images and text are trained mainly on English and other high-resource languages, so they work less well for languages with little training data and for cultures that are underrepresented in that data.
What's the solution?
The researchers built Maya by pretraining it on a multilingual image-text dataset that spans many languages and a variety of cultural backgrounds. This helps the model understand and work with less common languages and different cultural contexts; a rough sketch of the idea follows below.
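To make the idea concrete, here is a minimal Python sketch of how a multilingual image-text pretraining set might be assembled by translating English image-caption pairs into several target languages. The `translate` helper, the sample data, and the language list are illustrative assumptions, not the authors' actual pipeline.

```python
# A minimal sketch of building a multilingual image-text pretraining
# set from English-captioned seed data. All names here are
# illustrative assumptions, not Maya's real pipeline.
from dataclasses import dataclass

@dataclass
class PretrainSample:
    image_path: str  # path to the image file
    caption: str     # caption text in `lang`
    lang: str        # ISO 639-1 language code

def translate(text: str, target_lang: str) -> str:
    # Stand-in for a real machine-translation call; a production
    # pipeline would plug in an MT model or API here.
    return f"[{target_lang}] {text}"

# Hypothetical English (image, caption) seed pairs.
english_pairs = [
    ("images/000001.jpg", "A street vendor selling fruit at a market."),
]

TARGET_LANGS = ["hi", "ar", "zh", "es"]  # illustrative subset

multilingual_set: list[PretrainSample] = []
for image_path, caption in english_pairs:
    # Keep the original English sample...
    multilingual_set.append(PretrainSample(image_path, caption, "en"))
    # ...and add one translated sample per target language.
    for lang in TARGET_LANGS:
        multilingual_set.append(
            PretrainSample(image_path, translate(caption, lang), lang)
        )
```

Pairing each image with captions in several languages, rather than English alone, is what lets the model associate visual concepts with words across languages during pretraining.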
Why does it matter?
This matters because it makes AI fairer and more useful for people around the world, not just speakers of major languages, and it helps close gaps in technology access and understanding.
Abstract
Maya, a multilingual Vision-Language Model, improves performance on low-resource languages and in varied cultural contexts through a multilingual image-text pretraining dataset.