Vision Language Models are Biased
An Vo, Khai-Nguyen Nguyen, Mohammad Reza Taesiri, Vy Tuong Dang, Anh Totti Nguyen, Daeyoung Kim
2025-06-02
Summary
This paper shows that vision language models (VLMs), AI systems that interpret both images and text, exhibit clear biases when asked to count objects or identify items in pictures.
What's the problem?
These models often make mistakes or show systematic preferences in their answers, especially on tasks such as counting the objects in an image or identifying what an object is. Worse, the errors do not go away even when the model is given additional instructions or context.
What's the solution?
The researchers evaluated these models on a range of counting and identification tasks and found the biases to be strong and persistent: simply supplying more information or clearer instructions does not fix the problem.
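The kind of evaluation described above can be sketched in miniature. Everything below is illustrative, not the paper's actual code: `query_vlm` is a hypothetical stand-in for a real model call, here simulating a biased model that answers from memorized prior knowledge instead of looking at the image, and the loop measures accuracy on images whose true counts deviate from the prototype.

```python
# Illustrative sketch of a bias evaluation loop (not the paper's code).
# `query_vlm` is a hypothetical stand-in for a real vision-language-model
# call; it simulates a biased model that ignores the image and answers
# with the memorized, prototypical count.

PROTOTYPICAL_COUNTS = {"dog legs": 4, "logo stripes": 3}

def query_vlm(image, question):
    # A fully biased model answers from prior knowledge, not from `image`.
    return PROTOTYPICAL_COUNTS[question]

def evaluate(dataset):
    """Accuracy over (image, question, true_count) triples."""
    correct = sum(
        query_vlm(image, question) == true_count
        for image, question, true_count in dataset
    )
    return correct / len(dataset)

# Hypothetical test images whose true counts differ from the prototype.
counterfactuals = [
    ("img_five_legged_dog.png", "dog legs", 5),
    ("img_four_stripe_logo.png", "logo stripes", 4),
]
print(evaluate(counterfactuals))  # a fully biased model scores 0.0
```

On such images, answering from prior knowledge alone yields zero accuracy, which is why persistent errors on atypical inputs are evidence of bias rather than random noise.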
Why it matters?
This matters because AI systems we rely on to interpret images and text may not always be fair or accurate, which could cause problems in real-world applications such as security, healthcare, and education. Recognizing and fixing these biases is a necessary step toward making AI more trustworthy and useful.
Abstract
Vision language models exhibit strong biases in counting and identification tasks, demonstrating a failure mode that persists even with additional instructions or context.