OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution
Lianghong Guo, Wei Tao, Runhan Jiang, Yanlin Wang, Jiachi Chen, Xilin Liu, Yuchi Ma, Mingzhi Mao, Hongyu Zhang, Zibin Zheng
2025-05-08
Summary
This paper introduces OmniGIRL, a new benchmark for measuring how well AI models can resolve issues on GitHub, a popular site where people collaborate on coding projects. OmniGIRL stands out because it spans multiple programming languages, includes issues that combine text and images, and draws on projects from different domains.
What's the problem?
Current large language models struggle with the wide variety of issues that come up on GitHub, especially when those issues include images or come from projects written in different programming languages. This limits how helpful these models can be in real-world software development.
What's the solution?
The researchers created the OmniGIRL benchmark, a collection of real GitHub issues that contain both text and images and come from many different domains and programming languages. They used this benchmark to test existing AI models and found that the models often struggle, especially when images are involved.
Why does it matter?
This matters because software development is a global, multimedia activity, and having AI that can understand and help with all kinds of issues would make teamwork and problem-solving much easier. By showing where current models fall short, this research helps guide improvements so future AI can be more useful for developers everywhere.
Abstract
OmniGIRL, a multilingual, multimodal, and multi-domain GitHub issue resolution benchmark, demonstrates that current LLMs perform poorly, particularly on issues involving images.