GitChameleon: Evaluating AI Code Generation Against Python Library Version Incompatibilities
Diganta Misra, Nizar Islah, Victor May, Brice Rauby, Zihan Wang, Justine Gehring, Antonio Orvieto, Muawiz Chaudhary, Eilif B. Muller, Irina Rish, Samira Ebrahimi Kahou, Massimo Caccia
2025-07-17
Summary
This paper introduces GitChameleon, a new benchmark dataset designed to test how well AI models can generate Python code that works correctly with specific versions of software libraries.
What's the problem?
The problem is that software libraries change over time: functions get renamed, removed, or behave differently from one release to the next. AI models struggle to write code that matches the exact library version a project uses, which can cause errors or bugs in real-world programming.
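To make this concrete, here is a minimal illustrative example of the kind of version-specific break involved; NumPy is just one of many affected libraries, and this snippet is not drawn from the GitChameleon dataset itself.

```python
# Illustrative only: a version-specific API break in NumPy (not a GitChameleon task).
import numpy as np

x = [1.5, 2.5, 3.5]

# NumPy < 1.24: np.float was an accepted alias for Python's float.
# mean = np.float(np.mean(x))

# NumPy >= 1.24: the np.float alias was removed, so the line above raises
# AttributeError; the built-in float() (or np.float64) must be used instead.
mean = float(np.mean(x))
print(mean)
```

A model that memorized the older API would emit the commented-out line and fail at runtime on newer installations, which is exactly the kind of mistake that is hard to catch without running the code.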
What's the solution?
The authors curated 328 Python coding problems, each tied to a specific release of a library and paired with executable unit tests that check whether the generated code actually works under that version. They used this benchmark to evaluate a range of AI code generators and showed that even the best models have a hard time satisfying version-specific requirements correctly.
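The sketch below shows what a version-conditioned problem paired with an execution-based test might look like. The field names (library, version, task, starter_code) and the check and reference_add_row functions are hypothetical, chosen for illustration rather than taken from the dataset's actual schema.

```python
# Hypothetical sketch of a version-conditioned problem with an execution-based test.
problem = {
    "library": "pandas",
    "version": "2.1",  # the exact release the generated solution must target
    "task": "Append a single row (given as a dict) to a DataFrame and return the result.",
    "starter_code": "def add_row(df, row):\n    ...",
}

def check(candidate_add_row):
    """Hidden test: runs the candidate solution instead of string-matching it."""
    import pandas as pd
    df = pd.DataFrame({"a": [1, 2]})
    out = candidate_add_row(df, {"a": 3})
    assert list(out["a"]) == [1, 2, 3]

def reference_add_row(df, row):
    # Correct for pandas >= 2.0, where DataFrame.append was removed:
    # concatenate a one-row DataFrame instead.
    import pandas as pd
    return pd.concat([df, pd.DataFrame([row])], ignore_index=True)

check(reference_add_row)  # passes under pandas 2.x
```

A solution that called df.append(row, ignore_index=True) would pass under pandas 1.x but fail this test on 2.x, which is why executing the code against the pinned version gives a stricter signal than comparing it to a reference string.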
Why it matters?
This matters because generating code that works with the correct library version is essential for reliable software development, especially in professional environments where libraries cannot always be updated to their latest releases.
Abstract
GitChameleon is a dataset for evaluating version-conditioned code generation with execution-based tests, covering large language models, LLM-powered agents, code assistants, and RAG systems.