IterPref: Focal Preference Learning for Code Generation via Iterative Debugging
Jie Wu, Haoling Li, Xin Zhang, Jianwen Luo, Yangyu Huang, Ruihang Chu, Yujiu Yang, Scarlett Li
2025-03-05

Summary
This paper introduces IterPref, a new way to make AI models better at writing code by teaching them to debug the way human programmers do.
What's the problem?
Current methods for improving AI code generators don't focus on specific errors in the code. Instead, they just check whether the whole program passes or fails its tests. This means the AI never learns how to fix specific mistakes, which limits how much it can improve.
What's the solution?
The researchers created IterPref, which teaches the AI to debug code step by step, just as a human programmer would. It locates the exact parts of the code that contain errors and teaches the model how to fix them. They also built a new dataset called CodeFlow, which records how code gets fixed over successive attempts, so the AI can learn from real debugging examples.
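As a rough, hypothetical illustration of this idea (not the authors' actual pipeline), consecutive versions of a program can be diffed to locate the regions that changed when a bug was fixed. Python's standard difflib makes this a few lines:

import difflib

def modified_line_spans(failing_code: str, fixed_code: str):
    # Compare the failing version against the fixed version line by line.
    failing = failing_code.splitlines()
    fixed = fixed_code.splitlines()
    matcher = difflib.SequenceMatcher(a=failing, b=fixed)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' and 'delete' mark lines of the failing code the fix touched.
        if tag in ("replace", "delete"):
            spans.append((i1, i2))  # half-open [i1, i2) line range
    return spans

# Example: the fix changes one line, so that line is flagged as the error region.
buggy = "def add(a, b):\n    return a - b\n"
fixed = "def add(a, b):\n    return a + b\n"
print(modified_line_spans(buggy, fixed))  # [(1, 2)]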
Why does it matter?
This matters because it could make AI code generators much better at writing correct code. By learning to find and fix specific errors, these models could become more reliable assistants for programmers, speeding up software development and making it easier to build complex programs.
Abstract
Preference learning enhances Code LLMs beyond supervised fine-tuning by leveraging relative quality comparisons. Existing methods construct preference pairs from candidates based on test case success, treating the sample with the higher pass rate as positive and the lower as negative. However, this approach does not pinpoint specific errors in the code, which prevents the model from learning more informative error correction patterns, as aligning failing code as a whole lacks the granularity needed to capture meaningful error-resolution relationships. To address these issues, we propose IterPref, a new preference alignment framework that mimics human iterative debugging to refine Code LLMs. IterPref explicitly locates error regions and aligns the corresponding tokens via a tailored DPO algorithm. To generate informative pairs, we introduce the CodeFlow dataset, where samples are iteratively refined until passing tests, with modifications capturing error corrections. Extensive experiments show that a diverse suite of Code LLMs equipped with IterPref achieves significant performance gains in code generation and improves on challenging tasks like BigCodeBench. In-depth analysis reveals that IterPref yields fewer errors. Our code and data will be made publicly available.
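The abstract does not spell out the tailored DPO objective. As a minimal sketch (our assumption, not the paper's exact formulation), the standard DPO loss can be restricted so that the rejected sample contributes only the tokens inside its located error region, while the chosen sample is scored over all of its tokens:

import torch
import torch.nn.functional as F

def masked_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps,
                    chosen_mask, rejected_error_mask, beta=0.1):
    # Per-token log-probs have shape (batch, seq_len); masks are 0/1 floats.
    # The chosen sample is scored over all of its valid (non-pad) tokens.
    chosen_logratio = ((policy_chosen_logps - ref_chosen_logps)
                       * chosen_mask).sum(-1)
    # The rejected sample is scored only over its error-region tokens, so the
    # gradient penalizes the specific mistake rather than the whole program.
    rejected_logratio = ((policy_rejected_logps - ref_rejected_logps)
                         * rejected_error_mask).sum(-1)
    logits = beta * (chosen_logratio - rejected_logratio)
    return -F.logsigmoid(logits).mean()

# Toy usage with random per-token log-probs for a batch of 2, length 8.
B, T = 2, 8
logps = [torch.randn(B, T) for _ in range(4)]
chosen_mask = torch.ones(B, T)
error_mask = torch.zeros(B, T)
error_mask[:, 3:5] = 1.0  # pretend tokens 3-4 were located as the error region
print(masked_dpo_loss(*logps, chosen_mask, error_mask).item())

The mask name rejected_error_mask and the exact weighting are illustrative assumptions; the paper's tailored DPO may combine the masked term with the full-sequence objective differently.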