SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning

Zexiong Ma, Chao Peng, Pengfei Gao, Xiangxin Meng, Yanzhen Zou, Bing Xie

2025-02-28

Summary

This paper presents SoRFT, a method for training large language models (LLMs) to resolve issues in code repositories more effectively and more cheaply than current approaches. It breaks the task of resolving an issue into smaller subtasks and trains the model in two stages, supervised fine-tuning followed by reinforcement learning, to improve its performance on each subtask.

What's the problem?

Today's leading issue-resolving frameworks rely mostly on commercial AI models, which are expensive to use and raise privacy concerns. Existing ways of training models for this task generalize poorly to new, unfamiliar issues and do not make full use of freely available open-source development resources.

What's the solution?

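The researchers created SoRFT, which stands for Subtask-oriented Reinforced Fine-Tuning. It breaks issue resolving into four subtasks: finding the relevant file (file localization), finding the relevant function (function localization), pinpointing the exact lines that need to change (line localization), and writing the code fix (code edit generation). SoRFT then trains the model in two stages. First, in rejection-sampled supervised fine-tuning, Chain-of-Thought (CoT) training examples are filtered against the ground truth, and only the examples that pass are used to teach the model the basics. Second, rule-based reinforcement learning with PPO further improves the model on each subtask, using rewards computed by comparing its outputs with the ground truth.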
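The first training stage can be pictured as a filter over sampled CoT responses. The sketch below is illustrative only: the `CoTSample` record and `rejection_sample` function are hypothetical names, the answer is assumed to be a tuple of predicted locations (for example, file paths in the file-localization subtask), and a sample is assumed to pass when its answer shares at least one location with the ground truth. The source says only that CoT data is filtered using the ground truth, so the exact acceptance rule here is an assumption, not the authors' criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoTSample:
    """One sampled Chain-of-Thought response for a localization subtask.

    Hypothetical schema for illustration; the paper's data format may differ.
    """
    prompt: str               # issue description plus repository context
    reasoning: str            # the CoT text produced before the answer
    answer: tuple[str, ...]   # predicted locations, e.g. file paths


def rejection_sample(samples: list[CoTSample], ground_truth: list[str]) -> list[CoTSample]:
    """Keep only samples whose predicted locations overlap the ground truth.

    Assumption: "shares at least one location with the ground truth" is the
    acceptance rule; a stricter rule (e.g. exact match) could be swapped in.
    """
    gold = set(ground_truth)
    # An empty intersection is falsy, so non-overlapping or empty answers are dropped.
    return [s for s in samples if gold.intersection(s.answer)]


if __name__ == "__main__":
    # Made-up example data for illustration.
    gold_files = ["src/parser.py"]
    candidates = [
        CoTSample("example issue", "The traceback points at the parser.", ("src/parser.py",)),
        CoTSample("example issue", "Probably a CLI bug.", ("src/cli.py",)),
        CoTSample("example issue", "Not sure.", ()),
    ]
    kept = rejection_sample(candidates, gold_files)
    print([s.reasoning for s in kept])  # only the first sample survives
```

The surviving samples would then form the supervised fine-tuning set; the same filtering idea applies to the function- and line-localization subtasks with finer-grained locations.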

Why does it matter?

This matters because SoRFT could make AI-assisted issue resolving cheaper and more efficient. It achieves state-of-the-art results among open-source models on SWE-Bench Verified and SWE-Bench Lite (for example, SoRFT-Qwen-7B resolves 21.4% of the issues in SWE-Bench Verified), which makes it a candidate alternative to expensive commercial models. This could help developers and companies save money and time when fixing software issues, while also addressing the privacy concerns associated with commercial AI models.

Abstract

Mainstream issue-resolving frameworks predominantly rely on commercial models, leading to high costs and privacy concerns. Existing training approaches for issue resolving struggle with poor generalization and fail to fully leverage open-source development resources. We propose Subtask-oriented Reinforced Fine-Tuning (SoRFT), a novel training approach to enhance the issue-resolving capability of LLMs. We decompose issue resolving into structured subtasks: file localization, function localization, line localization, and code edit generation. SoRFT consists of two training stages: (1) rejection-sampled supervised fine-tuning, in which Chain of Thought (CoT) data is filtered using the ground truth before fine-tuning the LLM, and (2) rule-based reinforcement learning, which leverages PPO with ground-truth-based rewards. We evaluate the SoRFT-trained model on SWE-Bench Verified and SWE-Bench Lite, achieving state-of-the-art (SOTA) performance among open-source models (e.g., resolving 21.4% of issues on SWE-Bench Verified with SoRFT-Qwen-7B). The experimental results demonstrate that SoRFT significantly enhances issue-resolving performance, improves model generalization, and provides a cost-efficient alternative to commercial models.
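The second stage needs a reward that rules alone can compute from the ground truth. The abstract does not give the formula, so the sketch below assumes an F-beta score between predicted and ground-truth locations, plus a hypothetical output format in which locations follow an `ANSWER:` marker, one per line, and a response without the marker earns zero reward. `f_beta_reward` and `localization_reward` are illustrative names, not the authors' code. A scalar like this is what a PPO trainer would consume for each sampled response; the PPO update itself is not shown.

```python
def f_beta_reward(predicted: list[str], ground_truth: list[str], beta: float = 1.0) -> float:
    """Illustrative rule-based reward for a localization subtask.

    Scores predicted locations against ground-truth locations with an F-beta
    score; beta > 1 weights recall more heavily than precision.
    Returns 0.0 when nothing matches (including empty predictions).
    """
    pred, gold = set(predicted), set(ground_truth)
    hits = len(pred & gold)
    if hits == 0:
        return 0.0
    precision = hits / len(pred)
    recall = hits / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def localization_reward(response: str, ground_truth: list[str], beta: float = 1.0) -> float:
    """Parse a response in a hypothetical 'ANSWER:' format and score it."""
    marker = "ANSWER:"
    if marker not in response:
        return 0.0  # malformed output earns no reward
    answer_block = response.split(marker, 1)[1]
    predicted = [line.strip() for line in answer_block.splitlines() if line.strip()]
    return f_beta_reward(predicted, ground_truth, beta)


if __name__ == "__main__":
    # Two predicted files, one correct: precision 0.5, recall 1.0.
    print(f_beta_reward(["a.py", "b.py"], ["a.py"]))            # about 0.667
    print(f_beta_reward(["a.py", "b.py"], ["a.py"], beta=2.0))  # about 0.833
    print(localization_reward("reasoning...\nANSWER:\na.py\n", ["a.py"]))  # 1.0
```

Choosing beta > 1 favors recall, which may suit a coarse-to-fine pipeline where a file missed at an early step is never examined later; this is a design consideration, not something the source states.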