SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving

Chaofan Tao, Jierun Chen, Yuxin Jiang, Kaiqi Kou, Shaowei Wang, Ruoyu Wang, Xiaohui Li, Sidi Yang, Yiming Du, Jianbo Dai, Zhiming Mao, Xinyu Wang, Lifeng Shang, Haoli Bai

2026-01-06

Summary

This paper introduces SWE-Lego, a training recipe that teaches AI models to automatically resolve software issues, such as bug reports in real code repositories, achieving state-of-the-art results among open-source models of comparable size.

What's the problem?

Currently, getting these models to reliably resolve issues often requires complicated training pipelines that chain together several stages, such as mid-training, supervised fine-tuning, and reinforcement learning. These pipelines are difficult to set up and are not necessarily the most efficient route to better performance. The open question is how far a simpler, single-stage approach can go.

What's the solution?

The researchers created SWE-Lego, which relies on a single training stage: supervised fine-tuning (SFT). The recipe has two main ingredients: a large, high-quality dataset of coding tasks and validated solution trajectories, combining real and synthetic data, and a refined training procedure that presents problems in order of increasing difficulty while masking out erroneous steps so the model does not imitate its mistakes (a minimal sketch of both ideas follows below). On top of this foundation, they also improve performance at test time by sampling several candidate solutions and using a trained 'verifier' to select the best one.
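To make those two training ideas concrete, here is a minimal sketch of what error masking and a difficulty-based curriculum could look like in practice. It assumes a PyTorch-style training loop; the names (`loss_mask`, `difficulty`) and the exact masking scheme are illustrative assumptions, not the paper's released code.

```python
# Hypothetical sketch (not the paper's released code) of the two
# training ideas: error masking and a difficulty-based curriculum.
import torch
import torch.nn.functional as F

def masked_sft_loss(logits, target_ids, loss_mask):
    """Cross-entropy over assistant tokens, with erroneous steps masked out.

    logits:     (batch, seq_len, vocab) model outputs
    target_ids: (batch, seq_len) labels, assumed already shifted to
                next-token targets
    loss_mask:  (batch, seq_len) 1.0 for tokens to learn from, 0.0 for
                prompt tokens and for steps flagged as errors
    """
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target_ids.reshape(-1),
        reduction="none",
    ).reshape(target_ids.shape)
    # Masked tokens contribute zero loss, so the model never imitates
    # the steps that were marked as mistakes.
    return (per_token * loss_mask).sum() / loss_mask.sum().clamp(min=1.0)

def curriculum_order(trajectories):
    """Sort trajectories from easy to hard for curriculum training.

    Assumes each trajectory dict carries an illustrative scalar
    `difficulty` score (e.g., estimated from solve rate or length).
    """
    return sorted(trajectories, key=lambda t: t["difficulty"])
```

The key design choice is that masked tokens contribute zero gradient, so the model only learns from the steps that led toward a correct fix.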

Why it matters?

This work is important because it demonstrates that you don't always need extremely complex training methods to achieve state-of-the-art results in software repair. By showing that a simpler approach can be so effective, SWE-Lego makes it easier for others to build and improve these kinds of automated code-fixing tools, potentially leading to faster and more reliable software development.

Abstract

We present SWE-Lego, a supervised fine-tuning (SFT) recipe designed to achieve state-of-the-art performance in software engineering (SWE) issue resolving. In contrast to prevalent methods that rely on complex training paradigms (e.g., mid-training, SFT, reinforcement learning, and their combinations), we explore how to push the limits of a lightweight SFT-only approach for SWE tasks. SWE-Lego comprises three core building blocks, with key findings summarized as follows: 1) the SWE-Lego dataset, a collection of 32k high-quality task instances and 18k validated trajectories, combining real and synthetic data to complement each other in both quality and quantity; 2) a refined SFT procedure with error masking and a difficulty-based curriculum, which demonstrably improves action quality and overall performance. Empirical results show that with these two building blocks alone, SFT can push SWE-Lego models to state-of-the-art performance among open-source models of comparable size on SWE-bench Verified: SWE-Lego-Qwen3-8B reaches 42.2%, and SWE-Lego-Qwen3-32B attains 52.6%. 3) We further evaluate and improve test-time scaling (TTS) built upon the SFT foundation. Based on a well-trained verifier, SWE-Lego models can be significantly boosted; for example, from 42.2% to 49.6% and from 52.6% to 58.8% under TTS@16 for the 8B and 32B models, respectively.
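As a rough illustration of the verifier-guided test-time scaling (TTS@N) mentioned in the abstract, the sketch below samples N candidate patches and keeps the one the verifier scores highest. `generate_patch` and `verifier_score` are hypothetical placeholders for the model rollout and the trained verifier; their real interfaces are not specified here.

```python
# Hypothetical sketch of verifier-guided best-of-N selection (TTS@N).
# `generate_patch` and `verifier_score` stand in for the paper's model
# rollout and trained verifier; their real interfaces are not public here.

def tts_best_of_n(issue, generate_patch, verifier_score, n=16):
    """Sample n candidate patches and return the verifier's top pick."""
    candidates = [generate_patch(issue) for _ in range(n)]
    scored = [(verifier_score(issue, patch), patch) for patch in candidates]
    # The highest-scoring candidate is submitted as the final answer;
    # with n=16 this matches the TTS@16 setting in the abstract.
    return max(scored, key=lambda pair: pair[0])[1]
```

With n=16 this corresponds to the TTS@16 setting under which the reported gains (42.2% to 49.6% for the 8B model, 52.6% to 58.8% for the 32B model) were measured.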