
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories?

Xinyi He, Qian Liu, Mingzhe Du, Lin Yan, Zhijie Fan, Yiming Huang, Zejian Yuan, Zejun Ma

2025-07-17


Summary

This paper introduces SWE-Perf, a new benchmark created to test how well large language models can improve the performance of real-world software by making it run faster across an entire repository.

What's the problem?

The problem is that while AI models are good at fixing bugs or writing code, their ability to enhance the speed and efficiency of complex codebases has not been thoroughly explored or tested, especially when changes involve many files working together.

What's the solution?

The authors compiled 140 real examples of performance improvements from popular GitHub projects, including original code, expert optimizations, and tests to measure speed improvements. They then used these to test and compare how well different language models could optimize the code, both when given specific guidance and when working independently.
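To make the evaluation concrete, here is a minimal sketch of how a performance benchmark like this might score an optimization: verify that the patched code still produces the same result, then compare wall-clock times. The function names and the speedup metric below are illustrative assumptions, not SWE-Perf's actual harness.

```python
import timeit

def original_sum_squares(n):
    # Hypothetical stand-in for a repository's pre-optimization code:
    # sums squares with an explicit Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

def optimized_sum_squares(n):
    # Stand-in for an expert (or model-generated) optimization:
    # the closed-form formula for the same sum.
    return (n - 1) * n * (2 * n - 1) // 6

def measure_speedup(original, optimized, arg, repeats=5):
    """Check correctness first, then compare best-of-N timings,
    mirroring how a benchmark might score a performance patch."""
    assert original(arg) == optimized(arg), "optimization changed behavior"
    t_orig = min(timeit.repeat(lambda: original(arg), number=10, repeat=repeats))
    t_opt = min(timeit.repeat(lambda: optimized(arg), number=10, repeat=repeats))
    return t_orig / t_opt  # > 1.0 means the patch made the code faster

print(f"speedup: {measure_speedup(original_sum_squares, optimized_sum_squares, 10_000):.1f}x")
```

The key design point this illustrates is that correctness tests gate the timing comparison: a patch that is fast but wrong scores nothing, which is why the benchmark pairs each example with tests as well as timing workloads.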

Why it matters?

This matters because optimizing code for speed is crucial in software development, and SWE-Perf helps reveal how much progress AI models have made toward this goal and what gaps remain. It provides a realistic and challenging way to measure and drive future improvements in AI-assisted code optimization.

Abstract

SWE-Perf is a benchmark for evaluating LLMs in code performance optimization using real-world repository data, revealing significant gaps compared to expert performance.