Improving Assembly Code Performance with Large Language Models via Reinforcement Learning

Anjiang Wei, Tarun Suresh, Huanmi Tan, Yinglun Xu, Gagandeep Singh, Ke Wang, Alex Aiken

2025-05-19

Summary

This paper shows how large language models, trained with reinforcement learning, can rewrite assembly code so that it runs faster and more efficiently than the code produced by standard compilers.

What's the problem?

The problem is that assembly code, the low-level language processors actually execute, is hard to optimize for speed, and even the best compilers do not always produce the fastest possible code.

What's the solution?

The researchers trained large language models with reinforcement learning, specifically a method called Proximal Policy Optimization (PPO), rewarding the model when its rewritten assembly both passes the test suite and runs faster than the code produced by a standard compiler.
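To make the training signal concrete, here is a minimal sketch of what a reward function combining correctness and speedup might look like. This is an illustrative assumption, not the paper's exact formulation: the function name, arguments, and the rule of granting the speedup bonus only to fully correct programs are all hypothetical.

```python
def reward(passed_tests: int, total_tests: int,
           baseline_time: float, optimized_time: float) -> float:
    """Hypothetical RL reward for assembly optimization (illustrative only).

    Combines functional correctness (fraction of unit tests passed) with
    speedup over a compiler baseline. Only fully correct programs earn the
    speedup bonus, so the model is not rewarded for fast but wrong code.
    """
    correctness = passed_tests / total_tests
    if passed_tests < total_tests:
        # Partial credit for correctness alone; no speedup bonus.
        return correctness
    # Fully correct: add speedup relative to the compiler's output.
    speedup = baseline_time / optimized_time
    return correctness + speedup
```

Under a scheme like this, a program that passes every test and halves the runtime scores higher than one that is fast but fails a test, which is the behavior PPO would then reinforce.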

Why it matters?

This matters because faster code lets computers and devices run programs more quickly and use less energy, which is important for everything from smartphones to large servers, making technology more powerful and efficient.

Abstract

Reinforcement learning using Proximal Policy Optimization trains large language models to optimize assembly code performance, achieving higher speedups and test pass rates compared to standard compilers.