CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum
2025-07-30
Summary
This paper introduces CUDA-L1, an automated system that uses reinforcement learning to optimize CUDA code for different GPUs, producing faster programs without requiring human experts.
What's the problem?
Optimizing CUDA code, which makes programs run efficiently on graphics cards, is difficult because performance depends on many factors, such as the GPU architecture and the workload. Current AI models often cannot generate well-optimized CUDA code on their own.
What's the solution?
CUDA-L1 addresses this with a technique called contrastive reinforcement learning: the system generates multiple versions of the same code, compares their measured performance, and learns from those comparisons which optimizations actually help. In the process, it discovers new optimization techniques, learns core principles of CUDA performance, and learns to avoid changes that slow code down.
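To make the comparison idea concrete, here is a minimal sketch, not the paper's actual reward design, of how a contrastive reward could be computed: each generated kernel variant is benchmarked, its speedup over a reference implementation is computed, and variants are scored relative to the batch average so the learner is rewarded for beating its own alternatives. The function names and timings below are hypothetical.

```python
import statistics


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup of a candidate kernel relative to the reference implementation."""
    return baseline_ms / candidate_ms


def contrastive_rewards(candidate_ms: dict[str, float],
                        baseline_ms: float) -> dict[str, float]:
    """Score each candidate against the batch-mean speedup, so the learner is
    rewarded for producing code that is faster than its own alternatives,
    not merely for passing a fixed threshold."""
    speedups = {name: speedup(baseline_ms, t) for name, t in candidate_ms.items()}
    mean_speedup = statistics.mean(speedups.values())
    return {name: s - mean_speedup for name, s in speedups.items()}


if __name__ == "__main__":
    # Hypothetical wall-clock timings (ms) for a reference kernel and three
    # generated CUDA variants of the same operator.
    baseline_ms = 4.0
    candidate_ms = {"tiled": 2.0, "vectorized": 1.6, "naive_rewrite": 5.0}
    print(contrastive_rewards(candidate_ms, baseline_ms))
    # approx: {'tiled': 0.23, 'vectorized': 0.73, 'naive_rewrite': -0.97}
```

In a full system, the timings would come from running candidate kernels on the target GPU, and only functionally correct candidates would be scored; correctness checking is omitted here for brevity.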
Why it matters?
Faster, better-optimized CUDA code lets GPUs do more work with the same hardware, which is important for running advanced AI models and other demanding computing tasks. Automating the optimization process saves expert time and effort and helps meet the growing demand for GPU compute.
Abstract
CUDA-L1 is an automated reinforcement learning framework that improves CUDA code optimization across a range of GPU architectures, achieving substantial speedups without human expertise.