CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum
2025-07-30
Summary
This paper introduces CUDA-L1, an automated system that uses reinforcement learning to optimize CUDA code for different GPUs, producing faster programs without requiring human experts.
What's the problem?
Optimizing CUDA code, which makes programs run efficiently on graphics cards, is difficult because performance depends on many factors, such as the GPU architecture and the workload. Current AI models often cannot generate well-optimized CUDA code on their own.
What's the solution?
CUDA-L1 addresses this with a technique called contrastive reinforcement learning: the system generates multiple versions of the same code, compares their measured performance, and learns from those comparisons which optimizations actually help. In the process, it discovers new optimization techniques, learns core principles of CUDA performance, and learns to avoid changes that slow code down.
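To make the comparison idea concrete, here is a minimal sketch, not the paper's actual reward design, of how a contrastive reward could be computed: each generated kernel variant is benchmarked, its speedup over a reference implementation is computed, and variants are scored relative to the batch average so the learner is rewarded for beating its own alternatives. The function names and timings below are hypothetical.

```python
import statistics


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Speedup of a candidate kernel relative to the reference implementation."""
    return baseline_ms / candidate_ms


def contrastive_rewards(candidate_ms: dict[str, float],
                        baseline_ms: float) -> dict[str, float]:
    """Score each candidate against the batch-mean speedup, so the learner is
    rewarded for producing code that is faster than its own alternatives,
    not merely for passing a fixed threshold."""
    speedups = {name: speedup(baseline_ms, t) for name, t in candidate_ms.items()}
    mean_speedup = statistics.mean(speedups.values())
    return {name: s - mean_speedup for name, s in speedups.items()}


if __name__ == "__main__":
    # Hypothetical wall-clock timings (ms) for a reference kernel and three
    # generated CUDA variants of the same operator.
    baseline_ms = 4.0
    candidate_ms = {"tiled": 2.0, "vectorized": 1.6, "naive_rewrite": 5.0}
    print(contrastive_rewards(candidate_ms, baseline_ms))
    # approx: {'tiled': 0.23, 'vectorized': 0.73, 'naive_rewrite': -0.97}
```

In a full system, the timings would come from running candidate kernels on the target GPU, and only functionally correct candidates would be scored; correctness checking is omitted here for brevity.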
Why it matters?
Faster, better-optimized CUDA code lets GPUs do more work with the same hardware, which is important for running advanced AI models and other demanding computing tasks. Automating the optimization process saves expert time and effort and helps meet the growing demand for GPU compute.
Abstract
CUDA-L1 is an automated reinforcement learning framework that improves CUDA code optimization across a range of GPU architectures, achieving substantial speedups without human expertise.