
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Zihan Zheng, Zerui Cheng, Zeyu Shen, Shang Zhou, Kaiyuan Liu, Hansen He, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie

2025-06-16


Summary

This paper introduces LiveCodeBench Pro, a new benchmark that evaluates how well large language models (LLMs) perform on competitive programming problems, especially ones drawn from major contests like Codeforces, ICPC, and the IOI. Expert programmers who have won Olympiad medals tag and analyze the problems and the LLMs' answers to show where the models do well and where they fail.

What's the problem?

The problem is that even though some reports claim LLMs can beat top human programmers, these models still have trouble with the parts of contest problems that require deep algorithmic insight and careful reasoning. They do well on problems that mainly require implementing a known idea accurately, but struggle on problems that need a clever observation or careful handling of edge cases. As a result, they often give confidently stated answers that are wrong because they never fully understood the problem.

What's the solution?

The solution was to create LiveCodeBench Pro, a carefully controlled benchmark built to avoid data contamination, filled with a variety of problems from real contests and classified by Olympiad medalists according to the skills each problem requires. The authors then measured how well different LLMs performed on these problems and studied their mistakes in detail. This approach pinpoints which kinds of problems challenge LLMs and shows that current models still lag behind the best human programmers, especially on the hardest problems.
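To make the kind of analysis described above concrete, here is a minimal sketch (not the authors' code) of how results broken down by expert-assigned category might be computed: each problem carries a tag given by a medalist, and pass rates are aggregated per tag. The `Result` class, the category names, and the example data are all hypothetical illustrations, not LiveCodeBench Pro's actual evaluation harness.

```python
# Minimal sketch (hypothetical, not the authors' code) of per-category analysis:
# group each problem by its expert-assigned tag and compute how often a model's
# submission passes all tests within each group.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Result:
    problem_id: str
    category: str   # tag assigned by an expert, e.g. "implementation" or "observation-heavy"
    passed: bool    # did the model's solution pass every test?

def pass_rate_by_category(results: list[Result]) -> dict[str, float]:
    """Fraction of problems solved within each expert-assigned category."""
    totals, solved = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r.category] += 1
        solved[r.category] += r.passed  # bool counts as 0 or 1
    return {cat: solved[cat] / totals[cat] for cat in totals}

# Hypothetical example: a model that handles implementation-heavy problems
# but misses problems that hinge on a key algorithmic observation.
results = [
    Result("1001A", "implementation", True),
    Result("1002B", "implementation", True),
    Result("1003C", "observation-heavy", False),
    Result("1004D", "observation-heavy", False),
]
print(pass_rate_by_category(results))
# {'implementation': 1.0, 'observation-heavy': 0.0}
```

Breaking results down by expert-assigned category in this way is what lets the benchmark show not just an overall score, but which specific kinds of problems a model handles well and which it fails on.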

Why it matters?

This matters because it gives a clear and fair way to measure how good AI is at competitive programming compared to humans. By understanding where LLMs struggle, researchers can focus on making improvements where the models need it most. This helps move AI forward in fields that require strong reasoning and precise coding skills, which are important for automating complex software development tasks in the future.

Abstract

LLMs perform well on implementation-heavy competitive programming problems but struggle with nuanced algorithmic reasoning, as highlighted by LiveCodeBench Pro.