ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
Yuki Imajuku, Kohki Horie, Yoichi Iwata, Kensho Aoki, Naohiro Takahashi, Takuya Akiba
2025-06-17
Summary
This paper introduces ALE-Bench, a benchmark that tests how well AI systems can solve hard, long-horizon algorithm-engineering problems drawn from real competitive contests, the AtCoder Heuristic Contests. These problems are challenging because they have no single correct answer: solutions must be improved step by step, sometimes over weeks, on tasks like optimizing delivery routes or scheduling crews in factories.
What's the problem?
Most existing AI coding benchmarks focus on short problems where an answer is simply right or wrong. Real-world optimization problems are harder: there is no exact perfect solution, so progress comes from careful, ongoing refinement over a long horizon. Current AI systems struggle with this kind of sustained, iterative work, which matters for tasks like routing and planning in industry.
What's the solution?
The solution is ALE-Bench, a collection of score-based problems from AtCoder contests that require long-term reasoning and continuous iteration to raise a solution's score. The benchmark ships with tooling that lets an AI test and refine its solutions gradually, mirroring how human contestants work. The authors also build ALE-Agent, an AI system that injects domain knowledge and searches many candidate solutions in parallel, allowing it to compete with human contestants on these problems.
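To make the test-and-refine loop concrete, here is a minimal sketch of score-based iterative improvement, the core pattern behind heuristic-contest solving. Everything here is hypothetical: the `evaluate` function is a stand-in toy objective (score a list by how close its sum is to a target), not an actual ALE-Bench scorer or API.

```python
import random

def evaluate(solution):
    # Hypothetical judge: higher is better, 0 is optimal.
    # Scores a list of ints by closeness of its sum to a target.
    target = 100
    return -abs(sum(solution) - target)

def refine(solution, rng):
    # Propose a small local change, the way a contestant
    # might tweak one part of a heuristic between submissions.
    candidate = solution[:]
    i = rng.randrange(len(candidate))
    candidate[i] += rng.choice([-1, 1])
    return candidate

def iterative_improve(initial, steps=1000, seed=0):
    # Hill climbing: keep the best-scoring solution seen so far.
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(steps):
        candidate = refine(best, rng)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

best, score = iterative_improve([10, 20, 30])
print(score)  # score climbs toward 0 as the sum nears the target
```

Real contest solutions layer much more on top of this loop (simulated annealing, beam search, parallel restarts), but the shape is the same: propose, score, keep the best, repeat.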
Why it matters?
This matters because real-world optimization tasks, like delivering packages efficiently or managing power grids, are complex and reward continuous improvement rather than a single quick answer. ALE-Bench pushes AI beyond simple pass/fail coding tests toward systems that can reason creatively and improve over time in practical, high-stakes domains, which could lead to AI tools that help industries operate more efficiently.
Abstract
ALE-Bench evaluates AI systems on score-based algorithmic programming contests drawn from AtCoder, focusing on long-term iterative problem-solving in domains like package-delivery routing, crew scheduling, factory production, and power-grid balancing.