LADDER: Self-Improving LLMs Through Recursive Problem Decomposition
Toby Simonds, Akira Yoshiyama
2025-03-05
Summary
This paper introduces LADDER, a method that lets AI language models improve their own problem-solving, especially in math, by breaking hard problems down into easier ones.
What's the problem?
Usually, AI models need humans to give them carefully chosen problems or feedback in order to improve. This takes a lot of time and effort, and it's hard to keep up with how fast AI is developing.
What's the solution?
The researchers created LADDER, which lets an AI model generate easier versions of tough problems for itself. The model then solves these easier problems, learning step by step how to tackle harder ones. They also built TTRL, which lets the model improve further at test time by quickly learning from variants of the very problem it is being asked to solve.
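The loop described above can be sketched in a few lines. This is a minimal, deterministic illustration of the LADDER idea, not the paper's implementation: `generate_variants`, `attempt`, and `ladder_train` are hypothetical stubs standing in for the LLM's variant generator, the solver plus answer verifier, and the reinforcement-learning update, respectively.

```python
def generate_variants(problem: str, difficulty: int, n: int = 3) -> list[str]:
    """Stub for the model's variant generator: in LADDER, an LLM would be
    prompted to produce simpler versions of `problem` at this difficulty."""
    return [f"{problem} [difficulty {difficulty}, variant {i}]" for i in range(n)]

def attempt(variant: str, difficulty: int, skill: int) -> bool:
    """Stub solver + verifier: a real system runs the model and checks the
    answer (e.g. numerically). Here, success depends only on current skill."""
    return skill >= difficulty - 1

def ladder_train(problem: str, max_difficulty: int) -> int:
    """Work easiest-first; each verified solve acts as a reward signal that
    nudges skill upward, standing in for an actual RL update."""
    skill = 0
    for difficulty in range(1, max_difficulty + 1):
        for variant in generate_variants(problem, difficulty):
            if attempt(variant, difficulty, skill):
                skill += 1  # reinforcement from a verified solution
    return skill

final_skill = ladder_train("integrate x * exp(x) dx", max_difficulty=5)
print(final_skill)  # → 15: every variant was solved, easiest first
```

The key property the sketch captures is the curriculum effect: at skill 0 the model can only clear difficulty-1 variants, but each verified solve raises its skill enough to unlock the next tier.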
Why does it matter?
This matters because it shows AI can get much smarter without bigger models or constant human help. On hard integration problems, a small model using LADDER jumped from 1% to 82% accuracy, and with test-time learning it even surpassed OpenAI's o1. This could lead to AI that learns and improves on its own in many areas beyond math.
Abstract
We introduce LADDER (Learning through Autonomous Difficulty-Driven Example Recursion), a framework which enables Large Language Models to autonomously improve their problem-solving capabilities through self-guided learning by recursively generating and solving progressively simpler variants of complex problems. Unlike prior approaches that require curated datasets or human feedback, LADDER leverages a model's own capabilities to generate easier question variants. We demonstrate LADDER's effectiveness in the subject of mathematical integration, improving Llama 3.2 3B's accuracy from 1% to 82% on undergraduate-level problems and enabling Qwen2.5 7B Deepseek-R1 Distilled to achieve 73% on the MIT Integration Bee qualifying examination. We also introduce TTRL (Test-Time Reinforcement Learning), where we perform reinforcement learning on variants of test problems at inference time. TTRL enables Qwen2.5 7B Deepseek-R1 Distilled to achieve a state-of-the-art score of 90% on the MIT Integration Bee qualifying examination, surpassing OpenAI o1's performance. These results show how self-directed strategic learning can achieve significant capability improvements without relying on architectural scaling or human supervision.
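TTRL's control flow, as described in the abstract, can be sketched the same way: when a test problem arrives, generate variants of it and run a short burst of reinforcement learning on those variants before answering. All names here (`make_variants`, `rl_step`, `solve`, `ttrl_answer`) are illustrative stubs under assumed thresholds, not the paper's code.

```python
def make_variants(problem: str, n: int = 4) -> list[str]:
    """Stub: an LLM would generate n variants of the incoming test problem."""
    return [f"{problem} [variant {i}]" for i in range(n)]

def rl_step(skill: int, variant: str) -> int:
    """Stub RL update: each variant attempt with a verified reward
    raises the model's effective skill by one."""
    return skill + 1

def solve(problem: str, skill: int, required: int = 3) -> bool:
    """Stub: the test problem is only solvable once skill clears a threshold."""
    return skill >= required

def ttrl_answer(problem: str) -> bool:
    """Test-time loop: adapt on variants first, then answer the real problem."""
    skill = 0
    for variant in make_variants(problem):
        skill = rl_step(skill, variant)
    return solve(problem, skill)

print(ttrl_answer("integrate sin(x)**3 dx"))  # → True after test-time adaptation
```

The point of the sketch is the ordering: the model that would fail the test problem cold (skill 0 below the threshold) passes it after the brief variant-training burst.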