
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training

Zhenting Wang, Guofeng Cui, Kun Wan, Wentian Zhao

2025-04-15


Summary

This paper introduces DUMP, a new system that helps train large language models with reinforcement learning in a smarter way. DUMP automatically decides the order and mix of training data, so the model learns faster and ends up performing better.

What's the problem?

The problem is that when training big language models with reinforcement learning, it's hard to know the best way to organize the training data. If the data isn't presented in a good order or mix, the AI can get stuck or learn slowly, making the whole process less efficient and effective.

What's the solution?

The researchers designed DUMP to automatically adjust the training schedule based on how well the model is learning from different types of data. It uses a method called UCB (Upper Confidence Bound), a strategy that balances revisiting less-used data types with focusing on the ones where the model is currently improving the most, to decide which data to train on at each step. This gives the model the right challenges at the right times, so it learns faster and reaches higher performance.
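
To make the UCB idea concrete, here is a minimal sketch of how a distribution-level scheduler might pick which data pool to train on next. This is an illustrative assumption, not the paper's exact algorithm: the class and method names, the "learnability" signal (a stand-in for however DUMP measures learning progress on a distribution), and the exploration constant are all hypothetical.

```python
import math
import random
from collections import defaultdict

class UCBCurriculumScheduler:
    """Sketch of UCB-style selection over data distributions (illustrative only)."""

    def __init__(self, distributions, exploration_c=1.0):
        self.distributions = distributions   # e.g. ["easy_math", "hard_math", "code"]
        self.c = exploration_c               # exploration strength (assumed hyperparameter)
        self.counts = defaultdict(int)       # how often each distribution was used
        self.value = defaultdict(float)      # running mean of the learnability signal
        self.total = 0

    def select(self):
        # Try every distribution at least once before applying the UCB rule.
        for d in self.distributions:
            if self.counts[d] == 0:
                return d
        # UCB score: mean learnability plus a bonus for rarely used distributions.
        return max(
            self.distributions,
            key=lambda d: self.value[d]
            + self.c * math.sqrt(math.log(self.total) / self.counts[d]),
        )

    def update(self, dist, learnability):
        # 'learnability' is a scalar proxy for how much the model improved on
        # a batch from this distribution (hypothetical stand-in for DUMP's signal).
        self.total += 1
        self.counts[dist] += 1
        self.value[dist] += (learnability - self.value[dist]) / self.counts[dist]


# Usage: pick a distribution, run one RL step on a batch from it, report the signal.
scheduler = UCBCurriculumScheduler(["easy_math", "hard_math", "code"])
for step in range(10):
    dist = scheduler.select()
    measured_signal = random.random()        # placeholder for a real learning-progress score
    scheduler.update(dist, measured_signal)
```

The exploration bonus shrinks as a distribution is sampled more often, which is what lets a scheduler like this shift attention toward whichever data mix is currently most useful without permanently ignoring the others.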

Why it matters?

This work matters because it makes training advanced language models quicker and more effective, which means better AI tools can be developed in less time. By improving how AIs are trained, DUMP could lead to smarter, more capable systems that benefit everyone.

Abstract

A distribution-level curriculum learning framework using UCB for RL-based LLM post-training enhances convergence speed and performance by dynamically adjusting training schedules across diverse data distributions.