Entropy-Based Adaptive Weighting for Self-Training

Xiaoxuan Wang, Yihe Deng, Mingyu Derek Ma, Wei Wang

2025-04-01

Summary

This paper explores a new method to improve how AI models learn math by focusing on the problems they find most confusing.

What's the problem?

AI models can train on their own generated solutions, but making the best use of that self-generated data is still an open challenge: standard self-training treats every example the same, so models often fail to improve their math skills from the problems that would teach them the most.

What's the solution?

The researchers created a technique called EAST (Entropy-Based Adaptive Weighting for Self-Training) that assigns higher training weights to the problems the AI is most uncertain about, steering it toward the most informative and challenging examples.

Why it matters?

This work matters because it shows a simple way to get more out of self-training, leading to AI models that are measurably better at solving complex math problems on benchmarks like GSM8K and MATH.

Abstract

The mathematical problem-solving capabilities of large language models have become a focal point of research, with growing interest in leveraging self-generated reasoning paths as a promising way to refine and enhance these models. These paths capture step-by-step logical processes while requiring only the correct answer for supervision. The self-training method has been shown to be effective in reasoning tasks while eliminating the need for external models and manual annotations. However, optimizing the use of self-generated data for model training remains an open challenge. In this work, we propose Entropy-Based Adaptive Weighting for Self-Training (EAST), an adaptive weighting strategy designed to prioritize uncertain data during self-training. Specifically, EAST employs a mapping function with a tunable parameter that controls the sharpness of the weighting, assigning higher weights to data where the model exhibits greater uncertainty. This approach guides the model to focus on more informative and challenging examples, thereby enhancing its reasoning ability. We evaluate our approach on GSM8K and MATH benchmarks. Empirical results show that, while the vanilla method yields virtually no improvement (0%) on MATH, EAST achieves around a 1% gain over the backbone model. On GSM8K, EAST attains a further 1-2% performance boost compared to the vanilla method.
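The core idea, estimating the model's uncertainty on each problem from its sampled answers and mapping that uncertainty to a training weight with a tunable sharpness parameter, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names (`answer_entropy`, `east_weight`), the power-law mapping, and the `sharpness` parameterization are assumptions; the paper's exact mapping function may differ.

```python
import math
from collections import Counter

def answer_entropy(sampled_answers):
    """Entropy of the final-answer distribution over several sampled
    responses to the same problem. Higher entropy means the model is
    less certain about this problem."""
    counts = Counter(sampled_answers)
    n = len(sampled_answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def east_weight(entropy, max_entropy, sharpness=2.0):
    """Map entropy in [0, max_entropy] to a training weight in [0, 1].

    `sharpness` is a tunable parameter controlling how strongly
    uncertain examples are up-weighted (a hypothetical mapping in the
    spirit of EAST, not the paper's exact function)."""
    if max_entropy == 0:
        return 1.0
    return (entropy / max_entropy) ** (1.0 / sharpness)

# Example: weight two problems by how much the model disagrees with itself.
confident = ["42", "42", "42", "42"]        # same answer every time
uncertain = ["42", "17", "42", "8"]         # answers disagree
h_max = math.log(4)                          # max entropy for 4 samples
w_conf = east_weight(answer_entropy(confident), h_max)
w_unc = east_weight(answer_entropy(uncertain), h_max)
# The uncertain problem receives the larger training weight.
```

In a self-training loop, these weights would scale each example's contribution to the fine-tuning loss, so gradient updates concentrate on problems where the model's sampled answers disagree.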