A survey of the Learning from Rewards paradigm in Large Language Models, covering reinforcement learning techniques and reward-guided decoding strategies that provide dynamic feedback and align model behavior with preferences.

This paper surveys how large language models can learn from rewards to give answers people prefer: the model receives feedback on its responses and adjusts them toward what works best.

Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models

Summary

What's the problem?

What's the solution?

Why it matters?

Abstract