Learning Explainable Dense Reward Shapes via Bayesian Optimization
Ryan Koo, Ian Yang, Vipul Raheja, Mingyi Hong, Kwang-Sung Jun, Dongyeop Kang
2025-04-30
Summary
This paper introduces a way to help AI models learn faster and more reliably from human feedback by converting a single, coarse score for a whole response into clearer, more detailed feedback about which parts of the response were good or bad.
What's the problem?
When an AI model learns from human feedback, it usually receives only a single score for an entire response, so it cannot tell which specific words or steps earned the praise or blame. This credit-assignment problem slows down learning and makes the resulting model less effective.
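Concretely, standard RLHF gives one scalar score to the whole response, which in token-level terms looks like a reward of zero everywhere except the final step. A tiny illustration (the example and numbers are made up):

```python
tokens = ["The", "capital", "of", "France", "is", "Berlin", "."]
# One sequence-level score from the reward model, e.g. -1.0 for a wrong answer.
sequence_reward = -1.0
# Token-level view: all credit lands on the last step, so the policy cannot
# tell that "Berlin" (rather than "The" or "capital") made the answer bad.
sparse_rewards = [0.0] * (len(tokens) - 1) + [sequence_reward]
print(sparse_rewards)  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]
```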
What's the solution?
The researchers built a reward-shaping function that uses explainability techniques to spread the single response-level score across individual tokens, and they apply Bayesian optimization to tune how that score is distributed. The result is denser, more understandable feedback at every step, which helps the model learn what to do much more quickly.
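To make the idea concrete, here is a minimal sketch of attribution-weighted reward shaping, assuming per-token importance scores (for example from SHAP or attention) are already available. The function name, the `mix` coefficient, and the blending scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def dense_rewards(scalar_reward: float,
                  attributions: torch.Tensor,
                  mix: float = 0.5) -> torch.Tensor:
    """Spread a single sequence-level reward over tokens.

    attributions: per-token importance scores, shape (seq_len,).
    mix: blends the attribution-weighted split with a terminal-only
    reward; both names are illustrative, not from the paper.
    """
    weights = attributions.abs()
    weights = weights / weights.sum().clamp_min(1e-8)  # normalize to a distribution
    shaped = mix * scalar_reward * weights             # dense, attribution-weighted part
    shaped[-1] += (1.0 - mix) * scalar_reward          # keep part of the reward terminal
    return shaped

# Example: most of the blame lands on the fourth token, which scored highest.
attrib = torch.tensor([0.10, 0.05, 0.02, 0.90, 0.05])
print(dense_rewards(-1.0, attrib))
```

Because the weights sum to one, the shaped rewards still add up to the original scalar reward, so the dense signal redistributes credit across tokens without changing the total return.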
Why does it matter?
This matters because it can make AI training much faster and the results more reliable, which is important for everything from teaching robots new skills to improving virtual assistants and game AIs.
Abstract
The paper proposes a reward-shaping function based on explainability methods to improve token-level credit assignment in reinforcement learning from human feedback (RLHF), yielding better performance and faster policy learning.
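As a rough illustration of the Bayesian-optimization component, the sketch below tunes a single reward-mixing coefficient with `gp_minimize` from scikit-optimize. The objective here is a toy stand-in for an expensive RLHF training-and-evaluation run; in the paper's setting, each evaluation would correspond to training a policy under a candidate reward shape.

```python
from skopt import gp_minimize

def evaluate_policy_with_mix(mix: float) -> float:
    """Stand-in for an expensive RLHF run: returns a validation score for
    a given reward-mixing coefficient. A toy curve peaking near mix = 0.7
    replaces the real evaluation, purely for illustration."""
    return -(mix - 0.7) ** 2

def objective(params):
    return -evaluate_policy_with_mix(params[0])  # gp_minimize minimizes, so negate

result = gp_minimize(
    objective,
    dimensions=[(0.0, 1.0)],  # search space for the mixing coefficient
    n_calls=20,               # each call would be one costly training run
    random_state=0,
)
print("best mix:", result.x[0], "score:", -result.fun)
```

Bayesian optimization suits this setting because each candidate reward shape requires a costly policy-training run to evaluate, so a sample-efficient search is preferable to grid search or random sweeps.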