Learning to Reason for Factuality

Xilun Chen, Ilia Kulikov, Vincent-Pierre Berges, Barlas Oğuz, Rulin Shao, Gargi Ghosh, Jason Weston, Wen-tau Yih

2025-08-08

Summary

This paper introduces a new reward function for training large language models so they generate answers that are more factual and detailed while remaining helpful.

What's the problem?

The problem is that language models often produce information that sounds convincing but is inaccurate or lacks detail, which can spread misinformation or confuse users.

What's the solution?

The solution is to train the model with online reinforcement learning using a special reward that evaluates each response for factual accuracy and level of detail, reinforcing answers that are both correct and informative.
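To make the idea concrete, here is a minimal illustrative sketch (not the paper's exact formulation) of a reward that balances factual precision against level of detail; the claim-extraction and verification steps are assumed components, and the weighting and saturation constant are hypothetical choices:

```python
# Illustrative sketch of a factuality reward, assuming a verifier has already
# extracted factual claims from a response and judged each one true or false.

def factuality_reward(claims_correct: int, claims_total: int,
                      detail_weight: float = 0.5) -> float:
    """Score a response from its verified factual claims.

    claims_correct: number of claims the verifier judged true
    claims_total:   total factual claims extracted from the response
    detail_weight:  hypothetical trade-off between precision and detail
    """
    if claims_total == 0:
        return 0.0  # no factual content to reward
    # Precision penalizes hallucinated (unsupported) claims.
    precision = claims_correct / claims_total
    # A saturating bonus rewards more correct detail without letting
    # sheer length dominate (the constant 10.0 is an arbitrary scale).
    detail = claims_correct / (claims_correct + 10.0)
    return (1 - detail_weight) * precision + detail_weight * detail
```

Under such a reward, a response with more verified facts at the same precision scores higher, while adding unsupported claims lowers the score, which is the trade-off the paper's training objective targets.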

Why it matters?

This matters because improving the factuality of language models helps users trust the answers they receive and makes AI more useful for tasks that demand accurate knowledge and explanations.

Abstract

A novel reward function for online reinforcement learning improves factuality and detail in reasoning large language models without reducing helpfulness.