BOW: Bottlenecked Next Word Exploration
Ming Shen, Zhikun Xu, Xiao Ye, Jacob Dineen, Ben Zhou
2025-06-17
Summary
This paper introduces BOW, a new way to make language models reason more carefully before predicting the next word in a sentence. Instead of picking the next word right away, BOW has the model write out a step-by-step reasoning process first. A separate judge then scores that reasoning by how well it predicts the next word, which teaches the model to think more deeply and make better predictions.
What's the problem?
The problem is that many language models predict the next word by matching surface patterns rather than understanding why a word fits. This causes mistakes, especially when the sentence or idea is more complicated and requires real reasoning.
What's the solution?
The solution is to insert a reasoning step between the model that proposes next words and the component that chooses the best one. The policy model must produce a reasoning path without seeing the correct next word, and a judge model then scores how well that reasoning predicts the word. Through reinforcement learning, which rewards better thinking strategies, the system improves its reasoning over time.
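The loop above can be sketched in a few lines of toy code. This is a minimal illustration, not the paper's implementation: the function names (`policy_reason`, `judge_score`, `bow_reward`) and the keyword-based "judge" are invented stand-ins for what would really be two language models.

```python
# Toy sketch of a BOW-style training signal (illustrative only).
# The policy writes a reasoning trace WITHOUT seeing the gold next
# word; a judge then scores how well the trace predicts that word,
# and the score becomes the reward for the RL update.

def policy_reason(context: str) -> str:
    """Stand-in policy: emit a reasoning trace for the next word."""
    return f"The context '{context}' suggests a word about weather."

def judge_score(reasoning: str, gold_next_word: str) -> float:
    """Stand-in judge: probability assigned to the gold next word
    after reading only the reasoning trace (a crude keyword check)."""
    if "weather" in reasoning and gold_next_word in ("rain", "sun"):
        return 0.9
    return 0.1

def bow_reward(context: str, gold_next_word: str) -> float:
    reasoning = policy_reason(context)             # gold word hidden here
    return judge_score(reasoning, gold_next_word)  # reward for RL step

reward = bow_reward("Dark clouds gathered, so we expected", "rain")
```

The key design point is the bottleneck: the gold next word only enters through the judge's score, so the policy can earn a high reward only by producing reasoning that genuinely points to the right word.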
Why it matters?
This matters because it pushes language models to reason through why a word should come next, more like humans do, rather than relying on surface patterns. That leads to more accurate predictions and stronger performance on reasoning-heavy tasks, making AI language tools more reliable and easier to trust.
Abstract
BOW is a reinforcement learning framework that enhances language model reasoning by introducing a reasoning bottleneck between the policy and judge models, improving both general and next-word reasoning.