Learning Evidence Highlighting for Frozen LLMs
Shaoang Li, Yanhang Shi, Yufei Li, Mingfu Liang, Xiaohan Wei, Yunchen Pu, Fei Tian, Chonglin Sun, Frank Shyu, Luke Simon, Sandeep Pandey, Xi Liu, Jian Li
2026-04-27
Summary
This paper introduces HiLight, a method that helps large language models, which are already capable reasoners, better identify the important information hidden in long and complex texts.
What's the problem?
Large language models sometimes struggle to find the crucial pieces of evidence needed to solve a problem when that evidence is hidden within a lot of irrelevant information. They can get lost in the details and miss what’s truly important, even though they're generally capable of good reasoning.
What's the solution?
HiLight works by adding simple 'highlight' tags around the most important parts of the text *without* changing the text itself. It uses a separate, smaller program called an 'Emphasis Actor' that learns which parts of the text to highlight based on whether highlighting those parts helps the main language model (the 'Solver') get the right answer. This learning process doesn't require anyone to manually label what's important; it just uses the Solver's success or failure as feedback. The original language model remains unchanged; it simply reads the highlighted version of the full text instead of the plain one.
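The two pieces of that description, wrapping spans in tags while leaving the surrounding text intact, and scoring the result only by whether the Solver succeeds, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the `<hl>` tag, the `emphasize` function, and the exact-match reward are all assumptions made here for clarity.

```python
# Illustrative sketch of HiLight's emphasis step (function names and the
# <hl> tag are assumptions, not the paper's actual API): the Actor picks
# character spans, we wrap them in highlight tags without altering any
# surrounding text, and the frozen Solver's success is the only signal.

def emphasize(context: str, spans: list[tuple[int, int]],
              open_tag: str = "<hl>", close_tag: str = "</hl>") -> str:
    """Insert highlight tags around (start, end) character spans."""
    out, cursor = [], 0
    for start, end in sorted(spans):
        out.append(context[cursor:start])  # untouched text before the span
        out.append(open_tag + context[start:end] + close_tag)
        cursor = end
    out.append(context[cursor:])           # untouched text after the last span
    return "".join(out)

def task_reward(solver_answer: str, gold: str) -> float:
    """Weak supervision: 1.0 if the frozen Solver answers correctly, else 0.0."""
    return float(solver_answer.strip() == gold.strip())

context = "Alice moved to Paris in 2019. Bob likes tea. Alice works at Acme."
highlighted = emphasize(context, [(0, 29)])  # highlight the first sentence
print(highlighted)
```

Note that stripping the tags recovers the original context exactly, which is the point of emphasizing rather than compressing or rewriting: no evidence can be discarded or distorted by the Actor.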
Why it matters?
This is important because it improves language model performance on tasks like recommending items to users and answering questions about long documents. It also shows that the highlighting strategy isn't specific to one type of language model: it can work with different models, even those accessed through an online service, suggesting it's learning a general skill for identifying key evidence.
Abstract
Large Language Models (LLMs) can reason well, yet often miss decisive evidence when it is buried in long, noisy contexts. We introduce HiLight, an Evidence Emphasis framework that decouples evidence selection from reasoning for frozen LLM solvers. HiLight avoids compressing or rewriting the input, which can discard or distort evidence, by training a lightweight Emphasis Actor to insert minimal highlight tags around pivotal spans in the unaltered context. A frozen Solver then performs downstream reasoning on the emphasized input. We cast highlighting as a weakly supervised decision-making problem and optimize the Actor with reinforcement learning using only the Solver's task reward, requiring no evidence labels and no access to or modification of the Solver. Across sequential recommendation and long-context question answering, HiLight consistently improves performance over strong prompt-based and automated prompt-optimization baselines. The learned emphasis policy transfers zero-shot to both smaller and larger unseen Solver families, including an API-based Solver, suggesting that the Actor captures genuine, reusable evidence structure rather than overfitting to a single backbone.
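The "optimize the Actor with reinforcement learning using only the Solver's task reward" idea can be illustrated with a deliberately tiny bandit-style REINFORCE loop. This is a toy sketch under strong assumptions (the real Actor is a learned model over spans, not a three-way softmax, and the paper's RL setup is surely richer); it only demonstrates that a binary task reward from a frozen, black-box Solver is enough to steer the highlighting policy toward the evidence.

```python
import math
import random

# Toy weakly supervised training loop (assumption: a bandit-style REINFORCE
# stand-in for the paper's RL algorithm). The "Actor" chooses which of three
# sentences to highlight; the frozen "Solver" is a black box that only
# answers correctly when the true evidence sentence is highlighted.

random.seed(0)

gold_idx = 0  # index of the sentence that actually contains the evidence

def frozen_solver(highlight_idx: int) -> float:
    """Stand-in Solver: returns task reward 1.0 iff the evidence is highlighted."""
    return 1.0 if highlight_idx == gold_idx else 0.0

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

logits = [0.0, 0.0, 0.0]  # Actor's policy over which sentence to highlight
lr = 0.5

for step in range(200):
    probs = softmax(logits)
    action = random.choices(range(3), weights=probs)[0]  # sample a highlight
    reward = frozen_solver(action)                       # only feedback signal
    # REINFORCE update: grad of log pi(action) is one_hot(action) - probs
    for i in range(3):
        logits[i] += lr * reward * ((1.0 if i == action else 0.0) - probs[i])

print(softmax(logits))  # policy mass should concentrate on the evidence sentence
```

No evidence labels enter the update: the Actor never sees which sentence is "correct", only whether the Solver succeeded after a given highlighting choice, mirroring the weakly supervised framing in the abstract.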