Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
Harshit Joshi, Priyank Shethia, Jadelynn Dao, Monica S. Lam
2026-04-27
Summary
This paper introduces SLIDERS, a system for answering questions over large collections of documents, even when those documents are far too long for current AI models to process in a single pass.
What's the problem?
Question answering over many documents is hard because AI models can only read a limited amount of text at once. The usual workaround, chopping documents into smaller chunks and combining the answers from each chunk, breaks down at scale: as the number of chunks grows, merging and reasoning over all the extracted pieces becomes a bottleneck of its own.
What's the solution?
SLIDERS addresses this by first extracting the important information from each document and organizing it into a relational database, essentially a set of tables like a spreadsheet. The system can then answer questions with SQL, a database query language, instead of processing huge amounts of raw text. A separate reconciliation step checks for and repairs errors and inconsistencies in the extracted information, such as duplicated or conflicting records, so that everything fits together logically.
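To make the extract-then-query idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `facts` table, its columns, and the sample rows are illustrative assumptions, not the paper's actual schema; in the real system an LLM would populate the table from document chunks.

```python
# Minimal sketch of extracting facts into a relational database and
# answering with SQL. Schema and rows are illustrative assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE facts (
        doc_id     TEXT,   -- provenance: which document the fact came from
        entity     TEXT,   -- what the fact is about
        attribute  TEXT,   -- which property was extracted
        value      TEXT,   -- the extracted value
        rationale  TEXT    -- the extractor's justification, kept for later checks
    )
""")

# Stand-in for the extraction step: an LLM would insert these rows
# one document chunk at a time.
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?, ?, ?)",
    [
        ("report_01", "Acme Corp", "revenue_2023", "12000000", "stated on p. 4"),
        ("report_02", "Acme Corp", "revenue_2022", "9500000",  "stated on p. 2"),
        ("report_03", "Beta Inc",  "revenue_2023", "7300000",  "stated on p. 1"),
    ],
)

# Answering "which entities reported 2023 revenue, and how much?" is now
# a SQL query over persistent structured state, not a pass over raw text.
for row in conn.execute(
    "SELECT entity, value, doc_id FROM facts "
    "WHERE attribute = 'revenue_2023' ORDER BY entity"
):
    print(row)
```

The key property is that the answer is computed by the database engine over persistent structured state, so the reasoning step does not get harder as the document collection grows.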
Why it matters?
SLIDERS is a significant improvement over existing methods for handling long documents. It outperforms even very strong models such as GPT-4.1, and it scales to extremely large document collections, tested up to tens of millions of tokens. This matters because it enables accurate answers from complex information sources that were previously too large to analyze as a whole.
Abstract
Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common workaround is to decompose documents into chunks and assemble answers from chunk-level outputs, but this introduces an aggregation bottleneck: as the number of chunks grows, systems must still combine and reason over an increasingly large body of extracted evidence. We present SLIDERS, a framework for question answering over long document collections through structured reasoning. SLIDERS extracts salient information into a relational database, enabling scalable reasoning over persistent structured state via SQL rather than concatenated text. To make this locally extracted representation globally coherent, SLIDERS introduces a data reconciliation stage that leverages provenance, extraction rationales, and metadata to detect and repair duplicated, inconsistent, and incomplete records. SLIDERS outperforms all baselines on three existing long-context benchmarks, despite all of them fitting within the context window of strong base LLMs, exceeding GPT-4.1 by 6.6 points on average. It also improves over the next best baseline by ~19 and ~32 points on two new benchmarks at 3.9M and 36M tokens, respectively.
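The data reconciliation stage described in the abstract can be pictured with a small sketch: group extracted records by key, merge exact duplicates, and flag conflicting values for repair. The record format and the two rules below are assumptions for illustration only; per the abstract, the actual stage also draws on provenance, extraction rationales, and metadata to detect and repair duplicated, inconsistent, and incomplete records.

```python
# Hedged sketch of a reconciliation pass over extracted records:
# collapse exact duplicates, flag conflicts. Not the paper's algorithm.
from collections import defaultdict

records = [
    {"doc_id": "report_01", "entity": "Acme Corp", "attribute": "ceo", "value": "J. Smith"},
    {"doc_id": "report_02", "entity": "Acme Corp", "attribute": "ceo", "value": "J. Smith"},   # duplicate
    {"doc_id": "report_03", "entity": "Acme Corp", "attribute": "ceo", "value": "Jane Smith"}, # conflict
]

# Group records that describe the same (entity, attribute) pair.
grouped = defaultdict(list)
for rec in records:
    grouped[(rec["entity"], rec["attribute"])].append(rec)

for key, group in grouped.items():
    values = {rec["value"] for rec in group}
    if len(values) == 1:
        # Exact duplicates: keep one record and merge provenance.
        merged = dict(group[0])
        merged["provenance"] = sorted(rec["doc_id"] for rec in group)
        print("merged:", merged)
    else:
        # Inconsistent values: in SLIDERS these would be routed to a
        # repair step that consults rationales and metadata.
        print("conflict on", key, "->", sorted(values))
```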