
Scaling Open-Ended Reasoning to Predict the Future

Nikhil Chandak, Shashwat Goel, Ameya Prabhu, Moritz Hardt, Jonas Geiping

2026-01-01


Summary

This research focuses on training language models to predict future events, essentially making educated guesses about what will happen based on the information available at prediction time.

What's the problem?

Predicting the future is hard because it's uncertain! Existing methods often struggle to make accurate and reliable forecasts, and training these models requires a large supply of forecasting questions whose outcomes are now known, which is hard to come by. It's also easy for models to 'cheat' by accidentally learning from information that wouldn't have been available at the time of the prediction, a problem known as leakage of future information.
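To make the leakage problem concrete, here is a minimal sketch of the kind of date-cutoff filter such a pipeline needs. The `filter_corpus` helper and the `published` field are illustrative assumptions, not the paper's actual code:

```python
from datetime import date

def filter_corpus(articles, cutoff: date):
    """Keep only articles published strictly before the question's cutoff
    date, so the model never sees information from after prediction time."""
    return [a for a in articles if a["published"] < cutoff]

# Example: a question posed on 2025-05-01 may only retrieve earlier news.
corpus = [
    {"published": date(2025, 4, 20), "text": "Central bank signals rate pause."},
    {"published": date(2025, 5, 3),  "text": "Rates held steady."},  # leaked outcome!
]
visible = filter_corpus(corpus, cutoff=date(2025, 5, 1))
assert len(visible) == 1  # the post-cutoff article is excluded
```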

What's the solution?

The researchers created a large dataset called OpenForesight by automatically generating forecasting questions from global events reported in daily news. They then trained Qwen3 'thinking' models to answer these questions using reinforcement learning (RL). To avoid 'cheating', they used an offline news corpus and only drew on articles published *before* the date of each prediction. They also improved the reward function the model learns from during RL, and added a retrieval system that surfaces relevant past news to inform each forecast. Finally, they tested their model, OpenForecaster 8B, on held-out events that happened between May and August 2025.
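The summary doesn't spell out the improved reward function, so as one hedged illustration, here is the negative Brier score, a standard proper scoring rule that rewards stated probabilities for matching reality and is a natural candidate for RL on forecasts. The name `brier_reward` is hypothetical, not from the paper:

```python
def brier_reward(prob: float, outcome: int) -> float:
    """Negative Brier score as an RL reward: highest when the stated
    probability matches what actually happened, penalizing both
    over- and under-confidence.

    prob    -- model's probability that the event occurs (0..1)
    outcome -- 1 if the event occurred, 0 otherwise
    """
    return -(prob - outcome) ** 2

# A confident correct forecast earns almost no penalty...
print(brier_reward(0.9, 1))   # ~ -0.01
# ...while a confident wrong one is punished heavily.
print(brier_reward(0.9, 0))   # -0.81
```

Because a proper scoring rule is maximized in expectation by reporting one's true belief, a reward like this pushes the model toward honest, calibrated probabilities rather than overconfident guesses.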

Why it matters?

This work is important because it shows that language models can be surprisingly good at forecasting future events, even matching the performance of much larger proprietary models. By making their data, code, and models publicly available, the researchers are helping to advance research in this area and make it easier for others to build better forecasting systems. The improvements in how well the model is 'calibrated' (meaning its confidence matches its accuracy) are also valuable, and the authors find they carry over to other popular language model benchmarks.
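Calibration can be quantified with expected calibration error (ECE), a standard metric the summary doesn't name explicitly. The sketch below, with the assumed helper `expected_calibration_error`, shows the idea: bucket forecasts by stated confidence and compare each bucket's average confidence to how often those events actually happened.

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins: int = 10) -> float:
    """Bucket forecasts by stated confidence and compare each bucket's
    average confidence to its empirical accuracy; a well-calibrated
    forecaster scores near zero."""
    probs, outcomes = np.asarray(probs, float), np.asarray(outcomes, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & ((probs < hi) if hi < 1.0 else (probs <= hi))
        if mask.any():
            gap = abs(probs[mask].mean() - outcomes[mask].mean())
            ece += mask.mean() * gap  # weight by the bucket's share of forecasts
    return float(ece)

# Perfectly calibrated toy data: 70% forecasts that come true 70% of the time.
probs = [0.7] * 10
outcomes = [1] * 7 + [0] * 3
print(expected_calibration_error(probs, outcomes))  # ~ 0.0
```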

Abstract

High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up training data, we synthesize novel forecasting questions from global events reported in daily news, using a fully automated, careful curation recipe. We train the Qwen3 thinking models on our dataset, OpenForesight. To prevent leakage of future information during training and evaluation, we use an offline news corpus, both for data generation and retrieval in our forecasting system. Guided by a small validation set, we show the benefits of retrieval, and an improved reward function for reinforcement learning (RL). Once we obtain our final forecasting system, we perform held-out testing between May and August 2025. Our specialized model, OpenForecaster 8B, matches much larger proprietary models, with our training improving the accuracy, calibration, and consistency of predictions. We find calibration improvements from forecasting training generalize across popular benchmarks. We open-source all our models, code, and data to make research on language model forecasting broadly accessible.