Future Is Unevenly Distributed: Forecasting Ability of LLMs Depends on What We're Asking

Chinmay Karkar, Paras Chopra

2025-11-26

Summary

This paper explores whether large language models, like the ones powering chatbots, can actually predict future events in areas like politics, society, and the economy.

What's the problem?

While these language models *seem* to have some forecasting ability, their success isn't consistent: sometimes they're surprisingly accurate, and other times they're way off. The researchers wanted to figure out why this happens, what makes models good or bad at predicting things, and whether the way a question is asked changes the answer.

What's the solution?

The researchers tested models from several families on real-world questions about events that happened *after* each model's training cutoff, so the answers could not simply be recalled from training data. They experimented with giving the models different amounts of background information, framing questions in different ways, and supplying current news articles to see whether that improved predictions. They then analyzed how accurate the models were and whether the models' stated confidence matched how often they were actually right, a property known as calibration.
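To make that setup concrete, here is a minimal sketch of what such an evaluation loop could look like. The paper's actual code and metrics aren't given in this summary, so everything below is an assumption: `Question`, `ask_model`, and the news-snippet field are hypothetical stand-ins, and the Brier score is used here simply because it is a standard way to score probabilistic forecasts on both accuracy and confidence.

```python
from dataclasses import dataclass

@dataclass
class Question:
    text: str     # e.g. "Will event X happen before date Y?"
    outcome: int  # 1 if the event happened, 0 if it did not
    news: str     # a factual news snippet available before resolution

def ask_model(prompt: str) -> float:
    """Stand-in for a real chat-API call that returns the model's
    stated probability (0.0-1.0) that the event will happen."""
    raise NotImplementedError  # plug in your LLM client here

def brier_score(prob: float, outcome: int) -> float:
    # Squared error between the forecast probability and what happened;
    # 0.0 is a perfect forecast, 0.25 is the score of always saying 50%.
    return (prob - outcome) ** 2

def evaluate(questions: list[Question], with_news: bool) -> float:
    """Average Brier score over a question set, with or without
    prepending factual news context to each prompt."""
    scores = []
    for q in questions:
        prompt = q.text
        if with_news:
            prompt = f"Context: {q.news}\n\nQuestion: {q.text}"
        prob = ask_model(prompt)
        scores.append(brier_score(prob, q.outcome))
    return sum(scores) / len(scores)
```

Comparing `evaluate(questions, with_news=False)` against `evaluate(questions, with_news=True)` mirrors the paper's core manipulation: holding the questions fixed while changing the context the model sees, then checking how its predictions shift.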

Why it matters?

This research is important because if we can understand how and why these models succeed or fail at forecasting, we can improve their reliability. That could lead to better tools for anticipating future trends and making informed decisions in politics, economics, and social planning. It highlights that simply *asking* a model a question isn't enough; how you ask and what information you provide significantly affect the result.

Abstract

Large Language Models (LLMs) demonstrate partial forecasting competence across social, political, and economic events. Yet, their predictive ability varies sharply with domain structure and prompt framing. We investigate how forecasting performance varies with different model families on real-world questions about events that happened beyond the model cutoff date. We analyze how context, question type, and external knowledge affect accuracy and calibration, and how adding factual news context modifies belief formation and failure modes. Our results show that forecasting ability is highly variable as it depends on what, and how, we ask.