When AI Navigates the Fog of War

Ming Li, Xirui Li, Tianyi Zhou

2026-03-19

Summary

This paper investigates whether artificial intelligence can make reasonable predictions about a conflict *while* it is happening, not just after the outcome is known, focusing on the early stages of the 2026 Middle East conflict, which began after the training cutoff of current frontier models.

What's the problem?

It's really hard to test whether AI can truly 'reason' about current events, because the events a model is asked about may already appear in its training data. In that case the model might just be repeating information it has already 'seen' rather than actually predicting the future. Essentially, for geopolitical events it's difficult to tell whether the AI is reasoning or just remembering history.

What's the solution?

The researchers focused on the 2026 Middle East conflict, whose early stages unfolded *after* the AI models' training cutoff. They broke those early stages into specific moments in time (11 temporal nodes) and asked the models questions that could only be answered using information publicly available at each of those moments: 42 node-specific verifiable questions plus 5 general exploratory questions. This design minimizes the chance that the AI is simply recalling past events, so its reasoning about the unfolding situation can be assessed more directly.
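To make this setup concrete, the sketch below shows one way such a temporally grounded evaluation could be organized: each temporal node carries a date, the information publicly available up to that date, and a set of questions, and the prompt explicitly restricts the model to that date. This is an illustrative assumption, not the authors' actual code; the names (`TemporalNode`, `Question`, `build_prompt`) and fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical data structures illustrating the paper's evaluation design:
# temporal nodes with date-scoped questions. Names and fields are assumptions,
# not the authors' implementation.

@dataclass
class Question:
    text: str
    verifiable: bool  # True for node-specific verifiable questions, False for exploratory ones


@dataclass
class TemporalNode:
    date: str      # a specific moment in the unfolding conflict, e.g. "2026-01-15"
    context: str   # summary of information publicly available up to this date
    questions: List[Question] = field(default_factory=list)


def build_prompt(node: TemporalNode, question: Question) -> str:
    """Compose a prompt that restricts the model to information available
    at the node's date, to limit hindsight and training-data leakage."""
    return (
        f"Today is {node.date}. Use only information that was publicly "
        f"available on or before this date.\n\n"
        f"Background:\n{node.context}\n\n"
        f"Question: {question.text}"
    )


# Example: one temporal node with a verifiable and an exploratory question.
node = TemporalNode(
    date="2026-01-15",
    context="(summary of open-source reporting available by this date)",
    questions=[
        Question("Which supply routes are most likely to be disrupted in the next two weeks?", verifiable=True),
        Question("How might regional actors' incentives shift as the conflict continues?", verifiable=False),
    ],
)

for q in node.questions:
    print(build_prompt(node, q))
    print("---")
```

Under this kind of setup, answers to the verifiable questions can be scored once the corresponding outcomes are known, while the exploratory answers serve as an archival record of the model's reasoning at that moment.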

Why it matters?

This research is important because it provides a first look at how AI reasons about a real-world, ongoing conflict without the benefit of hindsight. It shows that AI can sometimes grasp the underlying strategic incentives behind actions, but that it is less reliable in politically ambiguous situations involving many actors than in economically and logistically structured ones. It also shows that AI's account of a conflict evolves over time, just as a human's understanding would. Finally, the work creates a record of AI reasoning during a crisis, allowing for future comparison and improvement.

Abstract

Can AI reason about a war before its trajectory becomes historically obvious? Analyzing this capability is difficult because retrospective geopolitical prediction is heavily confounded by training-data leakage. We address this challenge through a temporally grounded case study of the early stages of the 2026 Middle East conflict, which unfolded after the training cutoff of current frontier models. We construct 11 critical temporal nodes, 42 node-specific verifiable questions, and 5 general exploratory questions, requiring models to reason only from information that would have been publicly available at each moment. This design substantially mitigates training-data leakage concerns, creating a setting well-suited for studying how models analyze an unfolding crisis under the fog of war, and provides, to our knowledge, the first temporally grounded analysis of LLM reasoning in an ongoing geopolitical conflict. Our analysis reveals three main findings. First, current state-of-the-art large language models often display a striking degree of strategic realism, reasoning beyond surface rhetoric toward deeper structural incentives. Second, this capability is uneven across domains: models are more reliable in economically and logistically structured settings than in politically ambiguous multi-actor environments. Finally, model narratives evolve over time, shifting from early expectations of rapid containment toward more systemic accounts of regional entrenchment and attritional de-escalation. Since the conflict remains ongoing at the time of writing, this work can serve as an archival snapshot of model reasoning during an unfolding geopolitical crisis, enabling future studies without the hindsight bias of retrospective analysis.