CALM Before the STORM: Unlocking Native Reasoning for Optimization Modeling

Zhengyang Tang, Zihan Ye, Chenyu Huang, Xuhan Huang, Chengpeng Li, Sihang Li, Guanhua Chen, Ming Yan, Zizhuo Wang, Hongyuan Zha, Dayiheng Liu, Benyou Wang

2025-10-09

Summary

This paper explores how to best use powerful AI models, called Large Reasoning Models, to automatically build and solve complex optimization problems – think of things like figuring out the most efficient way to schedule deliveries or manage resources.

What's the problem?

Although these large AI models are strong reasoners, existing domain-adaptation methods were designed for earlier instruction-tuned models and fail to exploit their advanced reasoning patterns. In particular, directly fine-tuning them on traditional, non-reflective datasets yields only limited gains on optimization modeling tasks.

What's the solution?

The researchers developed a new method called CALM (Corrective Adaptation with Lightweight Modification). In CALM, an expert 'intervener' spots flaws in the model's reasoning as it solves a problem and provides short corrective hints, which the model incorporates to produce improved reasoning trajectories. These interventions change fewer than 2.6% of the generated tokens, yet they yield high-quality training data for supervised fine-tuning; the adapted model is then further improved with reinforcement learning. Using this pipeline, they built a specific model called STORM.
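To make the idea concrete, here is a minimal sketch of what such a hint-based correction loop might look like. Everything here is an illustrative assumption, not the paper's implementation: the flaw detector, the hint function, and the token-level splicing are toy stand-ins for the expert intervener.

```python
# Hypothetical sketch of CALM-style corrective adaptation (not the authors' code).
# An "intervener" scans a model-generated reasoning trace; where it spots a flaw,
# it splices in a short corrective hint. The corrected trace would then serve as
# supervised fine-tuning data, and the fraction of changed tokens stays small.

def intervene(trace_tokens, flaw_detector, hint_for):
    """Return a corrected trace and the fraction of tokens that were modified."""
    corrected, changed = [], 0
    for tok in trace_tokens:
        if flaw_detector(tok):
            hint = hint_for(tok)      # concise corrective hint (list of tokens)
            corrected.extend(hint)
            changed += len(hint)
        else:
            corrected.append(tok)
    return corrected, changed / max(len(corrected), 1)

# Toy example: the trace maximizes cost where it should minimize it.
trace = "maximize cost subject to supply >= demand".split()
is_flaw = lambda tok: tok == "maximize"   # toy flaw detector
hint = lambda tok: ["minimize"]           # toy one-token corrective hint

fixed, frac = intervene(trace, is_flaw, hint)
print(" ".join(fixed))  # minimize cost subject to supply >= demand
print(f"fraction of tokens modified: {frac:.3f}")
```

In the real framework the intervener operates on full reasoning trajectories rather than single tokens, but the shape is the same: targeted, lightweight edits that preserve the model's native reasoning while correcting its mistakes.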

Why it matters?

This research matters because it shows how to make large reasoning models much better at real-world optimization problems. STORM, a relatively small 4-billion-parameter model, matched the accuracy of a 671-billion-parameter model across five optimization modeling benchmarks, demonstrating that carefully guiding the model's reasoning process can be more effective than simply scaling up model size. This offers a more practical and efficient path to expert-level performance on these tasks.

Abstract

Large Reasoning Models (LRMs) have demonstrated strong capabilities in complex multi-step reasoning, opening new opportunities for automating optimization modeling. However, existing domain adaptation methods, originally designed for earlier instruction-tuned models, often fail to exploit the advanced reasoning patterns of modern LRMs -- in particular, we show that direct fine-tuning on traditional non-reflective datasets leads to limited gains. To fully leverage LRMs' inherent reasoning abilities, we propose CALM (Corrective Adaptation with Lightweight Modification), a framework that progressively refines LRMs within their native reasoning modes for optimization modeling tasks. In CALM, an expert intervener identifies reasoning flaws and provides concise corrective hints, which the LRM incorporates to produce improved reasoning trajectories. These interventions modify fewer than 2.6% of generated tokens, but generate high-quality data for soft adaptation through supervised fine-tuning. The adapted model is then further improved through reinforcement learning. Building on CALM, we develop STORM (Smart Thinking Optimization Reasoning Model), a 4B-parameter LRM that achieves a new state-of-the-art average accuracy of 68.9% across five popular optimization modeling benchmarks, matching the performance of a 671B LRM. These results demonstrate that dynamic, hint-based data synthesis both preserves and amplifies the native reasoning patterns of modern LRMs, offering a more effective and scalable path towards expert-level performance on challenging optimization modeling tasks.