
AlphaOPT: Formulating Optimization Programs with Self-Improving LLM Experience Library

Minwei Kong, Ao Qu, Xiaotong Guo, Wenbin Ouyang, Chonghe Jiang, Han Zheng, Yining Ma, Dingyi Zhuang, Yuhan Tang, Junyi Li, Hai Wang, Cathy Wu, Jinhua Zhao

2025-10-23


Summary

This paper introduces AlphaOPT, a new system that helps Large Language Models (LLMs) get better at solving optimization problems, that is, finding the best solution to a problem subject to certain constraints. It does this without needing a large set of task-specific examples or constant retraining of the LLM itself.

What's the problem?

Currently, getting computers to solve real-world problems that require optimization is hard because you need to translate everyday language into precise mathematical formulas a computer can understand. Existing attempts to use LLMs for this either rely on carefully crafted prompts that break easily, or require expensive and time-consuming retraining of the LLM for each new type of problem; neither approach generalizes well to new situations.

What's the solution?

AlphaOPT works by building an 'experience library' where the LLM learns from its mistakes and successes. When it tries to solve a problem and fails, it analyzes *why* it failed, using an optimization solver to verify the reasoning. It then stores this information in the library as a structured insight: the type of problem (taxonomy), the conditions under which the insight applies, an explanation, and an example. Over time, AlphaOPT gets better at recognizing patterns and applying the right fixes to new problems by continually updating this library, rather than retraining the LLM itself. It's a two-phase cycle: learning new insights from failures, and then refining the conditions on stored insights so the library retrieves the right ones for each new task.
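To make the idea concrete, here is a minimal Python sketch of such an experience library. The class and method names (`Insight`, `ExperienceLibrary`, `learn`, `evolve`, `retrieve`) are illustrative assumptions, not the paper's actual implementation; the entry fields follow the {taxonomy, condition, explanation, example} structure the paper describes, and keyword matching stands in for the real retrieval step.

```python
from dataclasses import dataclass, field

@dataclass
class Insight:
    # One library entry, mirroring the paper's
    # {taxonomy, condition, explanation, example} structure.
    taxonomy: str      # category of modeling issue, e.g. "integrality"
    condition: str     # when the insight applies (refined in phase ii)
    explanation: str   # why the failed formulation was wrong
    example: str       # a corrected snippet illustrating the fix

@dataclass
class ExperienceLibrary:
    insights: list = field(default_factory=list)

    def learn(self, insight: Insight, solver_verified: bool) -> None:
        # Phase (i) Library Learning: keep only insights whose
        # diagnosis was confirmed by running the solver.
        if solver_verified:
            self.insights.append(insight)

    def evolve(self, insight: Insight, refined_condition: str) -> None:
        # Phase (ii) Library Evolution: tighten the applicability
        # condition so retrieval aligns with the right tasks.
        insight.condition = refined_condition

    def retrieve(self, task_description: str) -> list:
        # Naive keyword overlap as a stand-in for real retrieval.
        words = task_description.lower()
        return [i for i in self.insights
                if any(w in words for w in i.condition.lower().split())]
```

Because the knowledge lives in plain, inspectable entries like these rather than in model weights, a human can read, correct, or delete an insight directly, which is the interpretability benefit the paper highlights.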

Why it matters?

This is important because it makes optimization modeling more accessible and efficient. AlphaOPT allows LLMs to learn from fewer examples, avoids constant and costly retraining, and makes the LLM's reasoning more transparent, so humans can inspect and even correct it. That means we can potentially automate more complex decision-making processes across many different fields.

Abstract

Optimization modeling enables critical decisions across industries but remains difficult to automate: informal language must be mapped to precise mathematical formulations and executable solver code. Prior LLM approaches either rely on brittle prompting or costly retraining with limited generalization. We present AlphaOPT, a self-improving experience library that enables an LLM to learn from limited demonstrations (even answers alone, without gold-standard programs) and solver feedback - without annotated reasoning traces or parameter updates. AlphaOPT operates in a continual two-phase cycle: (i) a Library Learning phase that reflects on failed attempts, extracting solver-verified, structured insights as {taxonomy, condition, explanation, example}; and (ii) a Library Evolution phase that diagnoses retrieval misalignments and refines the applicability conditions of stored insights, improving transfer across tasks. This design (1) learns efficiently from limited demonstrations without curated rationales, (2) expands continually without costly retraining by updating the library rather than model weights, and (3) makes knowledge explicit and interpretable for human inspection and intervention. Experiments show that AlphaOPT steadily improves with more data (65% to 72% from 100 to 300 training items) and surpasses the strongest baseline by 7.7% on the out-of-distribution OptiBench dataset when trained only on answers. Code and data are available at: https://github.com/Minw913/AlphaOPT.