Online Experiential Learning for Language Models
Tianzhu Ye, Li Dong, Qingxiu Dong, Xun Wu, Shaohan Huang, Furu Wei
2026-03-18
Summary
This paper introduces Online Experiential Learning (OEL), a framework that lets large language models improve by learning directly from their real-world interactions with users, rather than relying only on pre-collected training data.
What's the problem?
Large language models are typically improved through offline training on human-annotated datasets or in simulated environments. This means they never learn from the experience they accumulate while actually being used, which is a major missed opportunity: they are frozen with whatever they learned *before* deployment and cannot adapt to the nuances of real-world conversations and tasks.
What's the solution?
The researchers developed a system in which the language model continuously learns from its own deployment experience. It works in two stages: first, the model extracts valuable, transferable knowledge from its interaction trajectories with users. Second, that knowledge is consolidated into the model's parameters, making it respond better in the future. The two stages repeat in a loop, so the improved model collects higher-quality interactions that yield richer knowledge in the next round. Importantly, the parameter-update stage requires no access to the user-side environment, only the experiential knowledge extracted from the collected interactions.
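To make the loop structure concrete, here is a toy, self-contained sketch of the extract-then-consolidate cycle. This is not the paper's method: the real system extracts natural-language experiential knowledge from LLM trajectories and consolidates it via on-policy context distillation, while the numeric tasks, reward function, and all function names below are purely illustrative stand-ins.

```python
import random

def collect_trajectories(policy, tasks, targets):
    """Deployment stage: the current policy interacts with user-side tasks.
    Each trajectory is (task, answer, reward); here reward is just
    negative distance to a hidden per-task target (a toy stand-in)."""
    return [(t, a, -abs(targets[t] - a))
            for t in tasks for a in [policy(t)]]

def extract_experience(trajectories, experience):
    """Stage 1 (toy version): accumulate transferable knowledge,
    here simply the best-scoring answer seen so far for each task."""
    for task, answer, reward in trajectories:
        if task not in experience or reward > experience[task][1]:
            experience[task] = (answer, reward)
    return experience

def consolidate(experience):
    """Stage 2 (toy version): fold the accumulated knowledge into a new
    policy. Note this step never touches the environment itself,
    mirroring the paper's no-user-side-access property."""
    def policy(task):
        if task in experience:
            best, _ = experience[task]
            return best + random.choice([-1, 0, 1])  # explore near best
        return random.randint(0, 10)
    return policy

def oel_loop(tasks, targets, iterations=15):
    """Iterate the two stages; returns best-so-far total reward per round."""
    random.seed(0)
    experience = {}
    policy = lambda t: random.randint(0, 10)  # untrained initial policy
    history = []
    for _ in range(iterations):
        trajectories = collect_trajectories(policy, tasks, targets)
        experience = extract_experience(trajectories, experience)
        policy = consolidate(experience)
        history.append(sum(r for _, r in experience.values()))
    return history
```

Because stage 1 only ever keeps better-scoring answers, the tracked reward is non-decreasing across iterations, which is the sketch's analogue of the paper's consistent improvement over successive rounds.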
Why it matters?
This is important because it allows language models to become more effective over time without constant human intervention. By learning from real-world use, the models can adapt to new situations, improve their accuracy, and answer using fewer tokens, ultimately giving better performance and a more natural user experience. It also suggests a path toward AI that can genuinely learn and improve on its own.
Abstract
The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. OEL operates in two stages: first, transferable experiential knowledge is extracted and accumulated from interaction trajectories collected on the user side; second, this knowledge is consolidated into model parameters via on-policy context distillation, requiring no access to the user-side environment. The two stages are iterated to form an online learning loop, where the improved model collects higher-quality trajectories that yield richer experiential knowledge for subsequent rounds. We evaluate OEL on text-based game environments across multiple model scales and both thinking and non-thinking variants. OEL achieves consistent improvements over successive iterations, enhancing both task accuracy and token efficiency while preserving out-of-distribution performance. Our analysis further shows that extracted experiential knowledge is significantly more effective than raw trajectories, and that on-policy consistency between the knowledge source and the policy model is critical for effective learning.