CWM: An Open-Weights LLM for Research on Code Generation with World Models
FAIR CodeGen team, Quentin Carbonneaux, Gal Cohen, Jonas Gehring, Jacob Kahn, Jannik Kossen, Felix Kreuk, Emily McMilin, Michel Meyer, Yuxiang Wei, David Zhang, Kunhao Zheng, Jordi Armengol-Estapé, Pedram Bashiri, Maximilian Beck, Pierre Chambon, Abhishek Charnalia, Chris Cummins, Juliette Decugis, Zacharias V. Fisches, François Fleuret, Fabian Gloeckle
2025-10-07
Summary
This paper introduces Code World Model (CWM), a powerful new 32-billion-parameter language model designed to improve code generation by giving the model a better understanding of how code actually *works* when it runs in the real world.
What's the problem?
Current code-generating models are often trained just on existing code, which is like learning to build a car only by looking at pictures of cars. They don't understand what happens when the code is run, or how it interacts with its environment. This limits their ability to create complex, reliable software, especially when reasoning and planning are needed.
What's the solution?
The researchers 'mid-trained' CWM, continuing its training on large collections of observation-action trajectories: records of Python code being executed step by step in an interpreter, and of agents controlling programs inside Docker environments. This let the model learn the 'cause and effect' of running code, not just its text. They then used reinforcement learning to reward CWM for successfully solving verifiable coding challenges, math problems, and multi-turn software engineering tasks. The model can also handle very long inputs, with a context window of up to 131,000 tokens.
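To make the idea of interpreter trajectories concrete, here is a minimal, hypothetical sketch of how one might record "observation-action" pairs from Python execution using the standard `sys.settrace` hook: after each executed line, we capture the line number and the local variable state. The trace format and helper names are illustrative assumptions, not CWM's actual data pipeline.

```python
import sys

def collect_trace(func, *args):
    """Record a toy execution trace of `func`: for each executed line,
    store (line number, snapshot of local variables). This illustrates
    the kind of step-by-step interpreter data described above; the
    format is an assumption, not the paper's actual trajectory schema."""
    trace = []

    def tracer(frame, event, arg):
        # Only record 'line' events from the function we are tracing.
        if event == "line" and frame.f_code is func.__code__:
            trace.append((frame.f_lineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

result, trace = collect_trace(demo, 3)
print(result)        # 3 (sum of 0 + 1 + 2)
print(len(trace))    # number of line events recorded
```

A model trained on many such (code, trace) pairs can learn to predict how local state evolves as each line executes, which is the "world model" intuition behind CWM's mid-training data.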
Why does it matter?
CWM provides a new platform for researchers to explore how giving AI a 'world model' – an understanding of how things work – can dramatically improve code generation. It’s already showing promising results on standard coding benchmarks, achieving high scores on tests like SWE-bench Verified and Math-500, and the researchers are sharing the model’s development stages so others can build upon their work and create even more capable AI coding assistants.
Abstract
We release Code World Model (CWM), a 32-billion-parameter open-weights LLM, to advance research on code generation with world models. To improve code understanding beyond what can be learned from training on static code alone, we mid-train CWM on a large amount of observation-action trajectories from Python interpreter and agentic Docker environments, and perform extensive multi-task reasoning RL in verifiable coding, math, and multi-turn software engineering environments. With CWM, we provide a strong testbed for researchers to explore the opportunities world modeling affords for improving code generation with reasoning and planning in computational environments. We present first steps of how world models can benefit agentic coding, enable step-by-step simulation of Python code execution, and show early results of how reasoning can benefit from the latter. CWM is a dense, decoder-only LLM trained with a context size of up to 131k tokens. Independent of its world modeling capabilities, CWM offers strong performance on general coding and math tasks: it reaches pass@1 scores of 65.8% on SWE-bench Verified (with test-time scaling), 68.6% on LiveCodeBench, 96.6% on Math-500, and 76.0% on AIME 2024. To support further research on code world modeling, we release model checkpoints after mid-training, SFT, and RL.