MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining
Xiaomi LLM-Core Team, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang
2025-05-13

Summary
This paper introduces MiMo-7B, a 7-billion-parameter language model built to excel at reasoning tasks like solving math and programming problems, even though it is much smaller than many competing models.
What's the problem?
The problem is that many large language models struggle with complex reasoning, especially in math and coding, and the usual way to make them better is to make them much bigger, which costs far more compute, money, and energy.
What's the solution?
The researchers built MiMo-7B with a pre-training recipe that mixes in reasoning-dense data and adds a Multi-Token Prediction objective, so the model learns to predict several upcoming tokens at once instead of only the next one. After that, they post-trained it with reinforcement learning on math and programming problems whose answers can be checked automatically, making the model even better at reasoning. As a result, MiMo-7B outperforms some much larger models on these tough reasoning tasks.
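To make the multi-token prediction idea concrete, here is a minimal, hypothetical sketch in PyTorch: a toy model with one extra output head per future offset, trained to predict the tokens one, two, and three steps ahead from every position. The backbone, sizes, and head design are illustrative assumptions for this summary, not the MiMo-7B architecture described in the paper.

```python
# Minimal sketch of a multi-token prediction (MTP) training objective.
# Toy GRU backbone and made-up sizes; NOT the MiMo-7B architecture, only
# an illustration of predicting several future tokens from each position.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, DIM, N_FUTURE = 1000, 64, 3  # hypothetical vocabulary size, width, horizon

class TinyMTPModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        self.backbone = nn.GRU(DIM, DIM, batch_first=True)  # stand-in for a transformer
        # One output head per future offset: heads[k] predicts the token k+1 steps ahead.
        self.heads = nn.ModuleList(nn.Linear(DIM, VOCAB) for _ in range(N_FUTURE))

    def forward(self, tokens):
        hidden, _ = self.backbone(self.embed(tokens))    # (batch, seq, DIM)
        return [head(hidden) for head in self.heads]     # list of (batch, seq, VOCAB)

def mtp_loss(logits_per_offset, tokens):
    """Average cross-entropy over all future offsets that fit in the sequence."""
    losses = []
    for k, logits in enumerate(logits_per_offset, start=1):
        # Position t predicts the token at position t + k, so trim both ends to align.
        pred = logits[:, :-k, :].reshape(-1, VOCAB)
        target = tokens[:, k:].reshape(-1)
        losses.append(F.cross_entropy(pred, target))
    return torch.stack(losses).mean()

if __name__ == "__main__":
    model = TinyMTPModel()
    batch = torch.randint(0, VOCAB, (2, 16))   # fake token ids
    loss = mtp_loss(model(batch), batch)
    loss.backward()                            # trains the 1/2/3-tokens-ahead heads jointly
    print(f"MTP loss: {loss.item():.3f}")
```

In this sketch the extra heads only add training signal; at inference time a model like this can still be used for ordinary next-token generation (or for speculative decoding), which is the general motivation for the technique.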
Why it matters?
This matters because it shows that you don't always need the biggest model to get the best results. Smarter training can make smaller models powerful, which saves time, money, and energy while still solving complicated problems.
Abstract
MiMo-7B is a 7-billion-parameter language model optimized for reasoning: its pre-training combines a curated data mixture with Multi-Token Prediction, and its post-training applies reinforcement learning to math and programming problems, achieving performance that surpasses larger models.