InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
Congkai Xie, Shuo Cai, Wenjun Wang, Pengxiang Li, Zhijie Sang, Kejing Yang, Yiming Zhang, Zhen Li, Guanghao Zhu, Zeyu Liu, Yang Yu, Yuhang Liu, Su Lu, Baoyi He, Qi Zhou, Xiaotian Han, Jianbo Yuan, Shengyu Zhang, Fei Wu, Hongxia Yang
2025-02-20
Summary
This paper introduces InfiR, a new approach to building small AI models that can reason almost as well as much larger models while being more efficient and privacy-friendly. It's like teaching a compact smart assistant to think as critically as a supercomputer, but using far less power and keeping your information more secure.
What's the problem?
Big AI models are really good at reasoning and understanding complex tasks, but they demand a lot of computing power and raise privacy concerns because your data usually has to be sent to remote servers for processing. It's like having a super-smart friend who's great at solving problems, but they need a whole room of computers to think and might accidentally share your secrets.
What's the solution?
The researchers created a special training method for small AI models, including ones that can understand both text and images. This method teaches these compact models to reason effectively, making them almost as smart as the big models but much more efficient. It's like giving a small, energy-efficient computer a crash course in critical thinking, so it can perform almost as well as a supercomputer on complex tasks.
Why it matters?
This matters because it could make advanced AI more accessible and safer to use. Smaller, smarter AI models could run on phones and other small devices, helping people solve complex problems without needing to connect to big, power-hungry computers. It also means your personal information is less likely to be shared or misused, because these models can run locally. This could lead to smarter, more helpful AI assistants that respect your privacy and don't need a constant internet connection to work well.
Abstract
Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) have made significant advancements in reasoning capabilities. However, they still face challenges such as high computational demands and privacy concerns. This paper focuses on developing efficient Small Language Models (SLMs) and Multimodal Small Language Models (MSLMs) that retain competitive reasoning abilities. We introduce a novel training pipeline that enhances reasoning capabilities and facilitates deployment on edge devices, achieving state-of-the-art performance while minimizing development costs. InfiR aims to advance AI systems by improving reasoning, reducing adoption barriers, and addressing privacy concerns through smaller model sizes. Resources are available at https://github.com/Reallm-Labs/InfiR.