InfiAlign: A Scalable and Sample-Efficient Framework for Aligning LLMs to Enhance Reasoning Capabilities
Shuo Cai, Su Lu, Qi Zhou, Kejing Yang, Zhijie Sang, Congkai Xie, Hongxia Yang
2025-08-08
Summary
This paper introduces InfiAlign, a post-training framework designed to improve large language models' reasoning abilities efficiently, using substantially less data and compute than existing approaches.
What's the problem?
Enhancing reasoning in large language models usually requires large amounts of data and expensive compute, and existing methods often rely on complicated or task-specific tricks that make them difficult to scale.
What's the solution?
The solution is InfiAlign, which combines automated selection of high-quality, diverse training examples from open-source datasets with a post-training recipe of supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO). By prioritizing data quality and diversity over sheer volume, it boosts reasoning ability without needing large amounts of data.
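To make the DPO stage concrete, here is a minimal sketch of the standard DPO objective for a single preference pair. This is illustrative only, not the paper's actual implementation; the function name, argument names, and the default `beta` value are assumptions for the example.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response
    under either the trainable policy or the frozen reference model.
    """
    # Log-ratio of policy vs. reference for each response
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    # Implicit reward margin between chosen and rejected, scaled by beta
    margin = beta * (chosen_ratio - rejected_ratio)
    # Negative log-sigmoid of the margin: lower loss when the policy
    # prefers the chosen response more than the reference does
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2; as the policy learns to favor the chosen response, the loss decreases.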
Why it matters?
This matters because it provides a practical and scalable way to make AI models smarter at reasoning tasks while saving resources, which makes advanced AI more accessible and useful in real-world applications.
Abstract
InfiAlign, a scalable and sample-efficient post-training framework, combines supervised fine-tuning and Direct Preference Optimization to enhance large language models' reasoning abilities with minimal data and computational cost.