Phi-4 Technical Report
Marah Abdin, Jyoti Aneja, Harkirat Behl, Sébastien Bubeck, Ronen Eldan, Suriya Gunasekar, Michael Harrison, Russell J. Hewett, Mojan Javaheripi, Piero Kauffmann, James R. Lee, Yin Tat Lee, Yuanzhi Li, Weishung Liu, Caio C. T. Mendes, Anh Nguyen, Eric Price, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Xin Wang
2024-12-13

Summary
This paper introduces phi-4, a new 14-billion-parameter language model whose training recipe centers on data quality, improving performance on tasks such as math and reasoning.
What's the problem?
Large language models need vast amounts of training data, and most existing models rely primarily on organic sources such as web content and code, which can be noisy or only loosely relevant. This can lead to weak performance on specialized tasks such as STEM (science, technology, engineering, and mathematics) questions. In addition, many models do not use synthetic data effectively, which limits how much they gain from training.
What's the solution?
The authors developed phi-4 by incorporating high-quality synthetic data throughout the training process rather than relying mainly on organic web data. By combining these synthetic datasets with carefully curated organic data and new post-training techniques, phi-4 surpasses its teacher model, GPT-4, on STEM-focused question answering, showing that the gains go beyond simple distillation. The model keeps essentially the same architecture as its predecessor, phi-3, but achieves better results thanks to these data and training improvements.
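To make the data-generation idea concrete, below is a minimal, hypothetical sketch of a teacher-driven synthetic-data pipeline of the kind this summary alludes to: a teacher model turns curated seed documents into reasoning-style question/answer pairs, and only candidates that pass an automated quality check are kept. The function names, bodies, and threshold are illustrative assumptions, not the authors' actual implementation.

```python
from dataclasses import dataclass


@dataclass
class SyntheticExample:
    prompt: str
    response: str
    score: float = 0.0


def generate_with_teacher(seed_text: str) -> SyntheticExample:
    # Stand-in for a call to a teacher model that rewrites a curated seed
    # document into a reasoning-style question/answer pair.
    return SyntheticExample(
        prompt=f"Question derived from: {seed_text[:60]}",
        response="Step-by-step answer produced by the teacher model.",
    )


def verify_quality(example: SyntheticExample) -> float:
    # Stand-in for an automated checker: re-grading with the teacher,
    # executing generated code, or cross-checking the final answer.
    return 1.0 if example.response else 0.0


def build_synthetic_set(seed_docs: list[str], threshold: float = 0.8) -> list[SyntheticExample]:
    """Generate candidates from curated organic seeds and keep only high-scoring ones."""
    kept = []
    for doc in seed_docs:
        candidate = generate_with_teacher(doc)
        candidate.score = verify_quality(candidate)
        if candidate.score >= threshold:  # the data-quality gate
            kept.append(candidate)
    return kept


if __name__ == "__main__":
    seeds = [
        "A curated excerpt about projectile motion.",
        "A worked example on modular arithmetic.",
    ]
    for ex in build_synthetic_set(seeds):
        print(ex.prompt, "->", ex.response)
```

In a pipeline like this, the verification step carries most of the weight: the paper's emphasis is on data quality, so filtering and rewriting matter at least as much as raw generation volume.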
Why it matters?
This research matters because it shows how a focus on data quality and careful training methods can produce better-performing language models with fewer parameters. Phi-4 outperforms much larger models on certain reasoning-focused tasks, demonstrating that well-trained smaller models can compete with bigger ones. This could lead to more efficient AI applications in fields that require advanced reasoning and problem-solving skills.
Abstract
We present phi-4, a 14-billion parameter language model developed with a training recipe that is centrally focused on data quality. Unlike most language models, where pre-training is based primarily on organic data sources such as web content or code, phi-4 strategically incorporates synthetic data throughout the training process. While previous models in the Phi family largely distill the capabilities of a teacher model (specifically GPT-4), phi-4 substantially surpasses its teacher model on STEM-focused QA capabilities, giving evidence that our data-generation and post-training techniques go beyond distillation. Despite minimal changes to the phi-3 architecture, phi-4 achieves strong performance relative to its size -- especially on reasoning-focused benchmarks -- due to improved data, training curriculum, and innovations in the post-training scheme.
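As a small illustration of the "training curriculum" idea mentioned in the abstract, the sketch below expresses a staged data mixture whose weights shift between pre-training and a later stage. The source names and weights are assumptions chosen for illustration only; they are not the proportions used for phi-4.

```python
import random

# Illustrative staged mixtures: later stages can up-weight the highest-quality
# synthetic data. Names and weights here are hypothetical.
PRETRAIN_MIXTURE = {
    "synthetic_reasoning": 0.40,
    "filtered_web": 0.35,
    "code": 0.15,
    "curated_academic": 0.10,
}

MIDTRAIN_MIXTURE = {
    "synthetic_reasoning": 0.55,
    "filtered_web": 0.20,
    "code": 0.15,
    "curated_academic": 0.10,
}


def sample_source(mixture: dict[str, float], rng: random.Random) -> str:
    """Pick a data source for the next training batch according to its weight."""
    names, weights = zip(*mixture.items())
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print([sample_source(PRETRAIN_MIXTURE, rng) for _ in range(5)])
```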