Aquila2 Technical Report

Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu

2024-08-15

Summary

This paper introduces the Aquila2 series, a family of bilingual English-Chinese language models built with a training framework designed to improve training efficiency and model performance.

What's the problem?

Training large language models is complex and time-consuming, especially when they need to perform well in more than one language. Existing training pipelines often lack good tools for monitoring training progress and optimizing the data mix, which leads to inefficiencies and lower performance.

What's the solution?

The authors present Aquila2, which is trained with a new framework called HeuriMentor (HM) made up of three parts: an Adaptive Training Engine (ATE), a Training State Monitor (TSM), and a Data Management Unit (DMU). Together, these provide real-time insight into how well the model is converging and allow the training data distribution to be adjusted as training runs (a rough sketch of such a loop is shown below). The Aquila2 series comes in three sizes (7, 34, and 70 billion parameters) and has been evaluated on a range of benchmarks, showing strong performance in both English and Chinese.
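The details of HeuriMentor are not given in this summary, so the following is only a minimal illustrative sketch of the general idea of "monitor convergence, then rebalance the data mix." All class names (TrainingStateMonitor, DataManagementUnit), thresholds, and data-source names are assumptions, and the training step is faked; this is not the paper's implementation.

```python
# Hypothetical sketch of a monitor-and-rebalance training loop.
# Nothing here is taken from the Aquila2 codebase; names and heuristics are invented.
import random
from collections import deque


class TrainingStateMonitor:
    """Tracks recent losses and flags when improvement has stalled (stand-in for the TSM)."""

    def __init__(self, window: int = 100):
        self.losses = deque(maxlen=window)

    def record(self, loss: float) -> None:
        self.losses.append(loss)

    def is_plateauing(self) -> bool:
        if len(self.losses) < self.losses.maxlen:
            return False
        half = self.losses.maxlen // 2
        older = sum(list(self.losses)[:half]) / half
        recent = sum(list(self.losses)[half:]) / half
        return (older - recent) < 0.01  # negligible improvement over the window


class DataManagementUnit:
    """Holds per-source sampling weights and shifts them when progress stalls (stand-in for the DMU)."""

    def __init__(self, sources: dict[str, float]):
        self.weights = dict(sources)

    def sample_source(self) -> str:
        names, weights = zip(*self.weights.items())
        return random.choices(names, weights=weights, k=1)[0]

    def rebalance(self, boost: str, factor: float = 1.2) -> None:
        self.weights[boost] *= factor
        total = sum(self.weights.values())
        self.weights = {k: v / total for k, v in self.weights.items()}


def fake_training_step(step: int, source: str) -> float:
    # Placeholder for the real training engine: returns a slowly decreasing, noisy loss.
    return 3.0 / (1 + 0.01 * step) + 0.05 * random.random()


def training_loop(steps: int = 1000) -> None:
    monitor = TrainingStateMonitor()
    dmu = DataManagementUnit({"english_web": 0.5, "chinese_web": 0.4, "code": 0.1})
    for step in range(steps):
        source = dmu.sample_source()
        loss = fake_training_step(step, source)
        monitor.record(loss)
        if monitor.is_plateauing():
            dmu.rebalance(boost="chinese_web")  # arbitrary choice, purely for illustration


if __name__ == "__main__":
    training_loop()
```

The point of the sketch is only the division of labor: one component watches the training signal, another owns the data distribution, and the training engine itself stays unchanged.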

Why it matters?

This research is important because it advances the development of bilingual language models that can efficiently learn from data while maintaining high performance. By making their training code and model weights publicly available, the authors support further research and application development in the field of natural language processing.

Abstract

This paper introduces the Aquila2 series, which comprises a wide range of bilingual models with parameter sizes of 7, 34, and 70 billion. These models are trained based on an innovative framework named HeuriMentor (HM), which offers real-time insights into model convergence and enhances the training process and data management. The HM System, comprising the Adaptive Training Engine (ATE), Training State Monitor (TSM), and Data Management Unit (DMU), allows for precise monitoring of the model's training progress and enables efficient optimization of data distribution, thereby enhancing training effectiveness. Extensive evaluations show that the Aquila2 model series performs comparably well on both English and Chinese benchmarks. Specifically, Aquila2-34B demonstrates only a slight decrease in performance when quantized to Int4. Furthermore, we have made our training code (https://github.com/FlagOpen/FlagScale) and model weights (https://github.com/FlagAI-Open/Aquila2) publicly available to support ongoing research and the development of applications.
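The abstract notes that Aquila2-34B loses little accuracy when quantized to Int4, but does not say which Int4 scheme was used. As a hedged illustration only, the sketch below shows one common way to load a large causal language model in 4-bit precision with Hugging Face transformers and bitsandbytes (NF4); the quantization method and the model ID "BAAI/Aquila2-34B" are assumptions, not the paper's stated setup.

```python
# Sketch, not the paper's method: loads a causal LM with 4-bit (NF4) weights via bitsandbytes.
# The model ID "BAAI/Aquila2-34B" is assumed from the publicly released weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize linear layers to 4-bit on load
    bnb_4bit_quant_type="nf4",              # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch.float16,   # run matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained("BAAI/Aquila2-34B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Aquila2-34B",
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "北京是中国的首都。Translate to English:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading in 4-bit like this roughly quarters the memory needed for the weights of a 34B-parameter model, which is why the reported small accuracy drop at Int4 matters for practical deployment.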