GEB-1.3B: Open Lightweight Large Language Model
Jie Wu, Yufeng Zhu, Lei Shen, Xuqing Lu
2024-06-17

Summary
This paper introduces GEB-1.3B, a lightweight large language model (LLM) designed to run efficiently on ordinary computers. It was trained on 550 billion tokens of Chinese and English text, which makes it capable of handling a wide range of language tasks.
What's the problem?
Many existing large language models, such as ChatGPT and Claude, are very capable but demand substantial computing resources. In practice they can only be deployed on high-performance servers, which are expensive and out of reach for most users. Their heavy computational load also increases response latency, making them less practical for everyday use.
What's the solution?
To address these challenges, the authors developed GEB-1.3B, which is optimized to run efficiently on standard CPUs rather than requiring powerful servers. They used training techniques such as RoPE (rotary position embeddings, which encode each token's position in the sequence), grouped-query attention (which lets several attention heads share key/value projections, cutting memory and computation), and FlashAttention-2 (a faster, memory-efficient way to compute attention). They also fine-tuned the model on 10 million instruction examples to improve how well it follows instructions. As a result, GEB-1.3B scores well on general benchmarks, outperforming comparable models such as MindLLM-1.3B and TinyLLaMA-1.1B, and achieves reasonable inference speed on CPUs; a sketch of two of these techniques follows below.
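To make these ideas concrete, here is a minimal PyTorch sketch of rotary position embeddings and grouped-query attention. The shapes, head counts, and function names are illustrative assumptions, not GEB-1.3B's actual configuration or code; FlashAttention-2 is a fused-kernel implementation of the same attention computation and is not shown.

```python
# A minimal sketch of RoPE and grouped-query attention (GQA) in plain PyTorch.
# Shapes, head counts, and names are illustrative, not GEB-1.3B's actual code.
import torch
import torch.nn.functional as F

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotate channel pairs by a position-dependent angle. x: (seq, dim)."""
    seq_len, dim = x.shape
    half = dim // 2
    freqs = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(seq_len, dtype=torch.float32)[:, None] * freqs
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, :half], x[:, half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d) with fewer heads."""
    group = q.shape[0] // k.shape[0]
    # Each shared key/value head serves a whole group of query heads.
    k = k.repeat_interleave(group, dim=0)
    v = v.repeat_interleave(group, dim=0)
    scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
    return F.softmax(scores, dim=-1) @ v

# Toy example: 8 query heads share 2 key/value heads over a 16-token sequence.
seq, d = 16, 64
q = torch.stack([rope(torch.randn(seq, d)) for _ in range(8)])
k = torch.stack([rope(torch.randn(seq, d)) for _ in range(2)])
v = torch.randn(2, seq, d)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([8, 16, 64])
```

The point of grouped-query attention is visible in the toy example: only two key/value heads are stored and attended over, so the key/value cache, which dominates memory during generation, shrinks compared with giving every query head its own keys and values.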
Why it matters?
This research is important because it makes powerful language models more accessible to a wider audience by allowing them to run on regular computers. By releasing GEB-1.3B as an open-source model, the authors hope to encourage further research and development in the field of natural language processing, leading to new applications and innovations that can benefit everyone.
Abstract
Recently developed large language models (LLMs) such as ChatGPT, Claude, and Llama have demonstrated impressive abilities, even surpassing human-level performance in several tasks. Despite their success, the resource-intensive demands of these models, requiring significant computational power for both training and inference, limit their deployment to high-performance servers. Additionally, the extensive calculation requirements of the models often lead to increased latency in response times. With the increasing need for LLMs to operate efficiently on CPUs, research on lightweight models optimized for CPU inference has emerged. In this work, we introduce GEB-1.3B, a lightweight LLM trained on 550 billion tokens in both Chinese and English languages. We employ novel training techniques, including ROPE, Group-Query-Attention, and FlashAttention-2, to accelerate training while maintaining model performance. Additionally, we fine-tune the model using 10 million samples of instruction data to enhance alignment. GEB-1.3B exhibits outstanding performance on general benchmarks such as MMLU, C-Eval, and CMMLU, outperforming comparative models such as MindLLM-1.3B and TinyLLaMA-1.1B. Notably, the FP32 version of GEB-1.3B achieves commendable inference times on CPUs, with ongoing efforts to further enhance speed through advanced quantization techniques. The release of GEB-1.3B as an open-source model marks a significant contribution to the development of lightweight LLMs, promising to foster further research and innovation in the field.
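The abstract points to quantization as the route to faster CPU inference beyond the FP32 baseline. As a rough illustration of the general idea only, not the paper's actual pipeline, the sketch below applies PyTorch's post-training dynamic quantization to a toy feed-forward block, storing the Linear weights in int8 while activations remain in floating point.

```python
# Minimal sketch of post-training dynamic quantization for CPU inference.
# The toy module and sizes are illustrative; this is not GEB-1.3B's pipeline.
import torch
import torch.nn as nn

# Toy stand-in for a transformer feed-forward block.
model = nn.Sequential(
    nn.Linear(2048, 8192),
    nn.GELU(),
    nn.Linear(8192, 2048),
)

# Convert the Linear layers' weights to int8; activations stay in float
# and are quantized on the fly at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 2048)
with torch.no_grad():
    y = quantized(x)
print(y.shape)  # torch.Size([1, 2048])
```

Storing weights in int8 rather than FP32 typically shrinks weight memory by roughly a factor of four and speeds up CPU matrix multiplies; the paper's ongoing quantization work may rely on different or more advanced techniques.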