H2O-Danube3 Technical Report

Pascal Pfeiffer, Philipp Singer, Yauhen Babakhin, Gabor Fodor, Nischay Dhankhar, Sri Satish Ambati

2024-07-15

Summary

This paper introduces H2O-Danube3, a series of small language models trained on trillions of tokens and designed to run efficiently on devices such as smartphones while remaining competitive across a variety of academic and chat tasks.

What's the problem?

Many existing language models are too large and require powerful hardware to run, making them impractical for everyday use on mobile devices. Additionally, these models often need extensive computational resources to be fine-tuned for specific tasks, which can be expensive and time-consuming.

What's the solution?

H2O-Danube3 consists of two models: H2O-Danube3-4B, trained on 6 trillion tokens, and H2O-Danube3-500M, trained on 4 trillion tokens. These models are pre-trained using a mix of high-quality web data in three stages and then fine-tuned for chat purposes. The compact design allows them to run efficiently on modern smartphones without needing expensive GPUs. They have been shown to perform competitively across various academic and chat benchmarks.
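To see why the compact size matters for on-device use, here is a rough, illustrative back-of-the-envelope calculation (not from the paper) of the memory needed just to hold a model's weights at different numeric precisions:

```python
# Illustrative memory estimate for model weights; the formula and the
# example parameter counts are assumptions for the sketch, not figures
# reported in the paper.

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for the weights alone, in gigabytes."""
    return num_params * bits_per_param / 8 / 1e9

# A ~4B-parameter model such as H2O-Danube3-4B:
fp16_gb = weight_memory_gb(4e9, 16)  # 16-bit weights: ~8 GB
int4_gb = weight_memory_gb(4e9, 4)   # 4-bit quantized: ~2 GB

print(f"fp16: {fp16_gb:.1f} GB, 4-bit: {int4_gb:.1f} GB")
```

At full 16-bit precision the weights alone would strain a phone's RAM, but common quantization schemes bring a 4B model into the range of modern smartphone memory, which is consistent with the paper's claim of efficient local inference.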

Why it matters?

This research is important because it democratizes access to advanced AI technology by making it available on everyday devices. By allowing smaller models to perform complex tasks effectively, H2O-Danube3 enables more people and businesses to utilize AI for applications like customer service, education, and content creation without the need for costly infrastructure.

Abstract

We present H2O-Danube3, a series of small language models consisting of H2O-Danube3-4B, trained on 6T tokens, and H2O-Danube3-500M, trained on 4T tokens. Our models are pre-trained on high-quality web data consisting of primarily English tokens in three stages with different data mixes, followed by final supervised tuning for the chat versions. The models exhibit highly competitive metrics across a multitude of academic, chat, and fine-tuning benchmarks. Thanks to its compact architecture, H2O-Danube3 can be efficiently run on a modern smartphone, enabling local inference and rapid processing capabilities even on mobile devices. We make all models openly available under the Apache 2.0 license, further democratizing LLMs to a wider audience economically.