LLäMmlein: Compact and Competitive German-Only Language Models from Scratch
Jan Pfister, Julia Wunderle, Andreas Hotho
2024-11-19

Summary
This paper discusses the creation of two German-only language models, LLäMmlein 120M and 1B, designed to help advance research in natural language processing (NLP) for the German language.
What's the problem?
While there are many language models available, most are either multilingual or focused on English. This creates a gap for researchers and developers working specifically with the German language, as they often lack models that are tailored to their needs. Additionally, existing models may not perform well on tasks specific to German due to differences in language structure and usage.
What's the solution?
The authors developed LLäMmlein from scratch, creating two versions with different sizes (120 million and 1 billion parameters) specifically for German. The training pipeline involved several steps: preprocessing the data, creating a custom German tokenizer, training the models, and evaluating them on benchmarks such as SuperGLEBer. They also saved multiple checkpoints during training to analyze how the models learned over time.
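As a rough illustration of the tokenizer step, the sketch below trains a byte-level BPE tokenizer with the Hugging Face tokenizers library. The corpus file name, vocabulary size, and special tokens are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal sketch: training a German BPE tokenizer with the Hugging Face
# "tokenizers" library. The corpus file, vocab size, and special tokens
# are illustrative assumptions, not the paper's exact settings.
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

trainer = trainers.BpeTrainer(
    vocab_size=32_000,                        # assumed size for illustration
    special_tokens=["<unk>", "<s>", "</s>"],
)

# "german_corpus.txt" stands in for the preprocessed German training data.
tokenizer.train(files=["german_corpus.txt"], trainer=trainer)
tokenizer.save("german_tokenizer.json")
```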
Why it matters?
This research is important because it provides high-quality language models that are specifically designed for the German language. By making these models available along with their training data, the authors aim to support the German NLP research community, helping researchers develop better tools and applications that can understand and process German text more effectively.
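Since the models are published for the community, using them should follow the standard Hugging Face transformers workflow, roughly as sketched below. The repository id is an assumption for illustration; consult the authors' release for the actual hub location.

```python
# Minimal sketch: loading a published causal LM with Hugging Face
# transformers. The repo id below is an assumed placeholder; check the
# authors' release page for the real location of the LLäMmlein models.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LSX-UniWue/LLaMmlein_1B"  # assumed repo id for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Die Würzburger Residenz ist", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```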
Abstract
We create two German-only decoder models, LLäMmlein 120M and 1B, transparently from scratch and publish them, along with the training data, for the German NLP research community to use. The model training involved several key steps, including extensive data preprocessing, the creation of a custom German tokenizer, the training itself, as well as the evaluation of the final models on various benchmarks. Throughout the training process, multiple checkpoints were saved and analyzed using the SuperGLEBer benchmark to monitor the models' learning dynamics. Compared to state-of-the-art models on the SuperGLEBer benchmark, both LLäMmlein models performed competitively, consistently matching or surpassing models with similar parameter sizes. The results show that the models' quality scales with size as expected, but performance improvements on some tasks plateaued early, offering valuable insights into resource allocation for future model development.
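To make the checkpoint analysis concrete, the sketch below probes intermediate checkpoints for learning dynamics. The paper evaluates checkpoints via SuperGLEBer fine-tuning; held-out perplexity here is a simpler stand-in, and the repository id and revision tags are hypothetical placeholders.

```python
# Minimal sketch: comparing intermediate checkpoints on a held-out German
# sentence via perplexity. This is a lightweight proxy; the paper itself
# tracks learning dynamics with the SuperGLEBer benchmark. Repo id and
# revision tags are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LSX-UniWue/LLaMmlein_1B"        # assumed repo id
revisions = ["step-100000", "step-500000"]  # hypothetical checkpoint tags

text = "Maschinelles Lernen verändert die Sprachverarbeitung."
for rev in revisions:
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=rev)
    model = AutoModelForCausalLM.from_pretrained(model_id, revision=rev)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # For causal LMs, passing input_ids as labels yields the LM loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    print(rev, "perplexity:", torch.exp(loss).item())
```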