Memory-based Language Models: An Efficient, Explainable, and Eco-friendly Approach to Large Language Modeling

Antal van den Bosch, Ainhoa Risco Patón, Teun Buijse, Peter Berck, Maarten van Gompel

2025-10-28

Summary

This paper introduces a new way to build language models (the systems that predict the next word in a sentence) as a more efficient and environmentally friendly alternative to popular deep-learning approaches such as GPT-2 and GPT-Neo.

What's the problem?

Current state-of-the-art language models are based on deep neural networks, which require massive amounts of computing power to train and run, leading to high energy consumption and a significant environmental impact. These models are also opaque: their internal computations make it hard to understand *why* they produce a particular prediction.

What's the solution?

The researchers developed a 'memory-based' language model called OLIFANT. Instead of learning abstract numerical weights, it stores past text directly and predicts the next word by retrieving the stored examples most similar to the current context, using a fast approximation of k-nearest-neighbor classification. Crucially, it runs effectively on standard computer processors (CPUs) without specialized hardware, making it faster to deploy and far less energy-hungry.
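To make the retrieval idea concrete, here is a minimal sketch of a memory-based next-token predictor. This is not the authors' OLIFANT implementation; the class name, the positional-overlap similarity, and the majority vote are illustrative assumptions standing in for the paper's fast approximate k-nearest-neighbor classification.

```python
from collections import Counter


class MemoryBasedLM:
    """Toy memory-based language model (illustrative, not OLIFANT):
    store (context, next_token) pairs, then predict by majority vote
    over the k stored contexts most similar to the query."""

    def __init__(self, context_size=3, k=3):
        self.context_size = context_size
        self.k = k
        self.memory = []  # list of (context_tuple, next_token)

    def train(self, tokens):
        # Store every context window together with the token that follows it.
        n = self.context_size
        for i in range(n, len(tokens)):
            self.memory.append((tuple(tokens[i - n:i]), tokens[i]))

    def predict(self, context):
        context = tuple(context[-self.context_size:])

        def sim(stored):
            # Count matching positions, weighting positions nearer
            # to the target token more heavily (an assumed heuristic).
            return sum(j + 1 for j, (a, b) in enumerate(zip(stored, context)) if a == b)

        # k nearest neighbors = stored contexts with the highest similarity.
        neighbors = sorted(self.memory, key=lambda m: sim(m[0]), reverse=True)[: self.k]
        votes = Counter(tok for _, tok in neighbors)
        return votes.most_common(1)[0][0]


lm = MemoryBasedLM(context_size=3, k=3)
lm.train("the cat sat on the mat the cat sat on the rug".split())
print(lm.predict("cat sat on".split()))  # prints "the"
```

Because prediction is table lookup plus comparisons rather than matrix multiplication, a model like this runs entirely on a CPU; the paper's contribution is making that lookup fast enough (via approximate nearest-neighbor search) to be practical at scale.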

Why it matters?

This research matters because it shows that solid language-modeling performance does not require huge, power-hungry neural networks. It offers a more sustainable and transparent approach to building these models, potentially making them more accessible while shrinking their environmental footprint.

Abstract

We present memory-based language modeling as an efficient, eco-friendly alternative to deep neural network-based language modeling. It offers log-linearly scalable next-token prediction performance and strong memorization capabilities. Implementing fast approximations of k-nearest neighbor classification, memory-based language modeling leaves a relatively small ecological footprint both in training and in inference mode, as it relies fully on CPUs and attains low token latencies. Its internal workings are simple and fully transparent. We compare our implementation of memory-based language modeling, OLIFANT, with GPT-2 and GPT-Neo on next-token prediction accuracy, estimated emissions and speeds, and offer some deeper analyses of the model.