Lizard: An Efficient Linearization Framework for Large Language Models

Chien Van Nguyen, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Viet Dac Lai, Haoliang Wang, Jayakumar Subramanian, Ryan A. Rossi, Trung Bui, Nikos Vlassis, Franck Dernoncourt, Thien Huu Nguyen

2025-07-17

Summary

This paper introduces Lizard, a framework that makes large language models much more efficient by changing how they handle attention and memory, so they can work with very long texts without slowing down or running out of memory.

What's the problem?

The problem is that standard Transformer models use softmax attention, whose compute cost grows quadratically with input length and whose key-value cache keeps growing as the text gets longer. This makes them slow and memory-hungry on tasks that need long-context understanding.
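To make that scaling concrete, here is a minimal PyTorch sketch of standard causal softmax attention (illustrative only, not the paper's code). The score matrix it builds is seq_len x seq_len, which is exactly the quadratic cost that Lizard is designed to avoid.

```python
import torch

def softmax_attention(q, k, v):
    # q, k, v: (seq_len, d). The score matrix below is (seq_len, seq_len),
    # so memory and compute grow quadratically with the input length.
    seq_len = q.shape[0]
    scores = q @ k.T / (q.shape[-1] ** 0.5)
    # Causal mask: each token may only attend to itself and earlier tokens.
    mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(4096, 64)
out = softmax_attention(q, k, v)  # allocates a 4096 x 4096 score matrix
```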

What's the solution?

The authors developed Lizard to convert pretrained Transformer models into versions that use a hybrid attention mechanism: gated linear attention, which compresses long-range history into a fixed-size memory, combined with sliding window attention, which keeps exact attention over recent tokens, plus a gating module that controls what the memory keeps and forgets. This lets the converted models retain important information and handle long texts efficiently while keeping performance close to the original models. They also designed hardware-aware training to make the conversion faster. A rough sketch of the hybrid mechanism is shown below.
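The sketch below illustrates the general idea rather than the authors' implementation: the mixing weight alpha, the window size, and the per-token gates are placeholder assumptions. Gated linear attention carries compressed long-range memory at constant cost per token, sliding window attention preserves precise local detail, and the two paths are blended.

```python
import torch

def gated_linear_attention(q, k, v, gates):
    # Recurrent form: a fixed-size (d x d) state replaces the growing KV cache,
    # so per-token cost stays constant regardless of context length.
    seq_len, d = q.shape
    state = torch.zeros(d, d)
    outputs = []
    for t in range(seq_len):
        # The gate (in [0, 1]) decays old memory before adding the new key/value pair.
        state = gates[t] * state + torch.outer(k[t], v[t])
        outputs.append(q[t] @ state)
    return torch.stack(outputs)

def sliding_window_attention(q, k, v, window=128):
    # Exact softmax attention, but each query only sees the last `window` tokens.
    outputs = []
    for t in range(len(q)):
        lo = max(0, t - window + 1)
        scores = (q[t] @ k[lo:t + 1].T) / (q.shape[-1] ** 0.5)
        outputs.append(torch.softmax(scores, dim=-1) @ v[lo:t + 1])
    return torch.stack(outputs)

# Hybrid: blend the compressed long-range path with the precise local path.
q = k = v = torch.randn(1024, 64)
gates = torch.sigmoid(torch.randn(1024))  # per-token decay gates (illustrative)
alpha = 0.5                               # placeholder mixing weight
out = alpha * gated_linear_attention(q, k, v, gates) + \
      (1 - alpha) * sliding_window_attention(q, k, v)
```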

Why it matters?

This matters because it lets AI models handle much longer pieces of text effectively, improving performance on tasks like reading comprehension and reasoning while using less memory and compute, which makes these models more practical and accessible.

Abstract

Lizard is a linearization framework that transforms Transformer-based LLMs into subquadratic architectures for efficient infinite-context generation, using a hybrid attention mechanism and hardware-aware training.