ARC-Encoder: learning compressed text representations for large language models
Hippolyte Pilchen, Edouard Grave, Patrick Pérez
2025-10-27
Summary
This paper introduces a new way to handle long pieces of text when working with large language models (LLMs). It focuses on making these models more efficient without sacrificing their overall performance.
What's the problem?
Large language models are getting better, but processing very long inputs is slow and expensive. Popular techniques, such as feeding the model retrieved documents for extra context or having it reason step by step, make inputs even longer and drive up inference costs. Existing methods for compressing the input often require fine-tuning the target model itself, which can make it worse at other tasks.
What's the solution?
The researchers developed an ARC-Encoder. This encoder takes long text and compresses it into a much shorter sequence of continuous vectors that capture the important information. Instead of feeding the LLM the original words, it gets this compressed version, which is typically 4 to 8 times shorter. They systematically compared different ways to build and train the encoder to make it as effective as possible, and designed it so that a single encoder can serve multiple different LLMs without being retrained for each one.
Why it matters?
This work is important because it offers a practical solution to the problem of long inputs for LLMs. By making models more efficient, it lowers the cost of using them and opens the door to more complex applications. The fact that the ARC-Encoder can work with various LLMs makes it a versatile tool for anyone working with these powerful AI systems.
Abstract
Recent techniques such as retrieval-augmented generation or chain-of-thought reasoning have led to longer contexts and increased inference costs. Context compression techniques can reduce these costs, but the most effective approaches require fine-tuning the target model or even modifying its architecture, which can degrade its general abilities when it is not used for this specific purpose. Here we explore an alternative approach: an encoder that compresses the context into continuous representations which replace token embeddings in decoder LLMs. First, we perform a systematic study of training strategies and architecture choices for the encoder. Our findings led to the design of an Adaptable text Representations Compressor, named ARC-Encoder, which outputs x times fewer continuous representations (typically x ∈ {4, 8}) than text tokens. We evaluate ARC-Encoder across a variety of LLM usage scenarios, ranging from in-context learning to context window extension, on both instruct and base decoders. Results show that ARC-Encoder achieves state-of-the-art performance on several benchmarks while improving computational efficiency at inference. Finally, we demonstrate that our models can be adapted to multiple decoders simultaneously, allowing a single encoder to generalize across different decoder LLMs. This makes ARC-Encoder a flexible and efficient solution for portable encoders that work seamlessly with multiple LLMs. We release the training code at https://github.com/kyutai-labs/ARC-Encoder; the fine-tuning dataset and pretrained models are available at https://huggingface.co/collections/kyutai/arc-encoders-68ee18787301407d60a57047.
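To make the compression idea concrete, here is a minimal sketch, assuming a simple PyTorch setup: a small transformer encoder pools every x consecutive context embeddings into one continuous vector and projects it to the decoder's embedding width, so the decoder sees x times fewer positions for the same context. Everything here (the ToyContextCompressor name, the mean-pooling scheme, the layer counts and dimensions) is an illustrative assumption, not the paper's actual ARC-Encoder architecture.

```python
# Illustrative sketch of context compression as described in the abstract:
# N context token embeddings are turned into N / x continuous vectors that
# stand in for token embeddings in a decoder LLM. Hypothetical design, not
# the paper's ARC-Encoder.
import torch
import torch.nn as nn

class ToyContextCompressor(nn.Module):
    """Compress a sequence of token embeddings by a factor of `x`."""

    def __init__(self, enc_dim: int, dec_dim: int, x: int = 8):
        super().__init__()
        self.x = x
        layer = nn.TransformerEncoderLayer(d_model=enc_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Project pooled encoder states into the decoder's embedding space.
        self.proj = nn.Linear(enc_dim, dec_dim)

    def forward(self, context_embeds: torch.Tensor) -> torch.Tensor:
        # context_embeds: (batch, n_tokens, enc_dim); assumes n_tokens % x == 0.
        h = self.encoder(context_embeds)
        b, n, d = h.shape
        # Mean-pool every `x` consecutive states -> n // x continuous vectors.
        pooled = h.view(b, n // self.x, self.x, d).mean(dim=2)
        return self.proj(pooled)  # (batch, n_tokens // x, dec_dim)

# Usage: 256 context embeddings become 32 continuous vectors (x = 8).
compressor = ToyContextCompressor(enc_dim=512, dec_dim=4096, x=8)
context = torch.randn(1, 256, 512)
compressed = compressor(context)  # -> shape (1, 32, 4096)
# In place of the full context, these vectors would be concatenated with the
# prompt's token embeddings and fed to the decoder (hypothetical names):
# prompt_embeds = decoder.embed_tokens(prompt_ids)
# inputs_embeds = torch.cat([compressed, prompt_embeds], dim=1)
```

The key design point, per the abstract, is that the decoder itself is untouched: compression happens entirely in the encoder, whose outputs simply replace the context's token embeddings at the decoder's input.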