< Explain other AI papers

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation

Itay Nakash, Nitay Calderon, Eyal Ben David, Elad Hoffer, Roi Reichart

2025-03-31

AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through
  Lightweight Vocabulary Adaptation

Summary

This paper is about making large AI language models faster and more efficient by shrinking their vocabulary to focus on specific topics.

What's the problem?

Large language models are powerful, but they can be slow and expensive to use, especially when dealing with specialized subjects.

What's the solution?

The researchers developed a technique called AdaptiVocab that reduces the number of words the AI needs to process, making it faster without losing accuracy.

Why it matters?

This work matters because it can make AI language models more practical for real-world applications in specific fields like medicine or law.

Abstract

Large Language Models (LLMs) have shown impressive versatility as general purpose models. However, their broad applicability comes at a high-cost computational overhead, particularly in auto-regressive decoding where each step requires a forward pass. In domain-specific settings, general-purpose capabilities are unnecessary and can be exchanged for efficiency. In this work, we take a novel perspective on domain adaptation, reducing latency and computational costs by adapting the vocabulary to focused domains of interest. We introduce AdaptiVocab, an end-to-end approach for vocabulary adaptation, designed to enhance LLM efficiency in low-resource domains. AdaptiVocab can be applied to any tokenizer and architecture, modifying the vocabulary by replacing tokens with domain-specific n-gram-based tokens, thereby reducing the number of tokens required for both input processing and output generation. AdaptiVocab initializes new n-token embeddings using an exponentially weighted combination of existing embeddings and employs a lightweight fine-tuning phase that can be efficiently performed on a single GPU. We evaluate two 7B LLMs across three niche domains, assessing efficiency, generation quality, and end-task performance. Our results show that AdaptiVocab reduces token usage by over 25% without compromising performance