Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning

Trapoom Ukarapol, Zhicheng Lee, Amy Xin

2024-08-02

Summary

This paper presents a method for improving smaller language models by enhancing their text embeddings through a technique called contrastive fine-tuning. The goal is to make these models perform better while remaining far less resource-intensive than larger models.

What's the problem?

Large language models, while powerful, require substantial computing power and resources, which puts them out of reach for many users. Smaller models, like MiniCPM, are more practical but often underperform without specialized training techniques. This creates a challenge for developers who want language models that are both effective and accessible.

What's the solution?

The authors enhance smaller language models by improving their text embeddings, the numerical vector representations of text that a model uses to capture meaning. They selected three models (MiniCPM, Phi-2, and Gemma) and applied contrastive fine-tuning on a natural language inference (NLI) dataset, training each model to pull the embeddings of semantically matching sentence pairs together while pushing mismatched pairs apart (see the sketch below). This approach improved all three models, with MiniCPM showing an impressive average performance gain of 56.33% across various benchmarks.
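
To make the training objective concrete, here is a minimal PyTorch sketch of an InfoNCE-style contrastive loss over NLI triplets (premise as anchor, its entailment as the positive, its contradiction as a hard negative), with the other examples in the batch serving as additional negatives. The temperature, sizes, and function name are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of InfoNCE-style contrastive fine-tuning on NLI triplets.
# The loss pulls each anchor toward its entailed positive and pushes it
# away from its contradiction (hard negative) and from every other example
# in the batch. Temperature and sizes below are illustrative assumptions.
import torch
import torch.nn.functional as F

def info_nce_loss(anchor, positive, negative, temperature=0.05):
    """anchor, positive, negative: (batch, dim) sentence embeddings."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negative = F.normalize(negative, dim=-1)

    sim_pos = anchor @ positive.T   # (batch, batch): diagonal = true pairs
    sim_neg = anchor @ negative.T   # (batch, batch): all hard negatives
    logits = torch.cat([sim_pos, sim_neg], dim=1) / temperature

    # The correct "class" for anchor i is its own positive at column i.
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

# Toy usage: random vectors stand in for a model's pooled embeddings.
batch, dim = 8, 256
a, p, n = (torch.randn(batch, dim, requires_grad=True) for _ in range(3))
loss = info_nce_loss(a, p, n)
loss.backward()
print(f"loss = {loss.item():.4f}")
```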

Why it matters?

This research is important because it makes advanced language processing more accessible by improving the performance of smaller models. By showing that these models can achieve significant gains through effective training techniques, it opens up possibilities for more developers and researchers to use AI in applications like chatbots, translation services, and content generation without needing extensive resources.

Abstract

While Large Language Models show remarkable performance in natural language understanding, their resource-intensive nature makes them less accessible. In contrast, smaller language models such as MiniCPM offer more sustainable scalability, but often underperform without specialized optimization. In this paper, we explore the enhancement of smaller language models through the improvement of their text embeddings. We select three language models, MiniCPM, Phi-2, and Gemma, to conduct contrastive fine-tuning on the NLI dataset. Our results demonstrate that this fine-tuning method enhances the quality of text embeddings for all three models across various benchmarks, with MiniCPM showing the most significant improvements of an average 56.33% performance gain. The contrastive fine-tuning code is publicly available at https://github.com/trapoom555/Language-Model-STS-CFT.
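
For readers curious how a sentence embedding is obtained from a decoder-only model like the three above, one common recipe is last-token pooling over the final hidden states. The sketch below uses Hugging Face Transformers; the model choice, padding handling, and pooling strategy are assumptions for illustration rather than the paper's exact setup (the linked repository contains the authors' code).

```python
# A hedged sketch of last-token pooling for sentence embeddings from a
# decoder-only LM. Model name and pooling choice are illustrative
# assumptions, not necessarily the paper's exact recipe.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "microsoft/phi-2"  # placeholder; any small causal LM works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:        # many causal LMs lack a pad token
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

@torch.no_grad()
def embed(texts):
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state       # (batch, seq, dim)
    last = batch["attention_mask"].sum(dim=1) - 1   # last real token index
    return hidden[torch.arange(hidden.size(0)), last]

print(embed(["a premise sentence", "its entailment"]).shape)
```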