
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study

André Storhaug, Jingyue Li

2024-11-11


Summary

This paper presents an empirical study of how to fine-tune large language models (LLMs) efficiently for generating unit tests, the automated checks that verify whether code works correctly.

What's the problem?

While LLMs like GitHub Copilot help programmers write code, they often need extra training (fine-tuning) to perform well on specific tasks such as generating unit tests. Fine-tuning these large models, however, can be very expensive and time-consuming: the traditional approach adjusts all of the model's parameters, which is inefficient for a specialized task like unit test generation.

What's the solution?

The authors explore parameter-efficient fine-tuning (PEFT), which adjusts only a small part of the model's parameters instead of all of them. They compare several PEFT methods, including LoRA, (IA)^3, and prompt tuning, against full fine-tuning for unit test generation across different model architectures and sizes. Their experiments show that PEFT can achieve performance comparable to full fine-tuning while being much cheaper. Among the methods tested, prompt tuning was the most efficient in terms of cost and resource use, while LoRA came closest to matching the effectiveness of full fine-tuning.
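
To make the idea concrete, the sketch below shows how LoRA and prompt tuning are typically set up with the Hugging Face peft library. The paper's actual models, datasets, and hyperparameters are not given here, so the base model (facebook/opt-125m) and every setting in the snippet are illustrative assumptions rather than the authors' configuration.

```python
# Minimal sketch, not the paper's setup: base model and hyperparameters are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, PromptTuningConfig, TaskType, get_peft_model

BASE = "facebook/opt-125m"  # placeholder model; the paper evaluates several code LLMs

# LoRA: add small low-rank adapter matrices to the attention projections and
# train only those, keeping the original weights frozen.
lora_cfg = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                  # rank of the adapter matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # model-specific module names
)

# Prompt tuning: learn a few "virtual token" embeddings that are prepended to
# every input; the base model itself stays entirely frozen.
prompt_cfg = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
)

for name, cfg in [("LoRA", lora_cfg), ("prompt tuning", prompt_cfg)]:
    model = get_peft_model(AutoModelForCausalLM.from_pretrained(BASE), cfg)
    print(name)
    model.print_trainable_parameters()  # reports trainable vs. total parameters
```

Either wrapped model can then be trained on a unit-test dataset with a standard training loop; only the adapter or prompt parameters receive gradients, which is where the savings in compute and memory come from.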

Why it matters?

This research is important because it makes it easier and more affordable for developers to train LLMs for specific tasks like unit test generation. By improving how these models can be fine-tuned, it helps enhance the quality of software development tools, ultimately leading to better and more reliable software.

Abstract

The advent of large language models (LLMs) like GitHub Copilot has significantly enhanced programmers' productivity, particularly in code generation. However, these models often struggle with real-world tasks without fine-tuning. As LLMs grow larger and more performant, fine-tuning for specialized tasks becomes increasingly expensive. Parameter-efficient fine-tuning (PEFT) methods, which fine-tune only a subset of model parameters, offer a promising solution by reducing the computational costs of tuning LLMs while maintaining their performance. Existing studies have explored using PEFT and LLMs for various code-related tasks and found that the effectiveness of PEFT techniques is task-dependent. The application of PEFT techniques in unit test generation remains underexplored. The state-of-the-art is limited to using LLMs with full fine-tuning to generate unit tests. This paper investigates both full fine-tuning and various PEFT methods, including LoRA, (IA)^3, and prompt tuning, across different model architectures and sizes. We use well-established benchmark datasets to evaluate their effectiveness in unit test generation. Our findings show that PEFT methods can deliver performance comparable to full fine-tuning for unit test generation, making specialized fine-tuning more accessible and cost-effective. Notably, prompt tuning is the most effective in terms of cost and resource utilization, while LoRA approaches the effectiveness of full fine-tuning in several cases.
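
As a rough illustration of the cost argument, the snippet below compares the fraction of trainable parameters under LoRA, (IA)^3, and prompt tuning against full fine-tuning. The base model and target modules are assumptions for illustration only and do not reflect the models or numbers reported in the paper.

```python
# Illustrative comparison of trainable-parameter counts; the model and module
# names are assumptions, not the paper's experimental setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, IA3Config, PromptTuningConfig, TaskType, get_peft_model

BASE = "facebook/opt-125m"

configs = {
    "LoRA": LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, target_modules=["q_proj", "v_proj"]),
    "(IA)^3": IA3Config(
        task_type=TaskType.CAUSAL_LM,
        target_modules=["k_proj", "v_proj", "fc2"],
        feedforward_modules=["fc2"],
    ),
    "prompt tuning": PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=20),
}

total = sum(p.numel() for p in AutoModelForCausalLM.from_pretrained(BASE).parameters())
print(f"full fine-tuning: {total:,} trainable parameters (100%)")

for name, cfg in configs.items():
    peft_model = get_peft_model(AutoModelForCausalLM.from_pretrained(BASE), cfg)
    trainable = sum(p.numel() for p in peft_model.parameters() if p.requires_grad)
    print(f"{name}: {trainable:,} trainable parameters ({100 * trainable / total:.3f}%)")
```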