
Why Personalizing Deep Learning-Based Code Completion Tools Matters

Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota

2025-03-21

Summary

This paper investigates whether AI code completion tools work better when they are customized for specific companies or individual programmers.

What's the problem?

It is unclear whether customizing AI code completion tools for a specific organization or an individual developer improves them enough to justify the effort, since these tools are typically trained on general-purpose code from many repositories.

What's the solution?

The researchers fine-tuned code completion models (T5 and Code Llama) on code written by 136 developers from two organizations, Apache and Spring, and found that both organization-specific and developer-specific fine-tuning improve prediction performance, with organization-specific fine-tuning being especially effective.

Why it matters?

This work matters because it shows that personalizing AI code completion tools can make programmers more efficient, and that small fine-tuned models can match much larger general-purpose ones, reducing deployment and inference costs.

Abstract

Deep learning (DL)-based code completion tools have transformed software development by enabling advanced code generation. These tools leverage models trained on vast amounts of code from numerous repositories, capturing general coding patterns. However, the impact of fine-tuning these models for specific organizations or developers to boost their performance on such subjects remains unexplored. In this work, we fill this gap by presenting solid empirical evidence answering this question. More specifically, we consider 136 developers from two organizations (Apache and Spring), two model architectures (T5 and Code Llama), and three model sizes (60M, 750M, and 7B trainable parameters). T5 models (60M, 750M) were pre-trained and fine-tuned on over 2,000 open-source projects, excluding the subject organizations' data, and compared against versions fine-tuned on organization- and developer-specific datasets. For the Code Llama model (7B), we compared the performance of the already pre-trained model publicly available online with the same model fine-tuned via parameter-efficient fine-tuning on organization- and developer-specific datasets. Our results show that there is a boost in prediction capabilities provided by both an organization-specific and a developer-specific additional fine-tuning, with the former being particularly performant. Such a finding generalizes across (i) the two subject organizations (i.e., Apache and Spring) and (ii) models of completely different magnitude (from 60M to 7B trainable parameters). Finally, we show that DL models fine-tuned on an organization-specific dataset achieve the same completion performance as pre-trained code models that are used out of the box and are roughly 10 times larger, with consequent savings in terms of deployment and inference cost (e.g., smaller GPUs needed).
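The parameter-efficient fine-tuning applied to the 7B Code Llama model typically works by freezing the pre-trained weights and training only small low-rank adapter matrices (the LoRA idea). The paper does not include code, so the following is only an illustrative NumPy sketch of that mechanism on a single linear layer, with hypothetical dimensions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes; real LLM layers are orders of magnitude larger.
d_in, d_out, rank = 64, 64, 4

W = rng.normal(size=(d_out, d_in))          # frozen pre-trained weight
A = rng.normal(size=(rank, d_in)) * 0.01    # trainable low-rank factor
B = np.zeros((d_out, rank))                 # trainable factor, zero-init so the
                                            # adapter's initial contribution is 0

def forward(x):
    # Effective weight is W + B @ A; during fine-tuning only A and B change,
    # so the organization-specific update touches far fewer parameters.
    return x @ (W + B @ A).T

x = rng.normal(size=(8, d_in))
assert np.allclose(forward(x), x @ W.T)  # with B = 0, behaves like the base model

full_params = W.size
adapter_params = A.size + B.size
print(f"trainable adapter params: {adapter_params} "
      f"vs full fine-tune: {full_params} ({adapter_params / full_params:.1%})")
```

Even in this toy layer the adapter trains only 512 of 4,096 parameters; at 7B scale the same structure is what makes organization- and developer-specific fine-tuning affordable.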