Enhancing Code Generation for Low-Resource Languages: No Silver Bullet

Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota

2025-02-07

Summary

This paper investigates how to improve AI-generated code for less common programming languages. It explores several methods to help models perform better when little training data is available.

What's the problem?

AI models called Large Language Models (LLMs) are good at generating code, but they struggle with less popular programming languages because there is not enough data to train them properly. This leads to poorer performance compared to how they handle more common languages.

What's the solution?

The researchers tested three main approaches: fine-tuning the models on the limited data available, using in-context learning with specially crafted prompts, and pre-training the models to translate between common and uncommon languages. They tried these methods on two less common languages (R and Racket) using six AI models of different architectures and sizes.
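To give a concrete feel for the in-context learning approach, here is a minimal sketch of how a few-shot prompt for a low-resource language (Racket) might be assembled. The example tasks, prompt wording, and function names are illustrative assumptions, not the paper's actual prompts.

```python
# Sketch: assembling a few-shot prompt for code generation in Racket.
# The shots and task below are illustrative; the paper's actual prompt
# templates and examples are not reproduced here.

FEW_SHOT_EXAMPLES = [
    {
        "task": "Return the sum of a list of numbers.",
        "code": "(define (sum-list xs)\n  (apply + xs))",
    },
    {
        "task": "Return #t if a string is empty.",
        "code": "(define (empty-string? s)\n  (zero? (string-length s)))",
    },
]

def build_few_shot_prompt(task_description, examples=FEW_SHOT_EXAMPLES):
    """Concatenate worked examples before the new task, so the LLM can
    infer the target language's syntax from the shots."""
    parts = ["You are given programming tasks to solve in Racket.\n"]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"; Example {i}: {ex['task']}\n{ex['code']}\n")
    parts.append(f"; Now solve: {task_description}\n")
    return "\n".join(parts)

prompt = build_few_shot_prompt("Return the maximum element of a list.")
print(prompt)
```

The idea is that the worked examples compensate for the model having seen little Racket during training: the prompt itself supplies the syntactic patterns the model would otherwise lack.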

Why it matters?

This research matters because it helps make AI coding assistants more useful for a wider range of programming languages. By finding ways to improve performance on less common languages, it could make these tools more accessible to developers working with niche or specialized programming languages, potentially speeding up software development in various fields.

Abstract

The advent of Large Language Models (LLMs) has significantly advanced the field of automated code generation. LLMs rely on large and diverse datasets to learn the syntax, semantics, and usage patterns of programming languages. For low-resource languages (i.e., niche programming languages characterized by the scarcity of training data), the limited availability of such data hampers the models' ability to generalize effectively, resulting in poorer code generation performance compared to high-resource languages. For this reason, there is a quest for techniques able to close this performance gap. We present an empirical study investigating the effectiveness of several approaches for boosting LLMs' performance on low-resource languages, namely: (i) classic fine-tuning, which is however capped in size by the scarcity of training data; (ii) three variants of in-context learning, with prompts crafted to provide the LLM with additional information about the low-resource language (e.g., few-shot examples showcasing features of the targeted language); and (iii) a pre-training objective teaching the model how to translate between high- and low-resource languages. The context of our study is two low-resource languages (R and Racket) and six LLMs of different architectures and sizes. Our findings reveal that fine-tuning is usually the best choice for smaller LLMs, possibly because even a small dataset is sufficient to train their limited number of parameters. As model size increases, in-context learning becomes more and more effective, representing a safe and cheap bet (i.e., it always helps, but with different magnitudes). In contrast, very large LLMs may see their performance on low-resource languages deteriorate when fine-tuning is performed, possibly due to the lack of enough data to effectively update their weights.
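As a rough illustration of the third approach in the abstract (a translation pre-training objective), the snippet below sketches how parallel high-/low-resource snippets could be formatted into bidirectional translation-style training examples. The tags, pairing format, and code snippets are assumptions for illustration, not the paper's actual pre-training setup.

```python
# Sketch: turning parallel Python/R snippets into translation-style
# pre-training examples. The <python>/<r> tags and the bidirectional
# pairing are assumptions, not the format used in the paper.

PARALLEL_PAIRS = [
    # (high-resource Python, equivalent low-resource R)
    ("def square(x):\n    return x * x",
     "square <- function(x) {\n  x * x\n}"),
    ("total = sum([1, 2, 3])",
     "total <- sum(c(1, 2, 3))"),
]

def make_translation_examples(pairs):
    """Emit one training string per direction, so the model learns to
    map between the high- and low-resource language both ways."""
    examples = []
    for py_code, r_code in pairs:
        examples.append(f"<python>\n{py_code}\n<r>\n{r_code}")
        examples.append(f"<r>\n{r_code}\n<python>\n{py_code}")
    return examples

data = make_translation_examples(PARALLEL_PAIRS)
print(len(data))  # two directions per pair
```

Training on such pairs lets the model transfer what it already knows about the high-resource language onto the low-resource one, which is the intuition behind this objective.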