
Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning

Yifeng Ding, Hantian Ding, Shiqi Wang, Qing Sun, Varun Kumar, Zijian Wang

2024-10-07


Summary

This paper introduces Horizon-Length Prediction (HLP), a new training objective that improves how code language models fill in missing code by teaching them to predict, at each step, how many middle tokens remain to be generated.

What's the problem?

Current methods for filling in missing parts of code often struggle to produce code that connects smoothly and coherently with the surrounding context. The standard fill-in-the-middle training recipe simply reorders the sequence and performs next-token prediction, which does not teach models to plan ahead toward the distant right context. In addition, existing workarounds rely on rule-based post-processing with dataset-specific assumptions (for example, generating the same number of lines as the ground truth), which makes them impractical for open-ended, real-world code completion.

What's the solution?

To address these problems, the authors propose HLP, which teaches models to predict the number of remaining middle tokens (the 'horizon length') at each generation step. This gives the model a sense of how much more code it still needs to produce, allowing it to plan ahead and stop at a boundary that fits the right context. The authors tested HLP across different models and sizes and found that it improves fill-in-the-middle performance by up to 24% (relative) on diverse file-level and repository-level benchmarks, without relying on unrealistic post-processing and without any extra inference cost.
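To make the idea concrete, here is a minimal sketch of how such an auxiliary horizon-length objective could be attached to an ordinary language-model training step. The head architecture, loss weighting, target normalization, and batch field names (middle_mask, horizon_targets) are assumptions for illustration, not the paper's exact implementation; the model is assumed to expose a Hugging-Face-style interface returning logits and hidden states.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HorizonLengthHead(nn.Module):
    """Illustrative auxiliary head: predicts, for each position, how many
    middle tokens remain to be generated (the 'horizon length')."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # One scalar per position: the predicted (normalized) horizon length.
        return self.proj(hidden_states).squeeze(-1)

def training_step(model, hlp_head, batch, hlp_weight: float = 0.1):
    """Combine regular next-token prediction with the horizon-length loss."""
    outputs = model(batch["input_ids"], output_hidden_states=True)

    # Standard NTP loss over the reordered FIM sequence.
    ntp_loss = F.cross_entropy(
        outputs.logits[:, :-1].flatten(0, 1),
        batch["labels"][:, 1:].flatten(),
        ignore_index=-100,
    )

    # Horizon targets: for each position inside the middle span, the number
    # of middle tokens still to be produced (here assumed normalized to [0, 1]).
    pred_horizon = hlp_head(outputs.hidden_states[-1])
    mask = batch["middle_mask"]
    hlp_loss = F.mse_loss(pred_horizon[mask], batch["horizon_targets"][mask])

    # The auxiliary head is only used during training, so inference cost is unchanged.
    return ntp_loss + hlp_weight * hlp_loss
```

In this sketch the extra cost is a single linear projection per training step, which is consistent with the paper's claim of negligible training overhead and no additional inference cost.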

Why it matters?

This research is important because it enhances the capabilities of code generation models, making them more effective at understanding and completing code. By improving how these models plan and generate code, HLP can lead to better tools for developers, making coding easier and more efficient in real-world applications.

Abstract

Fill-in-the-Middle (FIM) has become integral to code language models, enabling generation of missing code given both left and right contexts. However, the current FIM training paradigm, which reorders original training sequences and then performs regular next-token prediction (NTP), often leads to models struggling to generate content that aligns smoothly with the surrounding context. Crucially, while existing works rely on rule-based post-processing to circumvent this weakness, such methods are not practically usable in open-domain code completion tasks as they depend on restrictive, dataset-specific assumptions (e.g., generating the same number of lines as in the ground truth). Moreover, model performance on FIM tasks deteriorates significantly without these unrealistic assumptions. We hypothesize that NTP alone is insufficient for models to learn effective planning conditioned on the distant right context, a critical factor for successful code infilling. To overcome this, we propose Horizon-Length Prediction (HLP), a novel training objective that teaches models to predict the number of remaining middle tokens (i.e., horizon length) at each step. HLP advances FIM with lookahead planning, enabling models to inherently learn infilling boundaries for arbitrary left and right contexts without relying on dataset-specific post-processing. Our evaluation across different models and sizes shows that HLP significantly improves FIM performance by up to 24% relatively on diverse benchmarks, across file-level and repository-level, and without resorting to unrealistic post-processing methods. Furthermore, the enhanced planning capability gained through HLP boosts model performance on code reasoning. Importantly, HLP only incurs negligible training overhead and no additional inference cost, ensuring its practicality for real-world scenarios.
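For context, the FIM training paradigm the abstract refers to can be illustrated with a small sketch: a document is split into prefix, middle, and suffix, then reordered so that ordinary next-token prediction learns to generate the middle given both contexts. The sentinel token names and split points below are placeholders chosen for illustration; real model families define their own special tokens.

```python
def make_fim_example(code: str, mid_start: int, mid_end: int) -> str:
    """Reorder a document into prefix-suffix-middle (PSM) format so that the
    missing middle can be learned with regular next-token prediction."""
    prefix = code[:mid_start]
    middle = code[mid_start:mid_end]
    suffix = code[mid_end:]
    # The model sees both contexts first, then must produce the middle and
    # learn where it should end -- the boundary that HLP targets explicitly.
    return f"<FIM_PREFIX>{prefix}<FIM_SUFFIX>{suffix}<FIM_MIDDLE>{middle}"

example = "def add(a, b):\n    return a + b\n"
print(make_fim_example(example, mid_start=15, mid_end=30))
```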