Abstract
In this work, we introduce self-infilling code generation, a general framework that incorporates infilling operations into auto-regressive decoding. Our approach capitalizes on the observation that recent infilling-capable code language models can perform self-infilling: whereas conventional infilling is designed to fill in the middle based on a predefined prefix and suffix, self-infilling sequentially generates both such surrounding context and the infilled content. We utilize self-infilling to introduce novel interruption and looping mechanisms in conventional decoding, evolving it into a non-monotonic process. Interruptions allow for postponing the generation of specific code until a definitive suffix is established, enhancing control during decoding. Meanwhile, the looping mechanism, which leverages the complementary nature of self-infilling and left-to-right decoding, can iteratively update and synchronize each piece of generation cyclically. Extensive experiments across a variety of code generation benchmarks demonstrate that decoding with self-infilling not only improves the output quality but also regularizes the overall generation, which effectively mitigates potential degeneration and scaffolds code to be more consistent with intended functionality.
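The contrast between conventional infilling and self-infilling can be illustrated with a minimal sketch. It is not the paper's implementation: the sentinel tokens `<PRE>`, `<SUF>`, `<MID>`, `<EOT>`, the prefix-suffix-middle prompt layout, and the helper names are all assumptions made for illustration, and the "generated" string in the usage note is hand-written rather than produced by a model.

```python
# Hypothetical sentinel tokens (assumed for illustration; real models define their own).
PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"


def infilling_prompt(prefix: str, suffix: str) -> str:
    """Conventional infilling: prefix and suffix are both given up front,
    and the model only has to produce the middle."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


def self_infilling_prompt(prefix: str) -> str:
    """Self-infilling: only the prefix is given. The model is expected to
    write the suffix itself, then the MID sentinel, then the middle."""
    return f"{PRE}{prefix}{SUF}"


def assemble(prefix: str, generated: str) -> str:
    """Reorder a self-infilled generation of the assumed form
    'suffix<MID>middle<EOT>' into prefix + middle + suffix."""
    generated = generated.split(EOT, 1)[0]  # drop everything after end-of-text
    suffix, sep, middle = generated.partition(MID)
    if not sep:
        raise ValueError("generation contains no MID sentinel")
    return prefix + middle + suffix
```

For example, with the prefix `"def add(a, b):\n"` and a hand-written generation `"    return total\n<MID>    total = a + b\n<EOT>"`, `assemble` places the middle line before the suffix line, so the function body is emitted in source order even though the suffix was produced first.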
| Original language | English |
|---|---|
| Article number | 2548 |
| Pages (from-to) | 61614-61648 |
| Number of pages | 35 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 235 |
| Publication status | Published - Jul 2024 |
| Externally published | Yes |
| Event | 41st International Conference on Machine Learning, ICML 2024 - Vienna, Austria; Duration: 21 Jul 2024 → 27 Jul 2024 |
ASJC Scopus subject areas
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability