Large Language Model Guided Self-Debugging Code Generation

Muntasir Adnan, Zhiwei Xu, Carlos C. N. Kuhn

2025-02-06

Summary

This paper introduces PyCapsule, a framework that helps large language models generate Python code more efficiently and accurately by combining a two-agent pipeline with self-debugging modules.

What's the problem?

Current methods for automated code generation often struggle with efficiency and error correction. They can produce incorrect or unstable code, making it hard to rely on them for real-world programming tasks.

What's the solution?

The researchers developed PyCapsule, a two-agent framework for Python code generation. It combines prompt inference, iterative error handling, and case testing so that the generated code is stable, safe, and correct. PyCapsule also aims to reduce errors during debugging by handling error messages carefully and refining the code over successive attempts.
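The generate, test, and repair cycle described above can be illustrated with a minimal sketch. This is not PyCapsule's implementation: the names `run_with_tests`, `self_debug`, and `stub_programmer` are hypothetical, the "programmer" role is a hard-coded stub standing in for a language model, and the "executor" role uses a plain `exec` with no sandboxing, which a real system concerned with safety would not do.

```python
import traceback
from typing import Callable, Optional, Tuple


def run_with_tests(code: str, tests: str) -> Optional[str]:
    """Executor role (hypothetical): run candidate code, then its test cases.

    Returns None on success, or a one-line error summary on failure.
    NOTE: plain exec() is NOT sandboxed; this is for illustration only.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)   # define the candidate solution
        exec(tests, namespace)  # run the test cases against it
    except Exception:
        # Keep only the final traceback line as compact error feedback.
        return traceback.format_exc().strip().splitlines()[-1]
    return None


def self_debug(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    tests: str,
    max_attempts: int = 3,
) -> Tuple[Optional[str], int]:
    """Loop: generate code, test it, and feed any error back to the generator."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        feedback = run_with_tests(code, tests)
        if feedback is None:
            return code, attempt  # all tests passed
    return None, max_attempts     # attempt budget exhausted


def stub_programmer(task: str, feedback: Optional[str]) -> str:
    """Programmer role (stub): a real system would call a language model here."""
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"  # deliberately buggy first draft
    return "def add(a, b):\n    return a + b\n"      # "repaired" after seeing feedback


solution, attempts = self_debug(
    stub_programmer, "Write add(a, b).", "assert add(2, 3) == 5"
)
print(f"solved after {attempts} attempt(s)")
```

The attempt cap (`max_attempts`) is an assumption of this sketch; it reflects the general idea that each extra debugging round costs additional model calls, so the loop should stop once the tests pass or the budget runs out.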

Why does it matter?

This research is important because it makes AI-generated code more reliable and efficient. By improving the accuracy of code generation and reducing computational costs, PyCapsule could help programmers save time and make AI tools more practical for coding tasks in both education and industry.

Abstract

Automated code generation is gaining significant importance in intelligent computer programming and system deployment. However, current approaches often face challenges in computational efficiency and lack robust mechanisms for code parsing and error correction. In this work, we propose a novel framework, PyCapsule, with a simple yet effective two-agent pipeline and efficient self-debugging modules for Python code generation. PyCapsule features sophisticated prompt inference, iterative error handling, and case testing, ensuring high generation stability, safety, and correctness. Empirically, PyCapsule achieves up to a 5.7% improvement in success rate on HumanEval, 10.3% on HumanEval-ET, and 24.4% on BigCodeBench compared to state-of-the-art methods. We also observe a decrease in normalized success rate as more self-debugging attempts are made, potentially caused by limited and noisy error feedback in retention. PyCapsule demonstrates broader impacts on advancing lightweight and efficient code generation for artificial intelligence systems.