CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging
Md. Ashraful Islam, Mohammed Eunus Ali, Md Rizwan Parvez
2025-02-11
Summary
This paper introduces CODESIM, a new approach that makes computers write code in a way that more closely mirrors how humans think through and solve problems.
What's the problem?
Current methods for getting computers to write code rely heavily on fixing mistakes after the code is written, which isn't very efficient. The quality of the initial code is often not great, making it harder to fix later.
What's the solution?
The researchers created CODESIM, which uses three different 'agents' or parts that work together: one for planning, one for writing code, and one for fixing mistakes. CODESIM is special because it checks its work step-by-step, similar to how a human might draw out a problem to understand it better. This helps catch and fix mistakes early on.
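The plan-verify-code-debug loop described above can be sketched in a few lines. This is only an illustrative toy: the three "agents" are plain functions standing in for LLM calls, and every function name and the example task (finding a list maximum) are assumptions for demonstration, not the paper's actual implementation.

```python
# Toy sketch of a CODESIM-style loop: plan -> simulate plan -> code -> debug.
# All names and logic here are illustrative stand-ins for LLM agent calls.

def plan(problem):
    # Planning agent: propose a high-level plan for the problem.
    return f"scan the list once, keeping a running maximum, for: {problem}"

def simulate_plan(plan_text, examples):
    # Plan verification: trace the plan on sample inputs/outputs,
    # the way a human checks an algorithm step by step.
    # Here we trivially accept the plan if the examples match it.
    return all(expected == max(inp) for inp, expected in examples)

def write_code(plan_text):
    # Coding agent: turn the verified plan into a program.
    def solution(xs):
        best = xs[0]
        for x in xs[1:]:
            if x > best:
                best = x
        return best
    return solution

def internal_debug(solution, examples):
    # Debugging agent: rerun the sample cases and collect failures
    # (a real agent would inspect the execution trace and patch the code).
    return [(i, o) for i, o in examples if solution(i) != o]

def codesim(problem, examples):
    p = plan(problem)
    if not simulate_plan(p, examples):
        raise ValueError("plan failed simulation; re-plan")
    sol = write_code(p)
    if internal_debug(sol, examples):
        raise ValueError("internal debugging found failing cases")
    return sol

examples = [([3, 1, 4], 4), ([10, -2], 10)]
solver = codesim("find the maximum element", examples)
print(solver([5, 9, 2]))  # -> 9
```

The key design point the sketch tries to capture is that verification happens twice before any external tool is involved: once on the plan (via simulated input/output) and once on the generated code (via internal debugging on sample cases).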
Why it matters?
This matters because CODESIM is much better at writing correct code than previous methods. It set new state-of-the-art pass@1 scores on tough coding benchmarks, including HumanEval (95.1%) and MBPP (90.7%), beating the best existing systems. By making it easier for computers to write good code, CODESIM could help programmers work faster and create better software. The researchers also open-sourced their work, so others can use and improve it, potentially leading to even better coding systems in the future.
Abstract
Large Language Models (LLMs) have made significant strides in code generation and problem solving. Current approaches employ external tool-based iterative debuggers that use compiler or other tool-based runtime feedback to refine coarse programs generated by various methods. However, the effectiveness of these approaches heavily relies on the quality of the initial code generation, which remains an open challenge. In this paper, we introduce CodeSim, a novel multi-agent code generation framework that comprehensively addresses the stages of program synthesis (planning, coding, and debugging) through a human-like perception approach. Just as humans verify their understanding of an algorithm through visual simulation, CodeSim uniquely features a method of plan verification and internal debugging through the step-by-step simulation of input/output. Extensive experiments across seven challenging competitive problem-solving and program synthesis benchmarks demonstrate CodeSim's remarkable code generation capabilities. Our framework achieves new state-of-the-art (pass@1) results (HumanEval 95.1%, MBPP 90.7%, APPS 22%, and CodeContests 29.1%). Furthermore, our method shows potential for even greater enhancement when cascaded with external debuggers. To facilitate further research and development in this area, we have open-sourced our framework at https://kagnlp.github.io/codesim.github.io/.