Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks

Hongyuan Tao, Ying Zhang, Zhenhao Tang, Hongen Peng, Xukun Zhu, Bingchang Liu, Yingguang Yang, Ziyin Zhang, Zhaogui Xu, Haipeng Zhang, Linchao Zhu, Rui Wang, Hang Yu, Jianguo Li, Peng Di

2025-05-28

Summary

This paper introduces the Code Graph Model (CGM), an AI model that helps computers understand and generate code for large software projects by tracking how different parts of the code are connected, much like reading a map.

What's the problem?

Most AI models struggle with large, complex software projects because they cannot easily see how the files and code pieces relate to one another. That makes it hard for them to write or fix code that spans an entire project.

What's the solution?

The researchers created CGM, which represents a repository's code as a graph that shows the connections between its parts. This graph structure is integrated into the model's attention mechanism, so the model can focus on the important relationships in the code and perform better on tasks such as generating new code or understanding large projects, all without relying on agent-based approaches.
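To make the idea of a code graph shaping attention more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy repository, the node and edge names, the `build_adjacency` and `attention_mask` helpers, and the rule "a node may attend to itself and its direct neighbors" are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy repository: node name -> kind (illustrative only).
NODES = {
    "utils.py": "file",
    "utils.parse": "function",
    "app.py": "file",
    "app.main": "function",
}

# Hypothetical edges as (source, relation, target) triples.
EDGES = [
    ("utils.py", "contains", "utils.parse"),
    ("app.py", "contains", "app.main"),
    ("app.main", "calls", "utils.parse"),
]


def build_adjacency(nodes, edges):
    """Return an undirected adjacency map over the code graph."""
    adjacency = defaultdict(set)
    for name in nodes:
        adjacency[name].add(name)  # every node can attend to itself
    for source, _relation, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


def attention_mask(nodes, edges):
    """Build a 0/1 matrix where mask[i][j] == 1 if node i may attend to node j."""
    order = list(nodes)
    adjacency = build_adjacency(nodes, edges)
    mask = [
        [1 if other in adjacency[name] else 0 for other in order]
        for name in order
    ]
    return order, mask


order, mask = attention_mask(NODES, EDGES)
for name, row in zip(order, mask):
    print(f"{name:12s} {row}")
```

In this sketch, `app.main` can attend to `utils.parse` because of the `calls` edge, while the two file nodes cannot attend to each other directly. A real model would apply a mask like this inside its attention layers rather than printing it.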

Why does it matter?

This matters because it could make AI more useful for real-world software development, helping programmers work faster and with fewer mistakes, especially in very large codebases.

Abstract

Open-source Code Graph Models enhance repository-level code generation tasks by integrating code graph structures into LLMs' attention mechanisms, achieving high performance without agent-based approaches.
