When "Correct" Is Not Safe: Can We Trust Functionally Correct Patches Generated by Code Agents?
Yibo Peng, James Song, Lei Li, Xinyu Yang, Mihai Christodorescu, Ravi Mangal, Corina Pasareanu, Haizhong Zheng, Beidi Chen
2025-10-22
Summary
This paper investigates a security flaw in code-fixing agents powered by large language models, like ChatGPT. These agents are becoming common for automatically fixing bugs in code, but current testing focuses only on whether the fix *works*, not whether it introduces new security problems.
What's the problem?
The main issue is that these code agents can create fixes that pass all the tests, meaning they functionally correct the bug, but still contain security vulnerabilities. This is called a 'Functionally Correct yet Vulnerable' (FCV) patch. Attackers could intentionally create these vulnerable fixes, or they could be introduced accidentally by the agent itself during the fixing process. Current methods for evaluating these agents don't detect these hidden security risks.
What's the solution?
The researchers developed a method called 'FCV-Attack' to demonstrate this problem. They didn't need to know how the code agent worked internally – they just sent it bug-fixing requests. They tested several popular language models (like ChatGPT and Claude) combined with different agent frameworks (like SWE-agent and OpenHands) on a standard set of coding challenges. They found that, with just one carefully crafted request, they could get the agents to create vulnerable fixes a significant percentage of the time, for example, around 40.7% of the time with a specific model and framework combination.
Why it matters?
This research highlights a serious, previously overlooked security threat with automated code-fixing tools. Because these agents are increasingly used in real-world software development, it’s crucial to develop ways to test for and prevent these vulnerable fixes. Simply ensuring a fix works isn't enough; it also needs to be secure, and current evaluation methods aren't doing that.
Abstract
Code agents are increasingly trusted to autonomously fix bugs on platforms such as GitHub, yet their security evaluation focuses almost exclusively on functional correctness. In this paper, we reveal a novel type of threat to real-world code agents: Functionally Correct yet Vulnerable (FCV) patches, which pass all test cases but contain vulnerable code. With our proposed FCV-Attack, which can be deliberately crafted by malicious attackers or implicitly introduced by benign developers, we show that SOTA LLMs (e.g., ChatGPT and Claude) and agent scaffolds (e.g., SWE-agent and OpenHands) are all vulnerable to this FCV threat; across 12 agent-model combinations on SWE-Bench, the attack only requires black-box access and a single query to the code agent to perform the attack. For example, for CWE-538 (information exposure vulnerability), the FCV-Attack attains an attack success rate of 40.7% on GPT-5 Mini + OpenHands. Our results reveal an important security threat overlooked by current evaluation paradigms and urge the development of security-aware defenses for code agents.