Cross-Domain Evaluation of Transformer-Based Vulnerability Detection on Open & Industry Data
Moritz Mock, Thomas Forrer, Barbara Russo
2025-09-12
Summary
This research focuses on making advanced vulnerability detection, specifically using deep learning, actually useful for software developers in real-world situations, not just in research labs.
What's the problem?
Currently, many promising vulnerability detection methods developed in universities don't easily translate to how companies build software, for several reasons: developers need to trust the tools; the tools must work with older, existing codebases; not everyone on a development team is an AI expert; and what researchers focus on often differs from what companies need. On top of that, deep learning models can be slow and hard to integrate into the everyday process of writing and reviewing code.
What's the solution?
The researchers tested a deep learning model called CodeBERT to see how well it could find vulnerable parts of code, both in open-source projects and in code used by a company. They found that models trained on company code worked best on similar company code, but didn't do as well on open-source code. However, a model trained on open-source code, with adjustments to handle imbalanced data (far more non-vulnerable functions than vulnerable ones), could actually improve vulnerability detection. Based on this, they built a tool called AI-DO that suggests potential vulnerabilities to developers *while* they're reviewing code, fitting seamlessly into their existing workflow. They then asked the company's IT staff whether they found the tool helpful.
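The imbalance adjustment mentioned above refers to undersampling: discarding random examples from the over-represented (non-vulnerable) class so the classes are balanced before fine-tuning. The paper's exact pipeline isn't shown here; this is a minimal pure-Python sketch of the general technique, with illustrative names (`undersample`, the label convention 1 = vulnerable) that are assumptions, not taken from the paper.

```python
import random

def undersample(samples, labels, seed=0):
    """Randomly downsample every class to the size of the smallest one.

    `samples` would be function bodies and `labels` 1 (vulnerable) /
    0 (non-vulnerable); the balanced pairs could then feed a
    fine-tuning step. Interface and names are illustrative only.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Size of the minority class sets the per-class budget.
    n_min = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for sample in rng.sample(items, n_min):
            balanced.append((sample, label))
    rng.shuffle(balanced)  # avoid class-ordered batches
    return balanced
```

Undersampling trades data for balance: it throws away majority-class examples, which is acceptable when the majority class (ordinary, non-vulnerable code) is abundant, as it typically is in vulnerability datasets.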
Why does it matter?
This work is important because it bridges the gap between academic research and practical application in cybersecurity. By creating a tool that developers actually find useful and integrating it into their existing processes, it helps make software more secure in the real world. It also shows how to adapt deep learning models to work effectively in industrial settings, even with limited resources and expertise.
Abstract
Deep learning solutions for vulnerability detection proposed in academic research are not always accessible to developers, and their applicability in industrial settings is rarely addressed. Transferring such technologies from academia to industry presents challenges related to trustworthiness, legacy systems, limited digital literacy, and the gap between academic and industrial expertise. For deep learning in particular, performance and integration into existing workflows are additional concerns. In this work, we first evaluate the performance of CodeBERT for detecting vulnerable functions in industrial and open-source software. We analyse its cross-domain generalisation when fine-tuned on open-source data and tested on industrial data, and vice versa, also exploring strategies for handling class imbalance. Based on these results, we develop AI-DO (Automating vulnerability detection Integration for Developers' Operations), a Continuous Integration-Continuous Deployment (CI/CD)-integrated recommender system that uses fine-tuned CodeBERT to detect and localise vulnerabilities during code review without disrupting workflows. Finally, we assess the tool's perceived usefulness through a survey with the company's IT professionals. Our results show that models trained on industrial data detect vulnerabilities accurately within the same domain but lose performance on open-source code, while a deep learner fine-tuned on open data, with appropriate undersampling techniques, improves the detection of vulnerabilities.