Domain-aware graph neural networks for source code vulnerability detection

Lekeufack Foulefack, R. Z.; Marchetto, A.

doi:10.1016/j.infsof.2026.108104

Context: Deep learning, and graph-based models in particular, has advanced software vulnerability detection by capturing structural code features. However, existing approaches often rely solely on source code and focus mainly on C/C++ at the function level, which limits their ability to detect fine-grained vulnerabilities in other languages such as Java and Python. Objective: To overcome these limitations, we propose VVulDet, an enhanced Graph Neural Network model. Methods: VVulDet enriches code representations with complex graph representations, obtained through random-walk feature updates, and incorporates domain knowledge from CVE and CWE descriptions, along with expert-provided reference code fragments. This integration extends the model's understanding of vulnerabilities beyond pure code structure. Results: We evaluate VVulDet on four datasets covering Java, Python, and C/C++, and observe consistent improvements in both statement-level and function-level detection. On average, VVulDet achieves the highest overall performance across all datasets, with F1-score improvements of up to 8.6% and 9.6% at the statement and function levels on ProjectKB, 6.8% and 15.5% on MegaVul, 1.4% and 4.5% on CVEFixes, and 2.4% and 23.7% on BigVul, respectively, compared to the model version that does not incorporate domain knowledge. Conclusion: These results confirm that integrating domain knowledge into graph-based models significantly boosts vulnerability detection performance across multiple programming languages and granularity levels.
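
The abstract does not specify how the random-walk feature update is computed. As a rough illustration only, the sketch below shows one common way such an update can work on a small statement graph: each node's feature vector is blended with the mean feature of the nodes reached by short random walks starting from it. The function names, the mixing weight `alpha`, the walk length, and the toy graph are all assumptions made for this example, not details taken from VVulDet, and the domain-knowledge component (CVE/CWE descriptions, reference fragments) is not covered.

```python
import random


def random_walk(adj, start, length, rng):
    """Return the nodes visited by one uniform random walk from `start`."""
    walk = [start]
    node = start
    for _ in range(length):
        neighbors = adj.get(node, [])
        if not neighbors:  # dead end: stop the walk early
            break
        node = rng.choice(neighbors)
        walk.append(node)
    return walk


def random_walk_update(adj, feats, num_walks=10, length=3, alpha=0.5, seed=0):
    """Blend each node's features with the mean features seen on its walks.

    Hypothetical update rule (an assumption for illustration):
        new = alpha * own + (1 - alpha) * mean(features of visited nodes)
    """
    rng = random.Random(seed)
    updated = {}
    for node, own in feats.items():
        visited = []
        for _ in range(num_walks):
            # Skip the first entry, which is the start node itself.
            visited.extend(random_walk(adj, node, length, rng)[1:])
        if not visited:  # isolated or sink node: keep features unchanged
            updated[node] = list(own)
            continue
        dim = len(own)
        mean = [sum(feats[v][d] for v in visited) / len(visited) for d in range(dim)]
        updated[node] = [alpha * own[d] + (1 - alpha) * mean[d] for d in range(dim)]
    return updated


# Toy statement graph: three statements connected in a chain 0 -> 1 -> 2.
adj = {0: [1], 1: [2], 2: []}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
new_feats = random_walk_update(adj, feats)
print(new_feats)
```

In this toy chain every walk is forced along the single outgoing edge, so the result does not depend on the seed; on a real code graph with branching control or data flow, the walks sample different neighborhoods and the update becomes a stochastic form of multi-hop aggregation.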

Domain-aware graph neural networks for source code vulnerability detection / Lekeufack Foulefack, R.Z.; Marchetto, A. - In: INFORMATION AND SOFTWARE TECHNOLOGY. - ISSN 0950-5849. - 195:(2026), pp. 1-17. [10.1016/j.infsof.2026.108104]