A Rapid Review on Graph-Based Learning Vulnerability Detection

Foulefack, Rosmael; Marchetto, Alessandro

doi:10.1007/978-3-031-70245-7_25

Security testing aims to identify software vulnerabilities that can be exploited by malicious actors. Software vulnerability detection (SVD), in particular, studies techniques for identifying source code weaknesses and bugs that could lead to vulnerabilities. Automated SVD has made significant progress thanks to artificial intelligence tools, including large language models and deep learning. Most existing SVD techniques extract a token-based vector representation from the source code under analysis and then pass it to a learning algorithm inspired by those used for natural language processing. Code vulnerability detection is hence treated as a binary classification task: the learned model is used to predict whether new code snippets are vulnerable or not. A recent trend consists in extracting graph-based structures from the source code and then passing them to a graph-based learning algorithm. The graph-based code representation is expected to enable the search for vulnerabilities by considering syntactic, semantic, and structural information of the source code. This paper reports on a rapid review conducted to study the literature on code graph-based learning SVD, with the goal of capturing evidence that can be transferred to practitioners. Our analysis reveals that most of the presented graph-based learning SVD techniques: (i) use combinations of graphs extracted from the source code, almost always by employing the same tool; (ii) frequently use Graph Neural Networks (GNN) and Gated Graph Sequence Neural Networks (GGNN); (iii) work at the function level, and only rarely at the statement level. Furthermore, we also noticed that: (iv) only a limited number of tools that support such techniques are available, and (v) several real-world datasets exist and are widely used; however, they are unbalanced, labeled only at the function level, and, in most cases, contain C/C++ source code, thus hampering their adoption with other programming languages.
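The graph-based pipeline outlined in the abstract (extract a graph from the code, pass it to a graph-based learner, predict vulnerable or not) can be illustrated with a minimal sketch. This is not the method of any surveyed technique: it assumes Python's standard `ast` module as a stand-in graph extractor (the reviewed datasets are mostly C/C++), a single round of mean neighbor aggregation in place of a trained GNN or GGNN, and untrained, purely hypothetical weights (`HYPOTHETICAL_WEIGHTS`). All function names below are illustrative.

```python
import ast
import math


def code_to_graph(source):
    """Parse source code into (node_types, edges) using parent -> child AST edges."""
    tree = ast.parse(source)
    nodes, index = [], {}
    for node in ast.walk(tree):
        # Some AST objects (e.g. Load contexts) may be shared, so index each object once.
        if id(node) not in index:
            index[id(node)] = len(nodes)
            nodes.append(node)
    edges = sorted({(index[id(p)], index[id(c)])
                    for p in nodes for c in ast.iter_child_nodes(p)})
    return [type(n).__name__ for n in nodes], edges


def embed_graph(node_types, edges):
    """One round of mean-neighbor message passing, then a sum readout."""
    vocab = sorted(set(node_types))
    col = {t: j for j, t in enumerate(vocab)}
    # Initial node features: one-hot encoding of the AST node type.
    h = [[1.0 if col[t] == j else 0.0 for j in range(len(vocab))]
         for t in node_types]
    neighbors = [[] for _ in node_types]
    for parent, child in edges:  # treat edges as undirected
        neighbors[parent].append(child)
        neighbors[child].append(parent)
    updated = []
    for i, row in enumerate(h):
        new_row = list(row)
        if neighbors[i]:
            for j in range(len(vocab)):
                new_row[j] += sum(h[n][j] for n in neighbors[i]) / len(neighbors[i])
        updated.append(new_row)
    # Graph-level vector: sum of the updated node features.
    graph_vec = [sum(r[j] for r in updated) for j in range(len(vocab))]
    return vocab, graph_vec


def vulnerability_score(vocab, graph_vec, weights, bias=0.0):
    """Logistic score in (0, 1); the weights here are untrained placeholders."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in zip(vocab, graph_vec))
    return 1.0 / (1.0 + math.exp(-z))


# Function-level input, matching the granularity most surveyed techniques use.
SNIPPET = "def copy(buf, data, n):\n    for i in range(n):\n        buf[i] = data[i]\n"
# Hypothetical weights for illustration only; a real model would learn them.
HYPOTHETICAL_WEIGHTS = {"Subscript": 0.4, "For": 0.2, "Compare": -0.3}

node_types, edges = code_to_graph(SNIPPET)
vocab, graph_vec = embed_graph(node_types, edges)
score = vulnerability_score(vocab, graph_vec, HYPOTHETICAL_WEIGHTS)
print(f"{len(node_types)} nodes, {len(edges)} edges, score={score:.3f}")
```

Thresholding `score` (for example at 0.5) would give the binary vulnerable/not-vulnerable prediction; with untrained weights the value carries no meaning beyond showing the data flow from code to graph to classifier.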

A Rapid Review on Graph-Based Learning Vulnerability Detection / Foulefack, Rosmael; Marchetto, Alessandro. - 2178:(2024), pp. 355-372. (17th International Conference on Quality of Information and Communications Technology, QUATIC 2024, Pisa, September 11-13, 2024) [10.1007/978-3-031-70245-7_25].

2024-01-01

Date of publication: 2024
Proceedings title: Quality of Information and Communications Technology
Place of publication: Cham (SW)
Publisher: Springer Cham
ISBN: 978-3-031-70244-0
Scopus identifier: 2-s2.0-85204585688
WOS identifier: WOS:001339216200025
All authors: Foulefack, Rosmael; Marchetto, Alessandro
Appears in categories: 04.1 Paper in Proceedings

Files in this item:

File: ma_graph.pdf
Access: Archive managers only
Description: QUATIC 2024 paper
Type: Publisher's layout
License: All rights reserved
Size: 926.06 kB
Format: Adobe PDF