Security testing aims to identify software vulnerabilities that can be exploited by malicious actors. Software vulnerability detection (SVD), in particular, studies techniques to identify source code weaknesses and bugs that could lead to vulnerabilities. Automated SVD has made significant progress thanks to artificial intelligence tools, including large language models and deep learning. Most existing SVD techniques extract a token-based vector representation from the source code under analysis and pass it to a learning algorithm inspired by those used for natural language processing. Code vulnerability detection is hence treated as a binary classification task: the learned model is used to predict whether new code snippets are vulnerable or not. A recent trend consists in extracting graph-based structures from the source code and passing them to a graph-based learning algorithm. The graph-based code representation is expected to enable the search for vulnerabilities by considering syntactic, semantic, and structural information of the source code. This paper reports on a rapid review of the literature on code graph-based learning SVD, with the goal of capturing evidence that can be transferred to practitioners. Our analysis reveals that most of the presented graph-based learning SVD techniques: (i) use combinations of graphs extracted from the source code, almost always with the same tool; (ii) frequently use Graph Neural Networks (GNN) and Gated Graph Sequence Neural Networks (GGNN); (iii) work at the function level, and only rarely at the statement level. Furthermore, we also noticed that: (iv) only a limited number of tools supporting such techniques are available, and (v) several real-world datasets exist and are widely used; however, they are unbalanced, labeled only at the function level, and, in most cases, contain C/C++ source code, which hampers their adoption with other programming languages.

A Rapid Review on Graph-Based Learning Vulnerability Detection / Foulefack, Rosmael; Marchetto, Alessandro. - 2178:(2024), pp. 355-372. (17th International Conference on Quality of Information and Communications Technology, QUATIC 2024, Pisa, September 11-13, 2024) [10.1007/978-3-031-70245-7_25].

A Rapid Review on Graph-Based Learning Vulnerability Detection

Foulefack, Rosmael; Marchetto, Alessandro
2024-01-01

2024
Quality of Information and Communications Technology
Cham (SW)
Springer Cham
978-3-031-70244-0
Files in this item:
ma_graph.pdf (Adobe PDF, 926.06 kB)
Access: archive managers only
Description: Quatic 2024 paper
Type: Publisher's version (Publisher’s layout)
License: All rights reserved

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/432110
Citations
  • Scopus 3
  • Web of Science (ISI) 1