On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools / Papotti, A.; Paramitha, R.; Massacci, F. In: EMPIRICAL SOFTWARE ENGINEERING, ISSN 1382-3256, 29:5 (2024), p. 132. [DOI: 10.1007/s10664-024-10506-z]
On the acceptance by code reviewers of candidate security patches suggested by Automated Program Repair tools
Papotti A.; Paramitha R.; Massacci F.
2024-01-01
Abstract
Objective: We investigated whether (possibly wrong) security patches suggested by Automated Program Repair (APR) tools for real-world projects are recognized by human reviewers. We also investigated whether knowing that a patch was produced by an allegedly specialized tool changes the decision of human reviewers. Method: We performed an experiment with n=72 Master’s students in Computer Science. In the first phase, using a balanced design, we presented the human reviewers with a combination of patches proposed by APR tools for different vulnerabilities and asked them to adopt or reject the proposed patches. In the second phase, we told participants that some of the proposed patches were generated by security-specialized tools (even though the tool was actually a ‘normal’ APR tool) and measured whether the human reviewers changed their decision to adopt or reject a patch. Results: It is easier to identify wrong patches than correct patches, and correct patches are not confused with partially correct patches. Also, patches from security APR tools are adopted more often than patches suggested by generic APR tools, but there is not enough evidence to determine whether ‘bogus’ security claims are distinguishable from ‘true’ security claims. Finally, the number of switches to patches attributed to the security-specialized tool is significantly higher after the security information is revealed, irrespective of patch correctness. Limitations: The experiment was conducted in an academic setting and focused on a limited sample of popular APR tools and popular vulnerability types.
| File | Size | Format | |
|---|---|---|---|
| Papotti-Paramitha-Massacci-s10664-024-10506-z.pdf (open access; Description: final version; Type: Publisher’s version (Publisher’s layout); License: Creative Commons) | 2.16 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



