Open source packages have source code available on repositories for inspection (e.g. on GitHub) but developers use pre-built packages directly from the package repositories (such as npm for JavaScript, PyPI for Python, or RubyGems for Ruby). Such convenient practice assumes that there are no discrepancies between source code and packages. These differences pose both operational risks (e.g. making dependent projects unable to compile) and security risks (e.g. deploying malicious code during package installation) in the software supply chain. Our empirical assessment of 2438 popular packages in PyPI with an analysis of around 10M lines of code shows several differences in the wild: modifications cannot be just attributed to malicious injections. Yet, scanning again all and whole most likely good but modified' packages is hard to manage for FOSS downstream users. We propose a methodology, LastPyMile, for identifying the differences between build artifacts of software packages and the respective source code repository. We show how it can be used to extend current package scanning practices for malware injection (which only covers less than 1% of the code of deployed packages).

LastPyMile: Identifying the discrepancy between sources and packages / Vu, D. -L.; Massacci, F.; Pashchenko, I.; Plate, H.; Sabetta, A.. - ELETTRONICO. - (2021), pp. 780-792. (Intervento presentato al convegno 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 tenutosi a Athens, Greece nel 23-28 August 2021) [10.1145/3468264.3468592].

LastPyMile: Identifying the discrepancy between sources and packages

Vu D. -L.;Massacci F.;Pashchenko I.;
2021-01-01

Abstract

Open source packages have source code available on repositories for inspection (e.g. on GitHub) but developers use pre-built packages directly from the package repositories (such as npm for JavaScript, PyPI for Python, or RubyGems for Ruby). Such convenient practice assumes that there are no discrepancies between source code and packages. These differences pose both operational risks (e.g. making dependent projects unable to compile) and security risks (e.g. deploying malicious code during package installation) in the software supply chain. Our empirical assessment of 2438 popular packages in PyPI with an analysis of around 10M lines of code shows several differences in the wild: modifications cannot be just attributed to malicious injections. Yet, scanning again all and whole most likely good but modified' packages is hard to manage for FOSS downstream users. We propose a methodology, LastPyMile, for identifying the differences between build artifacts of software packages and the respective source code repository. We show how it can be used to extend current package scanning practices for malware injection (which only covers less than 1% of the code of deployed packages).
2021
ESEC/FSE 2021 - Proceedings of the 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering
1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES
Association for Computing Machinery, Inc
9781450385626
Vu, D. -L.; Massacci, F.; Pashchenko, I.; Plate, H.; Sabetta, A.
LastPyMile: Identifying the discrepancy between sources and packages / Vu, D. -L.; Massacci, F.; Pashchenko, I.; Plate, H.; Sabetta, A.. - ELETTRONICO. - (2021), pp. 780-792. (Intervento presentato al convegno 29th ACM Joint Meeting European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 tenutosi a Athens, Greece nel 23-28 August 2021) [10.1145/3468264.3468592].
File in questo prodotto:
File Dimensione Formato  
esecfse2021-4.pdf

accesso aperto

Tipologia: Pre-print non referato (Non-refereed preprint)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.73 MB
Formato Adobe PDF
1.73 MB Adobe PDF Visualizza/Apri
3468264.3468592 (1).pdf

Solo gestori archivio

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 326.79 kB
Formato Adobe PDF
326.79 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/323677
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 31
  • ???jsp.display-item.citation.isi??? 22
  • OpenAlex ND
social impact