Preserving data privacy is an important concern in the research use of patient data. The DataSHIELD suite enables privacy-aware advanced statistical analysis in a federated setting. Despite its many applications, it has a few open practical issues: the complexity of hosting a federated infrastructure, the performance penalty imposed by the privacy-preserving constraints, and the ease of use by non-technical users. In this work, we describe a case study in which we review different breast cancer classifiers and report our findings about the limits and advantages of such non-disclosive suite of tools in a realistic setting. Five independent gene expression datasets of breast cancer survival were downloaded from Gene Expression Omnibus (GEO) and pooled together through the federated infrastructure. Three previously published and two newly proposed 5-year cancer-free survival risk score classifiers were trained in a federated environment, and an additional reference classifier was trained with unconstrained data access. The performance of these six classifiers was systematically evaluated, and the results show that i) the published classifiers do not generalize well when applied to patient cohorts that differ from those used to develop them; ii) among the methods we tried, the classification using logistic regression worked better on average, closely followed by random forest; iii) the unconstrained version of the logistic regression classifier outperformed the federated version by 4% on average. Reproducibility of our experiments is ensured through the use of VisualSHIELD, an open-source tool that augments DataSHIELD with new functions, a standardized deployment procedure, and a simple graphical user interface.

Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD / Tomasoni, Danilo; Lombardo, Rosario; Lauria, Mario. - In: FRONTIERS IN GENETICS. - ISSN 1664-8021. - 15:(2024), pp. 12703871-12703878. [10.3389/fgene.2024.1270387]

Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD

Tomasoni, Danilo;Lombardo, Rosario;Lauria, Mario
2024-01-01

Abstract

Preserving data privacy is an important concern in the research use of patient data. The DataSHIELD suite enables privacy-aware advanced statistical analysis in a federated setting. Despite its many applications, it has a few open practical issues: the complexity of hosting a federated infrastructure, the performance penalty imposed by the privacy-preserving constraints, and the ease of use by non-technical users. In this work, we describe a case study in which we review different breast cancer classifiers and report our findings about the limits and advantages of such non-disclosive suite of tools in a realistic setting. Five independent gene expression datasets of breast cancer survival were downloaded from Gene Expression Omnibus (GEO) and pooled together through the federated infrastructure. Three previously published and two newly proposed 5-year cancer-free survival risk score classifiers were trained in a federated environment, and an additional reference classifier was trained with unconstrained data access. The performance of these six classifiers was systematically evaluated, and the results show that i) the published classifiers do not generalize well when applied to patient cohorts that differ from those used to develop them; ii) among the methods we tried, the classification using logistic regression worked better on average, closely followed by random forest; iii) the unconstrained version of the logistic regression classifier outperformed the federated version by 4% on average. Reproducibility of our experiments is ensured through the use of VisualSHIELD, an open-source tool that augments DataSHIELD with new functions, a standardized deployment procedure, and a simple graphical user interface.
2024
Tomasoni, Danilo; Lombardo, Rosario; Lauria, Mario
Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD / Tomasoni, Danilo; Lombardo, Rosario; Lauria, Mario. - In: FRONTIERS IN GENETICS. - ISSN 1664-8021. - 15:(2024), pp. 12703871-12703878. [10.3389/fgene.2024.1270387]
File in questo prodotto:
File Dimensione Formato  
fgene-15-1270387.pdf

accesso aperto

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Creative commons
Dimensione 1.48 MB
Formato Adobe PDF
1.48 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/403870
Citazioni
  • ???jsp.display-item.citation.pmc??? 0
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact