Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling

Chierici, Marco; Bussola, Nicole; Marcolini, Alessia; Francescato, Margherita; Zandona, Alessandro; Trastulla, Lucia; Agostinelli, Claudio; Jurman, Giuseppe; Furlanello, Cesare

doi:10.3389/fonc.2020.01065

Recent technological advances and international efforts, such as The Cancer Genome Atlas (TCGA), have made available several pan-cancer datasets encompassing multiple omics layers with detailed clinical information in large collections of samples. The need has thus arisen for the development of computational methods aimed at improving cancer subtyping and biomarker identification from multi-modal data. Here we apply the Integrative Network Fusion (INF) pipeline, which combines multiple omics layers exploiting Similarity Network Fusion (SNF) within a machine learning predictive framework. INF includes a feature ranking scheme (rSNF) on SNF-integrated features, used by a classifier over juxtaposed multi-omics features (juXT). In particular, we show instances of INF implementing Random Forest (RF) and linear Support Vector Machine (LSVM) as the classifier, and two baseline RF and LSVM models are also trained on juXT. A compact RF model, called rSNFi, trained on the intersection of top-ranked biomarkers from the two approaches juXT and rSNF, is finally derived. All the classifiers are run in a 10x5-fold cross-validation schema to warrant reproducibility, following the guidelines for an unbiased Data Analysis Plan by the US FDA-led initiatives MAQC/SEQC. INF is demonstrated on four classification tasks on three multi-modal TCGA oncogenomics datasets. Gene expression, protein expression and copy number variants are used to predict estrogen receptor status (BRCA-ER, N = 381) and breast invasive carcinoma subtypes (BRCA-subtypes, N = 305), while gene expression, miRNA expression and methylation data are used as predictor layers for acute myeloid leukemia and renal clear cell carcinoma survival (AML-OS, N = 157; KIRC-OS, N = 181). In test, INF achieved similar Matthews Correlation Coefficient (MCC) values and 97% to 83% smaller feature sizes (FS), compared with juXT for BRCA-ER (MCC: 0.83 vs. 0.80; FS: 56 vs. 1801) and BRCA-subtypes (0.84 vs. 0.80; 302 vs. 1801), improving KIRC-OS performance (0.38 vs. 0.31; 111 vs. 2319). INF predictions are generally more accurate in test than one-dimensional omics models, with smaller signatures too, where transcriptomics consistently plays the leading role. Overall, the INF framework effectively integrates multiple data levels in oncogenomics classification tasks, improving over the performance of single layers alone and naive juxtaposition, and provides compact signature sizes.
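
As a reading aid for the figures quoted in the abstract, the following minimal Python sketch shows how a binary MCC, a relative feature-size reduction, and an intersection of top-ranked features could be computed. It is not the authors' implementation: the function names (`mcc_binary`, `size_reduction`, `top_k_intersection`) are hypothetical, the MCC shown is the standard binary form (a multi-class task such as BRCA-subtypes would need the multi-class generalization), and the intersection rule is only one plausible reading of how rSNFi combines the juXT and rSNF rankings.

```python
from __future__ import annotations

from math import sqrt


def mcc_binary(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal sum is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def size_reduction(compact: int, baseline: int) -> float:
    """Percentage by which `compact` is smaller than `baseline`."""
    return 100.0 * (1 - compact / baseline)


def top_k_intersection(rank_a: list[str], rank_b: list[str], k: int) -> list[str]:
    """Features found in the top k of both rankings, kept in rank_a order.

    Assumption: a simple top-k intersection stands in for the rSNFi step.
    """
    top_b = set(rank_b[:k])
    return [f for f in rank_a[:k] if f in top_b]


# Feature sizes (INF vs. juXT) as quoted in the abstract.
feature_sizes = {
    "BRCA-ER": (56, 1801),
    "BRCA-subtypes": (302, 1801),
    "KIRC-OS": (111, 2319),
}
for task, (inf_fs, juxt_fs) in feature_sizes.items():
    print(f"{task}: {size_reduction(inf_fs, juxt_fs):.1f}% smaller")
```

Run on the feature sizes above, the sketch gives reductions of about 96.9% for BRCA-ER and 83.2% for BRCA-subtypes, consistent with the "97% to 83%" range stated in the abstract.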

Integrative Network Fusion: A Multi-Omics Approach in Molecular Profiling / Chierici, Marco; Bussola, Nicole; Marcolini, Alessia; Francescato, Margherita; Zandona, Alessandro; Trastulla, Lucia; Agostinelli, Claudio; Jurman, Giuseppe; Furlanello, Cesare. - In: FRONTIERS IN ONCOLOGY. - ISSN 2234-943X. - 10:(2020), pp. 1065.1-1065.14. [10.3389/fonc.2020.01065]

2020-01-01

Date of publication: 2020
Journal title: FRONTIERS IN ONCOLOGY
DOI: https://dx.doi.org/10.3389/fonc.2020.01065
PubMed identifier: 32714870
Scopus identifier: 2-s2.0-85087926267
WOS identifier: WOS:000552941000001
Appears in types: 03.1 Journal article

Files in this item:

File	Size	Format
fonc-10-01065.pdf (open access; type: publisher's version, publisher's layout; license: Creative Commons)	1.52 MB	Adobe PDF