The influence of the inclusion of biological knowledge in statistical methods to integrate multi-omics data

IRIS

The influence of the inclusion of biological knowledge in statistical methods to integrate multi-omics data / Tini, Giulia. - (2018), pp. 1-140.

Tini, Giulia

2018-01-01

Abstract

Understanding the relationships among biomolecules, and how these relationships change between healthy and disease states, is an important question in modern biology and medicine. Advances in high-throughput techniques have led to an explosion of biological data available for analysis, allowing researchers to investigate multiple molecular layers (i.e. omics data) together. Classical statistical methods could not address the challenges of combining multiple data types, which led to the development of ad hoc methodologies whose results, however, depend on several factors. Among these, it is important to consider whether “prior knowledge” of the inter-omics relationships is available for integration. To address this issue, we focused on different approaches to three-omics integration: supervised (prior knowledge is available), unsupervised and semi-supervised. With the supervised integration of DNA methylation, gene expression and protein levels from adipocytes, we observed coordinated significant changes across the three omics in the last phase of adipogenesis. In most cases, however, interactions between different molecular layers are complex and unknown. We therefore explored unsupervised integration methods, showing that their results are influenced by method choice, pre-processing, number of integrated data types and experimental design. The strength of the inter-omics signal and the presence of noise also proved to be relevant factors. Since the inclusion of prior knowledge can highlight the former while decreasing the influence of the latter, we proposed a semi-supervised approach, showing that including knowledge about inter-omics interactions increases the accuracy of unsupervised methods in sample classification.

Defended on	2018
Cycle	XXX
Academic year	2018-2019
Department	Facoltà di Giurisprudenza (29/10/12-)
Doctoral programme	Mathematics
Unitn internal supervisor	Marchetti, Luca; Priami, Corrado; Scott-Boyer, Marie Pier
Bi-nationally supervised doctoral thesis	no
Language	English
Scientific-disciplinary sectors (SSD) (valid until 24/06/2024)	BIO/13 - Applied Biology; MAT/06 - Probability and Mathematical Statistics
Appears in types	08.1 Tesi di dottorato (Doctoral Thesis)

Files in this record:

File	Access	Type	License	Size	Format
disclaimer.pdf	Archive managers only	Doctoral Thesis	All rights reserved	1.47 MB	Adobe PDF
thesis.pdf	Archive managers only	Doctoral Thesis	All rights reserved	6.36 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/367748
