Multi-target Prediction Methods for Bioinformatics: Approaches for Protein Function Prediction and Candidate Discovery for Gene Regulatory Network Expansion

Masera, Luca

Biology has been undergoing a paradigm shift since the advent of next-generation sequencing technologies. The data produced far exceed the capacity of biologists to investigate every possibility in the laboratory, so predictive tools able to guide research are now a fundamental component of their workflow. Given the central role of proteins in living organisms, in this thesis we focus on their functional analysis and on the intrinsically multi-target nature of this task. To this end, we propose several predictive methods, specifically developed to exploit side knowledge among target variables and examples. As a first contribution, we address the task of protein function prediction and, more generally, of hierarchical multilabel classification (HMC). We present Ocelot, a predictive pipeline for genome-wide protein characterization. It relies on a statistical relational learning tool, in which the knowledge about the input examples is encoded by a combination of multiple kernel matrices, while relations among target variables are expressed as logical constraints. Both the mislabeling of examples and the violation of logical rules are penalized by the loss function, but Ocelot does not enforce hierarchical consistency. To overcome this limitation, we present AWX, a neural-network output layer that guarantees the formal consistency of HMC predictions. The second contribution is VSC, a binary classifier designed to incorporate the concepts of subsampling and locality in the definition of the features used as input to a perceptron. A locality-based confidence measure is used to weight the contribution of maximum-margin hyperplanes built by subsampling pairs of examples of opposite class. The rationale is that local methods can be exploited when a multi-target task is expected but is not reflected in the annotation space. The third and last contribution consists of NES2RA and OneGenE, two approaches for finding candidates to expand known gene regulatory networks. NES2RA adopts variable-subsetting strategies, enabled by volunteer distributed computing, and the PC algorithm to discover candidate causal relationships within each subset of variables. Ranking aggregators then combine the partial results into a single ranked list of candidate genes. OneGenE overcomes the main limitation of NES2RA, i.e. latency, by precomputing candidate expansion lists for each transcript of an organism, which are then aggregated on demand.
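
As an illustration of the aggregation step mentioned above, the following minimal sketch combines several partial candidate rankings into one list using a Borda count. The abstract does not say which ranking aggregators the thesis uses, so the Borda count is an assumption chosen for simplicity, and the function name `borda_aggregate` and the gene identifiers are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(partial_rankings):
    """Combine partial rankings (best candidate first) into one ranked list.

    Assumption: a plain Borda count, where a candidate at position p in a
    ranking of length n receives n - p points. This is an illustrative
    stand-in, not necessarily an aggregator used in the thesis.
    """
    scores = defaultdict(float)
    for ranking in partial_rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] += n - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Hypothetical partial results, one list per subset of variables.
partial = [
    ["geneA", "geneB", "geneC"],
    ["geneB", "geneA"],
    ["geneB", "geneC", "geneD"],
]
aggregated = borda_aggregate(partial)
print(aggregated)
```

Because each subset sees only some of the variables, a candidate absent from a ranking simply receives no points from it; other aggregators would treat missing candidates differently.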

Multi-target Prediction Methods for Bioinformatics: Approaches for Protein Function Prediction and Candidate Discovery for Gene Regulatory Network Expansion / Masera, Luca. - (2019), pp. 1-133.

2019-01-01

Defended on: 2019
Cycle: XXXI
Academic year: 2019-2020
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Information and Communication Technology
Unitn internal supervisor: Blanzieri, Enrico
Bi-nationally supervised doctoral thesis: no
Language: English
Reference SSD (valid until 24/06/2024): INF/01 - Computer Science; BIO/11 - Molecular Biology
Appears in types: 08.1 Doctoral Thesis

Files in this record:

File	Access	Type	License	Size	Format
Disclaimer_Masera.pdf	Archive managers only	Doctoral thesis	All rights reserved	320.31 kB	Adobe PDF
PhD_Thesis.pdf	Open access	Doctoral thesis	All rights reserved	3.27 MB	Adobe PDF