Unravelling the functioning of the complex processes involved in living systems is a challenging task. Enzymes are involved in almost all of the chemical processes taking place within the cell. They accelerate chemical reactions by forming a complex with the substrate and therefore lowering the reaction activation energy. The characterisation of the enzyme function at the molecular level is a fundamental step, which has several implications and applications in modern biotechnologies. This thesis investigates statistical and relational learning techniques for the characterisation of the enzyme function. The problem is tackled from two sides: the analysis of the enzyme structure and its interactions with other molecules, and the mining of relevant features from the enzyme mutation data. From the first side a pure statistical learning approach is proposed for directly predicting enzyme functional residues. This approach is shown to improve over the current state of the art on several benchmark datasets. The engineered predictors resulting from this investigation are now available to the public of researchers through the CatANalyst web server. Further improvement of the approach is pursued by proposing a supervised clustering technique for collectively predicting all the residues belonging to the same functional site. On the “learning from mutations†side, the focus shifts to the expressivity and interpretability of the learnt models. This thesis proposes novel statistical relational approaches for mining hierarchical features for multiple related tasks. The resistance of viral enzyme mutants to groups of related inhibitors is modelled in a multitask setting. Learnt models are refined on a group or per-task basis at different levels of the hierarchy. The proposed hierarchical approach is shown to provide statistically significant improvements over both single and multitask alternatives. Moreover it has the ability to provide explanation of the models which are themselves hierarchical. A task clustering approach is also proposed for inferring the structure of tasks when it is unknown. Finally, a relational approach is proposed for exploiting the learnt relational rules for generating novel mutations with specific characteristics. This allows to drastically reduce the space of possible mutations to be experimentally assessed. Promising preliminary results are obtained, which highlight the potential of the approach in guiding mutant engineering and in predicting the viral enzyme evolution. These findings can pave the way to further research directions in functional interpretation of biological data by means of machine learning techniques.

Statistical and Relational Learning for Understanding Enzyme Function / Cilia, Elisa. - (2010), pp. 1-229.

Statistical and Relational Learning for Understanding Enzyme Function

Cilia, Elisa
2010-01-01

Abstract

Unravelling the functioning of the complex processes involved in living systems is a challenging task. Enzymes are involved in almost all of the chemical processes taking place within the cell. They accelerate chemical reactions by forming a complex with the substrate and therefore lowering the reaction activation energy. The characterisation of the enzyme function at the molecular level is a fundamental step, which has several implications and applications in modern biotechnologies. This thesis investigates statistical and relational learning techniques for the characterisation of the enzyme function. The problem is tackled from two sides: the analysis of the enzyme structure and its interactions with other molecules, and the mining of relevant features from the enzyme mutation data. From the first side a pure statistical learning approach is proposed for directly predicting enzyme functional residues. This approach is shown to improve over the current state of the art on several benchmark datasets. The engineered predictors resulting from this investigation are now available to the public of researchers through the CatANalyst web server. Further improvement of the approach is pursued by proposing a supervised clustering technique for collectively predicting all the residues belonging to the same functional site. On the “learning from mutations†side, the focus shifts to the expressivity and interpretability of the learnt models. This thesis proposes novel statistical relational approaches for mining hierarchical features for multiple related tasks. The resistance of viral enzyme mutants to groups of related inhibitors is modelled in a multitask setting. Learnt models are refined on a group or per-task basis at different levels of the hierarchy. The proposed hierarchical approach is shown to provide statistically significant improvements over both single and multitask alternatives. Moreover it has the ability to provide explanation of the models which are themselves hierarchical. A task clustering approach is also proposed for inferring the structure of tasks when it is unknown. Finally, a relational approach is proposed for exploiting the learnt relational rules for generating novel mutations with specific characteristics. This allows to drastically reduce the space of possible mutations to be experimentally assessed. Promising preliminary results are obtained, which highlight the potential of the approach in guiding mutant engineering and in predicting the viral enzyme evolution. These findings can pave the way to further research directions in functional interpretation of biological data by means of machine learning techniques.
2010
XXII
2010-2011
Ingegneria e Scienza dell'Informaz (cess.4/11/12)
Information and Communication Technology
Passerini, Andrea
no
Inglese
Settore INF/01 - Informatica
Settore BIO/11 - Biologia Molecolare
File in questo prodotto:
File Dimensione Formato  
PhD-Thesis.pdf

accesso aperto

Tipologia: Tesi di dottorato (Doctoral Thesis)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 20.17 MB
Formato Adobe PDF
20.17 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/368772
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact