This work aims to develop a novel machine learning method to investigate heterogeneity in neurodevelopmental disorders, with a focus on autism spectrum conditions (ASCs). In ASCs, heterogeneity appears at several levels of analysis (e.g., genetic, behavioral, and across developmental trajectories), which hinders the development of effective treatments and the identification of the biological pathways involved in gene-cognition-behavior links. ASC diagnosis is based on behavioral observations, which determine the cohort composition of studies in every scientific field (e.g., psychology, neuroscience, genetics). Thus, uncovering behavioral subtypes can provide stratified ASC cohorts that are more representative of the true population. Ideally, behavioral stratification can (1) help to revise and shorten the diagnostic process by highlighting the characteristics that best capture heterogeneity; (2) help to develop personalized treatments based on their effectiveness for subgroups of subjects; (3) show how the longitudinal course of the condition might differ (e.g., divergent/convergent developmental trajectories); (4) contribute to the identification of genetic variants that may be overlooked in case-control studies; and (5) identify possible disruptions of neuronal activity in the brain (e.g., excitatory/inhibitory mechanisms). Characterizing the temporal aspects of heterogeneous manifestations from their multi-dimensional features is thus key to identifying the etiology of such disorders and establishing personalized treatments. Features include trajectories described by a multi-modal combination of electronic health records (EHRs), cognitive functioning, and adaptive behavior indicators. In particular, this thesis contributes to the data-driven discovery of clinical and behavioral trajectories of individuals with complex disorders and ASCs.
Machine learning techniques such as deep learning and word embedding, which have proved successful in, e.g., natural language processing and image classification, are gaining ground in healthcare research for precision medicine. Here, we leverage these methods to investigate the feasibility of learning data-driven pathways that have been difficult to identify in clinical practice, to help disentangle the complexity of conditions whose etiology is still unknown. In Chapter 1, we present a new computational method, based on deep learning, to stratify patients with complex disorders; we demonstrate the method on multiple myeloma, Alzheimer’s disease, and Parkinson’s disease, among others. We use clinical records from a heterogeneous patient cohort (i.e., a multi-disease dataset) of 1.6M temporally ordered EHR sequences from the Mount Sinai Health System’s data warehouse to learn unsupervised patient representations. These representations are then leveraged to identify subgroups within complex condition cohorts via hierarchical clustering. To clinically validate our results, we investigate the enrichment of terms that code for comorbidities, medications, laboratory tests, and procedures. Chapter 2 develops a data analysis protocol that produces behavioral embeddings from observational measurements, representing subjects with ASCs in a latent space that captures multiple levels of assessment (i.e., multiple tests) and the temporal pattern of behavioral-cognitive profiles. The computational framework includes clustering algorithms and state-of-the-art word and text representation methods originally developed for natural language processing. The aim is to detect subgroups within ASC cohorts, towards the identification of possible subtypes based on behavioral, cognitive, and functioning aspects.
The protocol is applied to ASC behavioral data from 204 children and adolescents referred to the Laboratory of Observation Diagnosis and Education (ODFLab) at the University of Trento. In Chapter 3, we develop a case study for ASCs. From the representations learned in Chapter 1, we select 1,439 individuals with ASCs and investigate whether such representations generalize well to any disorder. Specifically, we identify three subgroups among individuals with ASCs and clinically validate them, detecting clinical profiles based on differential term enrichment that can inform comorbidities, therapeutic treatments, medication side effects, and screening policies. This work was developed in partnership with ODFLab (University of Trento) and the Predictive Models for Biomedicine and Environment unit at FBK. The study reported in Chapter 1 was conducted at the Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai (NY).
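
The term-enrichment validation mentioned above can be illustrated with a minimal sketch. The abstract does not say which statistical test the thesis uses, so the one-sided hypergeometric test and the Bonferroni correction below are assumptions, as are the function names `hypergeom_tail` and `enriched_terms`. The patient records, subgroup labels, and terms are made-up toy data, not results from the thesis.

```python
from math import comb


def hypergeom_tail(k, n_sub, n_term, n_total):
    """P(X >= k) when n_sub patients are drawn without replacement from
    n_total patients, n_term of whom carry the term."""
    upper = min(n_sub, n_term)
    favorable = sum(
        comb(n_term, i) * comb(n_total - n_term, n_sub - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(n_total, n_sub)


def enriched_terms(records, labels, alpha=0.05):
    """Return {subgroup: [(term, p_value), ...]} for over-represented terms.

    records: one set of coded terms per patient (hypothetical input format).
    labels:  subgroup assignment per patient, e.g. from a clustering step.
    A term is kept if its Bonferroni-corrected p-value is <= alpha
    (the correction is an assumption, not taken from the thesis).
    """
    n_total = len(records)
    terms = sorted(set().union(*records))
    groups = sorted(set(labels))
    n_tests = len(terms) * len(groups)
    result = {}
    for group in groups:
        members = [r for r, lab in zip(records, labels) if lab == group]
        hits = []
        for term in terms:
            k = sum(term in r for r in members)       # carriers in subgroup
            n_term = sum(term in r for r in records)  # carriers overall
            p = hypergeom_tail(k, len(members), n_term, n_total)
            if p * n_tests <= alpha:
                hits.append((term, p))
        result[group] = sorted(hits, key=lambda hit: hit[1])
    return result


# Toy data: 20 invented patients in two invented subgroups.
labels = ["A"] * 10 + ["B"] * 10
records = [
    {"epilepsy", "adhd"} if i % 2 == 0 else {"epilepsy"} for i in range(10)
] + [
    {"melatonin", "adhd"} if i % 2 == 0 else {"melatonin"} for i in range(10)
]
result = enriched_terms(records, labels)
for group, hits in result.items():
    print(group, [(term, f"{p:.2e}") for term, p in hits])
```

In this toy cohort, a term carried by half of each subgroup is not flagged, while a term found only in one subgroup is. With real EHR data the vocabulary is much larger, so the choice of multiple-testing correction matters considerably more than it does here.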

Stratification of autism spectrum conditions by deep encodings / Landi, Isotta. - (2020 Feb 13), pp. 1-126. [10.15168/11572_252684]

Stratification of autism spectrum conditions by deep encodings

Landi, Isotta
2020-02-13

13 Feb 2020
XXXII
2018-2019
Psicologia e scienze cognitive (29/10/12-)
Psychological Sciences and Education
Venuti, Paola
Furlanello, Cesare
no
English
Files in this item:
File: phd_unitn_Isotta_Landi.pdf
Access: open access
Type: Doctoral Thesis
License: All rights reserved
Size: 11.95 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/252684