Integrative computational microbial genomics for large-scale metagenomic analyses / Beghini, Francesco. - (2021 Mar 30), pp. 1-198. [10.15168/11572_296396]

Integrative computational microbial genomics for large-scale metagenomic analyses

Beghini, Francesco
2021-03-30

Abstract

Advances in DNA sequencing technologies and improvements in analytic methods have changed the way we analyze complex microbial communities (metagenomics). In only a few years, these methods have evolved to enable more precise community profiling and to allow strain-level resolution. A typical computational metagenomic analysis relies on mapping raw DNA sequencing reads against sets of “reference” microbial genomes, usually obtained through single-isolate sequencing. With the almost exponential increase in the number of reference genomes deposited daily in public repositories, current computational methods are incapable of managing and exploiting such a rich reference set, which limits the potential of metagenomic investigations.
In my doctoral thesis, I present my contribution towards fully exploiting the available reference data for metagenomic analysis. I developed ChocoPhlAn, an integrated pipeline for the automatic retrieval, organization, and annotation of reference genomes and gene families, as the foundation for bioBakery 3, an improved set of computational methods for the analysis of shotgun metagenomics data. Using the latest available microbial genomic reference data, processed through ChocoPhlAn, the six bioBakery 3 tools that I updated provided more comprehensive and higher-resolution taxonomic and functional profiling of microbiomes and allowed strain-level characterization of their constituent microbes. After extensive benchmarking against previous versions and competing tools, we applied these methods to more than 10,000 real metagenomes and showed how metagenomics can be a more powerful tool for identifying novel links between the gut microbiome and disease conditions such as colorectal cancer and inflammatory bowel disease.
Accurate strain-level phylogeny reconstruction and pangenomic analysis of 7,783 metagenomes revealed novel functional, phylogenetic, and geographic diversity of Ruminococcus bromii, a common and highly prevalent gut inhabitant. We then focused on the eukaryotic fraction of the human microbiome, a frequently overlooked aspect of microbial communities, and its potential impact on human gut health. To this end, we assessed the presence of the eukaryotic parasite Blastocystis spp. in more than 2,000 metagenomes from 5 continents to understand its associations with disease status and geography. We showed that Blastocystis is the most common eukaryotic colonizer of the human gut and that it is particularly prevalent in healthy subjects and non-westernized populations. We further explored intra-subtype diversity by reconstructing and functionally profiling new metagenome-assembled Blastocystis genomes, showing how metagenomics can help unravel the genomics of protists and providing a genomic resource for the further integration of non-bacterial taxa into metagenomic pipelines.
By developing and implementing ChocoPhlAn and the new bioBakery tools, we provided the community with improved and efficient microbiome profiling tools, started identifying novel patterns of association between host and niche-associated microbiomes, and started discovering previously uncharacterized species from human and non-human hosts.
30-mar-2021
XXXIII
2019-2020
CIBIO (29/10/12-)
Biomolecular Sciences
Segata, Nicola
no
English
Files in this item:
File: Beghini_PhD_thesis_20210219_final_Redacted.pdf
Access: open access
Type: Doctoral Thesis
License: All rights reserved
Size: 8.18 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/296396