Metagenomic sequencing has revolutionized gut microbiome research by providing comprehensive access to the entire genomic content of a biological sample, known as its metagenome. By making it possible to study microbial ecosystems in depth without isolating or cultivating their members, metagenomics has greatly expanded knowledge of the taxonomic and functional diversity of the human gut microbiome and of how deeply it is involved in human physiology. Metagenomic assembly is a computational technique that reconstructs bacterial genomes from metagenomic data; the reconstructed genomes are known as metagenome-assembled genomes (MAGs). Systematically recovering MAGs from gut metagenomes has allowed researchers to progressively unfold the complexity of the microbiome-host system by cataloging and characterizing the genomes of thousands of previously unknown bacterial lineages that comprise it. Despite its importance, this task faces computational limitations that complicate the recovery of microbial diversity associated with rare and low-abundance species, popularly known as the 'microbial dark matter'. Consequently, optimizing the use of available metagenomic data to maximize observable diversity and genome reconstruction is crucial for comprehensive microbiome analysis. In this doctoral thesis, I explore how the concurrent processing of multiple biologically similar metagenomes, when available, using reference- and assembly-based approaches can help identify previously undetected bacterial species. More specifically, I applied metagenomic (co)assembly and (co)binning to a cohort of ultra-deep, redundantly sequenced gut metagenomes from a small number of individuals. I demonstrate that the careful application of this approach allows for the recovery of high-quality MAGs from novel and under-characterized bacterial species that would otherwise be missed when analyzing a single sample.
This approach enabled the reconstruction of genomes from 198 species lacking reference genomes and 39 completely novel microbial species from gut communities that should already be well represented, highlighting how a significant amount of phylogenetic diversity has remained hidden primarily due to the low sequencing depth of most studies, rather than an insufficient number of sampled individuals. Although multi-sample approaches have been applied in numerous studies for the aforementioned reasons, this work outlines the ideal conditions under which to apply them in cross-sectional and longitudinal contexts to minimize the occurrence of assembly errors. I show that (co)assembly is most effective with samples from the same subject, as combining samples from unrelated subjects generates strain-chimeric MAGs that do not represent actual strain populations. In parallel, I also provide estimates of the sequencing requirements needed to capture this diversity by complementing (co)assembly with reference-based methods. The findings in this thesis advance our understanding of metagenomic assembly techniques and highlight the importance of optimizing data usage in microbiome studies. The recovery of high-quality MAGs empowers various applications, from surveying unknown species to guiding their experimental isolation and characterization. Furthermore, integrating these MAGs into reference-based approaches enables large-scale screening to draw associations with host-related variables, ultimately contributing to a more comprehensive understanding of the gut microbiome.
Exploiting the potential of metagenomics to uncover novel and uncharacterized gut microbiome diversity / Golzato, Davide. - (2024 Dec 16), pp. -118.
Exploiting the potential of metagenomics to uncover novel and uncharacterized gut microbiome diversity
Golzato, Davide
2024-12-16