Unraveling cellular complexity: novel computational frameworks to improve the interpretability of single-cell and spatial omics data in health and disease
Busarello, Emma
2025-12-18
Abstract
Over the past decade, advances in single-cell and spatial transcriptomics have transformed our understanding of cellular heterogeneity and tissue organization, providing unprecedented insights into cellular processes in health and disease. However, these technologies have introduced new analytical challenges, and the reliable identification and annotation of cell types and states remains a major bottleneck. An effective way to identify cell types accurately is to use marker genes, whose expression is specific to one or a few cell types. Despite the growing number of marker databases, existing tools and resources often rely on inconsistent marker sets and lack a standard classification, which leads to discordant annotations and limited biological interpretability. Moreover, most current resources are biased toward physiological cell types, which reduces their relevance in disease contexts. To address these limitations, I developed the Cell Marker Accordion, a user-friendly platform that enables automatic, robust and interpretable annotation of single-cell populations using consistency-weighted markers. Validation on multiple single-cell and spatial transcriptomics datasets from human and mouse samples showed consistently improved annotation accuracy. Importantly, the Cell Marker Accordion can also identify disease-critical cell populations and uncover potential biomarkers across diverse pathological conditions, including liquid and solid tumors. To improve accessibility and extend the applicability of the Cell Marker Accordion, a Python implementation was also developed, with additional analytical features to improve interpretability. In particular, entropy-based metrics were incorporated to estimate uncertainty and heterogeneity in cell type assignments.
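As an illustration of how an entropy-based uncertainty metric of this kind can work, the minimal sketch below computes a normalized Shannon entropy over the per-cell-type scores of a single cell or cluster. It is not the Cell Marker Accordion implementation: the function name `normalized_entropy` and the choice to rescale scores into proportions are assumptions made only for this example.

```python
import math


def normalized_entropy(scores):
    """Normalized Shannon entropy of non-negative per-cell-type scores.

    Hypothetical helper for illustration only.
    Returns a value in [0, 1]: 0 when a single cell type carries all the
    score (confident assignment), 1 when all cell types score equally
    (maximally ambiguous assignment).
    """
    # Negative scores are clipped to zero so they can act as proportions.
    clipped = [max(float(s), 0.0) for s in scores]
    n_types = len(clipped)
    if n_types < 2:
        return 0.0  # with one candidate type there is nothing to be unsure about
    total = sum(clipped)
    if total == 0:
        return 1.0  # no evidence for any type: treat as maximally uncertain
    probs = [s / total for s in clipped]
    # Terms with p == 0 contribute nothing to the entropy.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Dividing by log(n_types) makes values comparable across panels of
    # different size.
    return entropy / math.log(n_types)
```

With a definition like this, a cluster scored only for one cell type has entropy 0, while a cluster scoring equally for every candidate type has entropy 1, so high values flag ambiguous or heterogeneous populations.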
Furthermore, a deconvolution module was implemented to address a key limitation of low-resolution spatial transcriptomics, where each spot often contains multiple cells. This functionality estimates the cellular composition of each spot, revealing intra-spot heterogeneity and offering insights into cell type distributions in low-resolution spatial data. To investigate dysregulated cellular processes underlying pathogenesis, diagnosis, disease progression or therapy resistance, I developed a customizable module within the Cell Marker Accordion that enables rapid, interpretable exploration of altered pathways in pathological conditions, based on weighted markers. This functionality further facilitates the discovery of functionally relevant biomarkers and potential therapeutic targets. I used this module to investigate the dysregulation of stress granules, membrane-less organelles that form under cellular stress, with the aim of uncovering potential associations between their alteration and cancer. First, to characterize the RNA landscape of stress granules, I performed a meta-analysis of 26 published transcriptomic datasets, identifying low-, medium-, and high-consensus RNAs associated with stress granules. Then, using the Cell Marker Accordion custom module and the weighted list of stress granule–associated RNAs, I identified a significant increase in stress granule components in acute myeloid leukemia patients. Single-cell analysis revealed a specific association between leukemic hematopoietic stem cells and stress granule activity, compared to other leukemic cells, suggesting that stress granule dysregulation may contribute to their competitive advantage under stress conditions.
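To make the idea of scoring a custom weighted gene list concrete, the sketch below computes a weighted mean expression score for one cell or cluster. It is a simplified illustration under stated assumptions, not the scoring used by the Cell Marker Accordion: the function name `weighted_signature_score`, the placeholder gene names, and the mapping of low, medium and high consensus to weights 1, 2 and 3 are all hypothetical.

```python
def weighted_signature_score(expression, weights):
    """Weighted mean expression of a gene signature in one cell or cluster.

    Hypothetical helper for illustration only.
    expression: dict mapping gene name -> normalized expression value
    weights:    dict mapping gene name -> non-negative consensus weight
    Genes in the signature that were not measured are ignored.
    """
    shared = [gene for gene in weights if gene in expression]
    total_weight = sum(weights[gene] for gene in shared)
    if total_weight == 0:
        return 0.0  # no usable signature genes: return a neutral score
    weighted_sum = sum(weights[gene] * expression[gene] for gene in shared)
    return weighted_sum / total_weight


# Toy values, not real data. Assumed weighting: low = 1, medium = 2, high = 3.
toy_weights = {"GENE_A": 3, "GENE_B": 1, "GENE_C": 2}
toy_expression = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_D": 5.0}
toy_score = weighted_signature_score(toy_expression, toy_weights)
```

In this toy case only `GENE_A` and `GENE_B` are both weighted and measured, so the score is (3 × 2.0 + 1 × 1.0) / 4 = 1.75. High-consensus genes pull the score more strongly than low-consensus ones, which is the intended effect of weighting.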
Overall, this thesis introduces fast, customizable, and robust computational frameworks that enable annotation and interpretation of single-cell and spatial omics data, providing powerful tools that have been used to dissect cellular complexity across a wide range of physiological and pathological conditions.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



