HapScoreDB: a database of protein language model functional scores for haplotype-resolved protein sequences / Mazza, Fabio; Gastaldello, Filippo; Dalfovo, Davide; Lattanzi, Gianluca; Romanel, Alessandro. - In: NUCLEIC ACIDS RESEARCH. - ISSN 1362-4962. - 54:D1(2026), pp. D1087-D1097. [10.1093/nar/gkaf1184]

HapScoreDB: a database of protein language model functional scores for haplotype-resolved protein sequences

Mazza, Fabio; Gastaldello, Filippo; Dalfovo, Davide; Lattanzi, Gianluca; Romanel, Alessandro
2026-01-01

Abstract

Deciphering the functional effects of genetic variants, especially those inherited together on the same haplotype, remains a major challenge in human genetics, where epistasis among co-occurring variants can further complicate interpretation. To address this, we present HapScoreDB, a database offering protein language model-derived scores for haplotype-resolved protein-coding sequences across all human transcript isoforms. Leveraging GENCODE and Ensembl annotations with phased variant data from the 1000 Genomes Project, HapScoreDB includes over 130000 distinct protein haplotypes from >18000 genes and 78000 transcripts, encompassing over 94000 coding variants. Fitness scores for each haplotype were computed using state-of-the-art protein language models. Preliminary analyses show that haplotypes harboring cancer GWAS variants tend to have significantly reduced predicted fitness. Moreover, variability in scores across haplotypes of the same transcript highlights known cancer genes, suggesting that dispersion in predicted fitness may capture functionally important variation. HapScoreDB features a user-friendly web interface for interactive exploration, visualization, and download of both full and customized datasets. As a dynamic and expandable platform, it connects real-world human genetic variation with advanced protein modeling, enabling novel approaches in variant interpretation, isoform prioritization, and population-scale functional genomics. Access HapScoreDB at https://bcglab.cibio.unitn.it/hapscoredb.
Files in this item:
File: gkaf1184.pdf
Access: open access
Type: Publisher’s version (Publisher’s layout)
License: Creative Commons
Size: 1.63 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/479550