The steadily growing computational power employed to perform molecular dynamics simulations of biological macromolecules represents at the same time an immense opportunity and a formidable challenge. In fact, large amounts of data are produced, from which useful, synthetic, and intelligible information has to be extracted to make the crucial step from knowing to understanding. Here we tackled the problem of coarsening the conformational space sampled by proteins in the course of molecular dynamics simulations. We applied different schemes to cluster the frames of a dataset of protein simulations; we then employed an information-theoretical framework, based on the notion of resolution and relevance, to gauge how well the various clustering methods accomplish this simplification of the configurational space. Our approach allowed us to identify the level of resolution that optimally balances simplicity and informativeness; furthermore, we found that the most physically accurate clustering procedures are those that induce an ultrametric structure of the low-resolution space, consistently with the hypothesis that the protein conformational landscape has a self-similar organisation. The proposed strategy is general and its applicability extends beyond that of computational biophysics, making it a valuable tool to extract useful information from large datasets.

Information-theoretical measures identify accurate low-resolution representations of protein configurational space / Mele, M.; Covino, R.; Potestio, R.. - In: SOFT MATTER. - ISSN 1744-683X. - 18:37(2022), pp. 7064-7074. [10.1039/d2sm00636g]

Information-theoretical measures identify accurate low-resolution representations of protein configurational space

Mele M.;Covino R.;Potestio R.
2022-01-01

Abstract

The steadily growing computational power employed to perform molecular dynamics simulations of biological macromolecules represents at the same time an immense opportunity and a formidable challenge. In fact, large amounts of data are produced, from which useful, synthetic, and intelligible information has to be extracted to make the crucial step from knowing to understanding. Here we tackled the problem of coarsening the conformational space sampled by proteins in the course of molecular dynamics simulations. We applied different schemes to cluster the frames of a dataset of protein simulations; we then employed an information-theoretical framework, based on the notion of resolution and relevance, to gauge how well the various clustering methods accomplish this simplification of the configurational space. Our approach allowed us to identify the level of resolution that optimally balances simplicity and informativeness; furthermore, we found that the most physically accurate clustering procedures are those that induce an ultrametric structure of the low-resolution space, consistently with the hypothesis that the protein conformational landscape has a self-similar organisation. The proposed strategy is general and its applicability extends beyond that of computational biophysics, making it a valuable tool to extract useful information from large datasets.
2022
37
Mele, M.; Covino, R.; Potestio, R.
Information-theoretical measures identify accurate low-resolution representations of protein configurational space / Mele, M.; Covino, R.; Potestio, R.. - In: SOFT MATTER. - ISSN 1744-683X. - 18:37(2022), pp. 7064-7074. [10.1039/d2sm00636g]
File in questo prodotto:
File Dimensione Formato  
ERC_2022_Mele_SoftMatter.pdf

accesso aperto

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Creative commons
Dimensione 1.38 MB
Formato Adobe PDF
1.38 MB Adobe PDF Visualizza/Apri
d2sm00636g.pdf

accesso aperto

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Creative commons
Dimensione 1.41 MB
Formato Adobe PDF
1.41 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/360131
Citazioni
  • ???jsp.display-item.citation.pmc??? 0
  • Scopus 1
  • ???jsp.display-item.citation.isi??? ND
social impact