Statistical analysis of a Bayesian classifier based on the expression of miRNAs

Ricci, Leonardo; Del Vescovo, Valerio; Cantaloni, Chiara; Grasso, Margherita; Barbareschi, Mattia; Denti, Michela Alessandra
2015-01-01

Abstract

Background: During the last decade, many studies have investigated the possible use of miRNA levels as diagnostic and prognostic tools for different kinds of cancer. The development of reliable classifiers requires tackling several crucial aspects, some of which have been widely overlooked in the scientific literature: the distribution of the measured miRNA expressions and the statistical uncertainty that affects the parameters characterizing a classifier. In this paper, these topics are analysed in detail through a model problem: the development of a Bayesian classifier that, on the basis of the expression of miR-205, miR-21 and snRNA U6, discriminates samples into two classes of pulmonary tumors, adenocarcinomas and squamous cell carcinomas.

Results: We proved that the variance of miRNA expression triplicates is well described by a normal distribution and that triplicate averages also follow normal distributions. We provide a method to enhance a classifier's performance by exploiting the correlations between the class-discriminating miRNA and the expression of an additional normalized miRNA.

Conclusions: By exploiting the normal behavior of triplicate variances and averages, invalid samples (outliers) can be identified either by checking their variability via a chi-square test or by checking their displacement from the respective population mean via Student's t-test. Finally, the normal behavior makes it possible to set the Bayesian classifier optimally and to determine its performance and the related uncertainty.
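The abstract names two outlier checks (a chi-square test on triplicate variability and a Student's t-test on displacement from the population mean) and a Bayesian decision rule built on normal distributions. The Python sketch below illustrates all three under stated assumptions; it is not the authors' code. All helper names are hypothetical. The within-triplicate standard deviation `sigma` is assumed known (for example, pooled over many triplicates). The displacement test uses the standard t statistic for one new value against a reference sample. The classifier assumes two normal class-conditional distributions of a single expression value, with illustrative parameters. Only the standard library is used, so the Student's t tail is obtained by Simpson integration of its density.

```python
import math
from statistics import NormalDist, fmean, stdev


def triplicate_variance_pvalue(values, sigma):
    """Chi-square test on the spread of one triplicate (hypothetical helper).

    Assumes the three replicates scatter normally with a known, e.g. pooled,
    standard deviation `sigma`. Then 2 * s**2 / sigma**2 follows a chi-square
    law with 2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    if len(values) != 3:
        raise ValueError("expected exactly three replicate values")
    x = 2 * stdev(values) ** 2 / sigma**2
    return math.exp(-x / 2)  # probability of a spread at least this large


def student_t_two_sided_pvalue(t, dof, steps=2000):
    """Two-sided tail probability of Student's t, via Simpson's rule."""
    log_c = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
             - 0.5 * math.log(dof * math.pi))

    def pdf(u):
        return math.exp(log_c - (dof + 1) / 2 * math.log1p(u * u / dof))

    h = abs(t) / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central = total * h / 3  # integral of the density over [0, |t|]
    return max(0.0, 1.0 - 2.0 * central)


def displacement_pvalue(x, population):
    """Student's t test of a new value against a reference population.

    Assumes normal values with unknown mean and variance, both estimated from
    `population`; t = (x - m) / (s * sqrt(1 + 1/n)) has n - 1 degrees of freedom.
    """
    n = len(population)
    m, s = fmean(population), stdev(population)
    t = (x - m) / (s * math.sqrt(1 + 1 / n))
    return student_t_two_sided_pvalue(t, n - 1)


def _log_pdf(x, d):
    """Log density of NormalDist `d` at x (avoids underflow far from the mean)."""
    z = (x - d.mean) / d.stdev
    return -0.5 * z * z - math.log(d.stdev * math.sqrt(2 * math.pi))


def posterior_a(x, class_a, class_b, prior_a=0.5):
    """Bayes posterior of class A, computed from the log-odds for stability."""
    log_odds = (math.log(prior_a) + _log_pdf(x, class_a)
                - math.log(1 - prior_a) - _log_pdf(x, class_b))
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


def classify(x, class_a, class_b, prior_a=0.5):
    """Assign the class with the larger posterior (ties go to A)."""
    return "A" if posterior_a(x, class_a, class_b, prior_a) >= 0.5 else "B"


def error_rate(threshold, class_a, class_b, prior_a=0.5):
    """Misclassification probability of the rule 'A below threshold, B above'."""
    return (prior_a * (1 - class_a.cdf(threshold))
            + (1 - prior_a) * class_b.cdf(threshold))
```

For example, with the illustrative classes `NormalDist(0, 1)` and `NormalDist(4, 1)` and equal priors, `classify(2.1, ...)` returns `"B"`, and the decision threshold lies at 2. A triplicate or sample whose p-value falls below a chosen significance level would be flagged as an outlier. In practice the class means and standard deviations would be estimated from labeled samples, which is where the parameter uncertainty stressed in the abstract enters.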
Files in this item:
BMC_Bioinformatics_2015_16_287.pdf (Adobe PDF, 1.63 MB), open access
Description: Article
Type: Publisher's version (publisher's layout)
License: Creative Commons

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/110487
Citations
  • PubMed Central: 2
  • Scopus: 6
  • ISI Web of Science: 5