An Algorithm for Recognizing Mislabeled and Abnormal Samples in Cancer Microarray

Blanzieri, Enrico; Liang, Yanchun
2011-01-01

Abstract

Microarrays are a high-throughput experimental technology used in many areas of the life sciences, especially in medical applications. Sample classification is crucial for disease diagnosis and treatment. However, sample labeling can be very complex and partially subjective. Existing studies confirm this and show that even a very small number of erroneous samples can severely degrade the performance of the resulting classifier, particularly when the dataset is small. More and more microarray data are being collected by organizations and companies and can be used for further investigation, but detecting and correcting mislabeled samples by hand remains difficult. This paper addresses the problem of automatically detecting mislabeled samples and correcting the suspect ones. We propose Iterative-CLSWE, an algorithm for detecting and correcting potentially erroneous samples that is based on the classification stability of each sample in the whole dataset. The experimental results validate the proposed algorithm. Such automatic detection of mislabeled and abnormal samples can prove significant for large collections of data coming from heterogeneous studies.
2011
11
Y., Zhou; Blanzieri, Enrico; M., Zhang; Liang, Yanchun; X., Zhou

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/94754