An Algorithm for Recognizing Mislabeled and Abnormal Samples in Cancer Microarray
Blanzieri, Enrico; Liang, Yanchun
2011-01-01
Abstract
Microarrays are a high-throughput experimental technology used in many areas of the life sciences, especially in medical applications. Sample classification is crucial for disease diagnosis and treatment. However, sample labeling can be very complex and partly subjective. Existing studies confirm this and show that even a very small number of erroneous samples can severely degrade the performance of the resulting classifier, particularly when the dataset is small. More and more microarray data are being collected by organizations and companies and can be used for further investigation, but detecting and correcting mislabeled samples by hand remains difficult. This paper addresses the automatic detection of mislabeled samples and the correction of suspect samples. We propose Iterative-CLSWE, an algorithm for detecting and correcting potentially erroneous samples, based on the classification stability of each sample in the whole dataset. Experimental results validate the proposed algorithm. Automatic detection of mislabeled and abnormal samples may prove significant for large collections of data coming from heterogeneous studies.