SRF: A Framework for the Study of Classifier Behavior under Training Set Mislabeling Noise

Mirylenka, Katsiaryna; Giannakopoulos, George; Palpanas, Themistoklis

doi:10.1007/978-3-642-30217-6_10

Machine learning algorithms perform differently in settings with varying levels of training set mislabeling noise. Therefore, the choice of a good algorithm for a particular learning problem is crucial. In this paper, we introduce the "Sigmoid Rule" Framework, which focuses on describing classifier behavior in noisy settings. The framework uses an existing model of the expected performance of learning algorithms as a sigmoid function of the signal-to-noise ratio in the training instances. We study the parameters of this sigmoid function using five different classifiers, namely Naive Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier. Our study leads to the definition of intuitive criteria, based on the sigmoid parameters, that can be used to compare the behavior of learning algorithms in the presence of varying levels of noise. Furthermore, we show that there is a connection between these parameters and the characteristics of the underlying dataset, hinting at how the inherent properties of a dataset affect learning. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and to mining noisy data series, as in sensor networks.
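To make the idea of "expected performance as a sigmoid function of the signal-to-noise ratio" concrete, here is a minimal Python sketch. It assumes a generic four-parameter logistic curve (floor `m`, ceiling `M`, steepness `c`, midpoint `d`) and a hypothetical SNR definition for label noise (fraction of correctly labeled instances over fraction of mislabeled ones). Neither the exact curve form nor the SNR definition is claimed to be the paper's formulation, and the function names are hypothetical.

```python
import math


def sigmoid_performance(snr, m, M, c, d):
    """Expected performance as a generic logistic function of SNR (assumed form).

    P(snr) = m + (M - m) / (1 + exp(-c * (snr - d)))
    m: performance floor at very low SNR, M: ceiling at high SNR,
    c: steepness of the transition, d: SNR at the curve's midpoint.
    """
    z = c * (snr - d)
    # Numerically stable logistic: avoids overflow in exp() for large |z|.
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return m + (M - m) * s


def snr_from_noise(noise_rate):
    """Hypothetical SNR for mislabeling: correct-label fraction / mislabeled fraction."""
    if not 0.0 < noise_rate < 1.0:
        raise ValueError("noise_rate must lie strictly between 0 and 1")
    return (1.0 - noise_rate) / noise_rate


if __name__ == "__main__":
    # Illustrative, made-up parameters for one classifier; not results from the paper.
    for noise in (0.05, 0.1, 0.2, 0.3, 0.4):
        snr = snr_from_noise(noise)
        perf = sigmoid_performance(snr, m=0.5, M=0.9, c=1.5, d=2.0)
        print(f"noise={noise:.2f}  snr={snr:.2f}  expected performance={perf:.3f}")
```

Under this assumed form, comparing two classifiers reduces to comparing their fitted parameters, for example a higher ceiling `M` or a lower midpoint `d` (reaching good performance at noisier SNR levels).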