SRF: A Framework for the Study of Classifier Behavior under Training Set Mislabeling Noise.

Mirylenka, Katsiaryna; Giannakopoulos, George; Palpanas, Themistoklis
2012-01-01

Abstract

Machine learning algorithms perform differently in settings with varying levels of training set mislabeling noise. Therefore, choosing a good algorithm for a particular learning problem is crucial. In this paper, we introduce the "Sigmoid Rule" Framework (SRF), which focuses on describing classifier behavior in noisy settings. The framework uses an existing model of the expected performance of learning algorithms as a sigmoid function of the signal-to-noise ratio in the training instances. We study the parameters of this sigmoid function using five different classifiers: Naive Bayes, kNN, SVM, a decision tree classifier, and a rule-based classifier. Our study leads to the definition of intuitive criteria, based on the sigmoid parameters, that can be used to compare the behavior of learning algorithms in the presence of varying levels of noise. Furthermore, we show that there is a connection between these parameters and the characteristics of the underlying dataset, hinting at how the inherent properties of a dataset affect learning. The framework is applicable to concept drift scenarios, including modeling user behavior over time, and to mining noisy data series, as in sensor networks.
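The abstract names the sigmoid performance model but does not give its formula. The sketch below is a minimal illustration under stated assumptions, not the paper's definition. It assumes a generic four-parameter logistic curve with a floor `m`, a ceiling `M`, a steepness `c` and a midpoint `d`. It also assumes that SNR is the ratio of correctly labeled to mislabeled training instances. The function names `snr_from_noise` and `sigmoid_performance` and all parameter values are hypothetical.

```python
import math


def snr_from_noise(noise_rate: float) -> float:
    """Hypothetical SNR: correctly labeled / mislabeled training instances.

    Assumes noise_rate is the fraction of mislabeled training instances,
    strictly between 0 and 1.
    """
    if not 0.0 < noise_rate < 1.0:
        raise ValueError("noise_rate must lie strictly between 0 and 1")
    return (1.0 - noise_rate) / noise_rate


def sigmoid_performance(snr: float, m: float, M: float, c: float, d: float) -> float:
    """Assumed generic logistic curve of expected performance versus SNR.

    m: floor (limit as SNR -> -inf), M: ceiling (limit as SNR -> +inf),
    c: steepness (c > 0), d: SNR at which performance equals (m + M) / 2.
    """
    z = -c * (snr - d)
    # Guard against math.exp overflow: for very large z the curve is at its floor.
    if z > 700:
        return m
    return m + (M - m) / (1.0 + math.exp(z))


if __name__ == "__main__":
    # Illustrative parameters for two hypothetical classifiers (not results from the paper).
    curves = {
        "clf_A": dict(m=0.50, M=0.90, c=1.5, d=2.0),
        "clf_B": dict(m=0.55, M=0.85, c=0.8, d=1.0),
    }
    for noise in (0.10, 0.30, 0.45):
        snr = snr_from_noise(noise)
        row = ", ".join(
            f"{name}={sigmoid_performance(snr, **p):.3f}" for name, p in curves.items()
        )
        print(f"noise={noise:.2f} snr={snr:.2f}: {row}")
```

Under this assumed form, each classifier is summarized by four numbers. Comparing `m`, `M`, `c` and `d` across classifiers is the kind of parameter-level comparison the abstract describes. The specific criteria the paper defines are not reproduced here.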
Year: 2012
Published in: Proceedings of the International Working Conference on Advanced Visual Interfa
Editors: Various authors (AA. VV.)
Place of publication: Berlin
Publisher: Springer
ISBN: 9783642302169; 9783642302176
Use this identifier to cite or link to this document: https://hdl.handle.net/11572/91998
Warning: the data displayed here have not been validated by the university.

Citations (Scopus): 2