Machine learning is a promising research topic that has recently achieved remarkable results, leading to the replacement of more traditional methods with automatically learned solutions. Some recent works have begun to highlight how a machine-learned model can be tricked by applying small variations to the data, resulting in completely erroneous outcomes. Such behaviour can be traced to two elements: the lack of any metrological characterization of the inputs passed to the model, such as the uncertainty of the data, and the lack of an assessment of the reliability of the results. This paper tackles both elements, considering the case of a random forest model and proposing a method for assessing a confidence probability as an estimator of classification reliability. The method considers the original classification structure, leaving it untouched, and the distribution of the training datasets. An overlaying structure statistically combines the two and also includes the propagation of feature uncertainties as a further element deriving from input measurements. The new classification outcome is a vector of probabilities that defines how reliably a feature entry can be assigned, or not, to each of the considered classes, independently of the others. In this new structure, an additional classification result naturally becomes available: the unclassifiable feature entry.
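
The idea of propagating input uncertainty through an ensemble and allowing an "unclassifiable" outcome can be illustrated with a minimal sketch. This is not the paper's sigma-z method: it is a plain Monte Carlo propagation of Gaussian measurement uncertainty through a toy forest of decision stumps, with a rejection threshold on the resulting vote fractions. The forest, the thresholds, the function names, and the `min_confidence` parameter are all hypothetical, and unlike the per-class independent probabilities described in the abstract, the vote fractions here sum to one.

```python
import random

# Hypothetical toy "forest": each tree is a decision stump on one feature,
# stored as (feature_index, threshold, class_if_below, class_if_above).
FOREST = [
    (0, 0.50, "A", "B"),
    (0, 0.55, "A", "B"),
    (1, 0.40, "A", "B"),
]


def tree_predict(stump, x):
    """Return the class voted by a single stump for input x."""
    feature, threshold, below, above = stump
    return below if x[feature] < threshold else above


def class_confidences(x, sigma, forest=FOREST, n_samples=2000, seed=0):
    """Estimate per-class vote fractions under Gaussian input uncertainty.

    x     -- measured feature values
    sigma -- standard uncertainty of each feature (assumed Gaussian)
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_samples):
        # Draw one plausible input consistent with the measurement.
        sample = [rng.gauss(xi, si) for xi, si in zip(x, sigma)]
        for stump in forest:
            label = tree_predict(stump, sample)
            counts[label] = counts.get(label, 0) + 1
    total = n_samples * len(forest)
    return {label: c / total for label, c in counts.items()}


def classify(x, sigma, min_confidence=0.9, **kwargs):
    """Return (label, confidences); label is 'unclassifiable' when no
    class reaches min_confidence."""
    conf = class_confidences(x, sigma, **kwargs)
    label, best = max(conf.items(), key=lambda kv: kv[1])
    return (label if best >= min_confidence else "unclassifiable"), conf
```

With these assumptions, an input far from every threshold, such as `classify([0.1, 0.1], [0.01, 0.01])`, is assigned to a class with full confidence, while an input close to the thresholds relative to its uncertainty, such as `classify([0.52, 0.40], [0.05, 0.05])`, is reported as unclassifiable.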

Sigma-z random forest, classification and confidence / Fornaser, A.; De Cecco, M.; Bosetti, P.; Mizumoto, T.; Yasumoto, K. - In: MEASUREMENT SCIENCE & TECHNOLOGY. - ISSN 0957-0233. - ELECTRONIC. - 2019, 30:2(2019), pp. 025002.1-025002.12. [10.1088/1361-6501/aaf466]

Sigma-z random forest, classification and confidence

Fornaser A.; De Cecco M.; Bosetti P.; Mizumoto T.; Yasumoto K.
2019-01-01

Files in this item:

File: Fornaser_2019_Meas._Sci._Technol._30_025002.pdf
Access: Archive managers only
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 4.86 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/282239