X-MIFS: Exact Mutual Information for feature selection

Brunato, Mauro; Battiti, Roberto
2016

Abstract

In machine learning, an information-theoretically optimal way to filter the best input features, without reference to any specific machine-learning model, is to maximize the mutual information between the selected features and the model output: this choice minimizes the uncertainty in the output to be predicted, given the feature values. Although this criterion is optimal in the context of information theory, a practical difficulty in using it lies in the need to estimate the mutual information from a limited set of input-output examples, in possibly very high-dimensional input spaces. Estimating probability densities from a few data points in these conditions is far from trivial. Starting from the seminal proposals in [1], different approaches approximate the mutual information by considering only a limited set of variable dependencies (such as dependencies among pairs or triplets of variables), or by assuming specific forms for the probability densities (such as Gaussians). In this paper we study the effect of using the exact mutual information between the selected features and the output, without resorting to any approximation (apart from the implicit and unavoidable one incurred when estimating it from experimental data). The objectives of this investigation are twofold: to assess how far one can go with the exact mutual information in terms of CPU time and number of features, and to measure what is lost by adopting popular approximations that consider only relationships among small subsets of features, assume specific distributions of feature values (e.g., Gaussian), or maximize upper bounds on the mutual information as proxies for the exact value. The experimental results show a significant performance advantage when the feature sets identified by exact mutual information are used, in both binary and multi-valued classification tasks, at the cost of longer CPU times.
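As a rough illustration of the kind of procedure the abstract refers to (not the authors' implementation, which is not reproduced in this record), the sketch below performs greedy forward selection on discretized features, at each step adding the feature that maximizes the exact empirical joint mutual information I(S; Y) between the selected subset S and the labels Y, estimated by counting joint configurations. The names joint_mi and xmifs are hypothetical.

# Minimal sketch: greedy selection by exact empirical joint mutual information.
# Assumes discrete (already discretized) feature values.
from collections import Counter
from math import log2

def joint_mi(columns, y):
    """Empirical I(X_S; Y), where X_S is the tuple of the given columns."""
    n = len(y)
    xy = Counter(zip(zip(*columns), y))   # joint counts of (x_S, y)
    x = Counter(zip(*columns))            # marginal counts of x_S
    yc = Counter(y)                       # marginal counts of y
    # I(X_S; Y) = sum over observed (x_S, y) of p(x,y) * log2(p(x,y) / (p(x) p(y)))
    return sum(c / n * log2(c * n / (x[xs] * yc[yv]))
               for (xs, yv), c in xy.items())

def xmifs(features, y, k):
    """Greedily select k feature indices maximizing exact joint MI with y.

    features: list of columns (each a list of discrete values); y: labels.
    """
    selected = []
    remaining = set(range(len(features)))
    for _ in range(k):
        best = max(remaining,
                   key=lambda j: joint_mi([features[i] for i in selected]
                                          + [features[j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected

Note that the Counter over joint configurations grows exponentially with the number of selected features, so each step becomes more expensive and the estimates less reliable as the subset grows; this is precisely the CPU-time versus number-of-features trade-off the paper sets out to quantify against pairwise and distributional approximations.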
Year: 2016
Conference: 2016 International Joint Conference on Neural Networks, IJCNN
Place of publication: Piscataway, NJ
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN: 9781509006199
Authors: Brunato, Mauro; Battiti, Roberto
Citation: X-MIFS: Exact Mutual Information for feature selection / Brunato, Mauro; Battiti, Roberto. - Print. - (2016), pp. 3469-3476. (Paper presented at the IJCNN 2016 conference, held in Vancouver, Canada, 24-29 July 2016) [DOI: 10.1109/IJCNN.2016.7727644].
Files in this product:
File: 07727644.pdf
Description: Publisher's version, downloaded from IEEEXplore
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 225.23 kB
Format: Adobe PDF
Access: archive administrators only

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/166662
Citations
  • PubMed Central: ND
  • Scopus: 6
  • Web of Science: 6