In machine learning, an information-theoretically optimal way to filter the best input features, without reference to any specific machine learning model, is to maximize the mutual information between the selected features and the model output: this choice minimizes the uncertainty in the output to be predicted, given the feature values. Although this criterion is optimal in the context of information theory, a practical difficulty in applying it lies in the need to estimate the mutual information from a limited set of input-output examples, in possibly very high-dimensional input spaces. Estimating probability densities from a limited number of data points in these conditions is far from trivial. Starting from the seminal proposals in , different approaches approximate the mutual information either by considering only a limited set of variable dependencies (such as dependencies among pairs or triplets of variables) or by assuming specific forms for the probability densities (such as Gaussian forms). In this paper we study the effect of using the exact mutual information between the selected features and the output, without resorting to any approximation (apart from that implicit, and unavoidable, in estimating it from experimental data). The objectives of this investigation are twofold: to assess how far one can go with the exact mutual information, in terms of CPU time and number of features, and to measure what is lost by adopting popular approximations that consider only relationships among small subsets of features, that make assumptions about the distribution of feature values (e.g., Gaussian), or that maximize upper bounds on the mutual information as proxies for the exact value. The experimental results show a significant performance advantage when the feature sets identified by exact mutual information are used, in both binary and multi-valued classification tasks, at the cost of longer CPU times.
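The criterion described above — greedily growing a feature set so that the mutual information between the *joint* selected features and the output is maximized, rather than relying on pairwise approximations — can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes discrete-valued features and estimates probabilities by plain frequency counts, and the function names (`mutual_information`, `greedy_exact_mifs`) are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete variables,
    estimated from paired samples via frequency counts."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        # sum over observed cells of p(x,y) * log2( p(x,y) / (p(x) p(y)) )
        mi += pxy * log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi

def greedy_exact_mifs(features, output, k):
    """Forward selection: at each step add the feature that maximizes the
    MI between the joint tuple of all selected features and the output --
    the exact criterion, with no low-order (pairwise) approximation."""
    selected = []
    remaining = list(range(len(features)))
    for _ in range(k):
        def joint_mi(j):
            # Treat the selected features plus candidate j as one
            # composite discrete variable (a tuple per sample).
            cols = selected + [j]
            joint_feature = list(zip(*(features[c] for c in cols)))
            return mutual_information(joint_feature, output)
        best = max(remaining, key=joint_mi)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because the joint feature tuple takes exponentially many possible values as features are added, the frequency-count estimate degrades quickly with dimension — which is precisely the estimation difficulty, and the CPU-time cost, that the paper investigates.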
X-MIFS: Exact Mutual Information for feature selection / Brunato, Mauro; Battiti, Roberto. - PRINT. - (2016), pp. 3469-3476. (Paper presented at the conference IJCNN 2016, held in Vancouver, Canada, 24th-29th July 2016 [10.1109/IJCNN.2016.7727644].)
|Title:||X-MIFS: Exact Mutual Information for feature selection|
|Authors:||Brunato, Mauro; Battiti, Roberto|
|Title of the volume containing the paper:||2016 International Joint Conference on Neural Networks, IJCNN|
|Place of publication:||Piscataway, NJ|
|Publisher:||Institute of Electrical and Electronics Engineers Inc.|
|Year of publication:||2016|
|Scopus identifier:||2-s2.0-85007271201|
|WOS identifier:||WOS:000399925503091|
|Citation:||X-MIFS: Exact Mutual Information for feature selection / Brunato, Mauro; Battiti, Roberto. - PRINT. - (2016), pp. 3469-3476. (Paper presented at the conference IJCNN 2016, held in Vancouver, Canada, 24th-29th July 2016 [10.1109/IJCNN.2016.7727644].)|
|Appears in collections:||04.1 Paper in Proceedings|
Files in this record:
|07727644.pdf||Publisher's version, downloaded from IEEE Xplore||Publisher's layout||All rights reserved||Administrator|