An enhanced text categorization method based on improved text frequency approach and mutual information algorithm / Pei, Zhili; Shi, Xiaohu; Marchese, Maurizio; Liang, Yanchun. - In: PROGRESS IN NATURAL SCIENCE. - ISSN 1002-0071. - STAMPA. - 17:12(2007), pp. 1494-1500.
An enhanced text categorization method based on improved text frequency approach and mutual information algorithm
Marchese, Maurizio; Liang, Yanchun
2007-01-01
Abstract
Text categorization plays an important role in data mining, and feature selection is its most important step. Focusing on feature selection, we present an improved text frequency method that filters out low-frequency features during data preprocessing, propose an improved mutual information algorithm for feature selection, and develop an improved tf.idf method for evaluating feature weights. The proposed method is applied to the Reuters-21578 Top10 benchmark test set to examine its effectiveness. Numerical results show that the precision, recall, and F1 value of the proposed method are all superior to those of existing conventional methods.
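For orientation, the conventional baselines that the abstract refers to can be sketched as follows. This is a minimal illustration of the standard textbook forms only (document-frequency filtering, pointwise mutual information, tf.idf weighting, and precision/recall/F1); it is not the improved variants proposed in the paper, whose formulas are not given in this record. All function names, the `min_df` threshold, and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def filter_low_frequency(docs, min_df=2):
    """Drop terms that occur in fewer than min_df documents (assumed threshold)."""
    df = document_frequency(docs)
    return [[t for t in tokens if df[t] >= min_df] for tokens in docs]


def mutual_information(docs, labels, term, category):
    """Standard pointwise MI, log(P(t, c) / (P(t) P(c))), from document counts."""
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == category)
    n_tc = sum(1 for d, y in zip(docs, labels) if y == category and term in d)
    if n_tc == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log((n_tc * n) / (n_t * n_c))


def tf_idf(docs):
    """Standard tf.idf weight per document: raw term count times log(N / df)."""
    n = len(docs)
    df = document_frequency(docs)
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy corpus (hypothetical, not taken from Reuters-21578).
docs = [["oil", "price", "rise"], ["oil", "export"],
        ["wheat", "price"], ["wheat", "crop"]]
labels = ["crude", "crude", "grain", "grain"]

filtered = filter_low_frequency(docs, min_df=2)
mi_oil = mutual_information(docs, labels, "oil", "crude")
mi_price = mutual_information(docs, labels, "price", "crude")
weights = tf_idf(docs)
```

In this toy corpus "oil" appears only in "crude" documents and therefore scores higher than "price", which is split evenly between the two categories; a selection step would keep the highest-scoring terms per category.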