Text categorization plays an important role in data mining. Feature selection is the most important step in text categorization. Focusing on feature selection, we present an improved text frequency method that filters low-frequency features during data preprocessing, propose an improved mutual information algorithm for feature selection, and develop an improved tf.idf method for evaluating feature weights. The proposed method is applied to the benchmark test set Reuters-21578 Top10 to examine its effectiveness. Numerical results show that the precision, recall, and F1 value of the proposed method are all superior to those of existing conventional methods.
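For orientation, the following is a minimal sketch of the conventional baselines that the abstract names, not the improved variants proposed in the paper, whose details are not given here. It assumes documents are already tokenized into lists of terms; the function names, the `min_df` threshold, and the use of pointwise mutual information with an unsmoothed `tf * log(N / df)` weight are illustrative assumptions.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def filter_low_frequency(docs, min_df=2):
    """Drop terms found in fewer than min_df documents (assumed threshold)."""
    df = document_frequency(docs)
    return [[t for t in tokens if df[t] >= min_df] for tokens in docs]


def mutual_information(docs, labels, term, category):
    """Textbook pointwise MI: log(P(t, c) / (P(t) * P(c)))."""
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)
    n_c = sum(1 for y in labels if y == category)
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    if n_tc == 0:
        return float("-inf")  # term never co-occurs with the category
    return math.log((n_tc * n) / (n_t * n_c))


def tf_idf(docs):
    """Weight each term in each document by tf * log(N / df)."""
    n = len(docs)
    df = document_frequency(docs)
    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts for one category."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy corpus (made up for illustration; not Reuters-21578 data).
docs = [["oil", "price", "rise"], ["oil", "export"],
        ["wheat", "price"], ["wheat", "crop", "rare"]]
labels = ["crude", "crude", "grain", "grain"]

filtered = filter_low_frequency(docs, min_df=2)
mi_oil = mutual_information(filtered, labels, "oil", "crude")
weights = tf_idf(filtered)
```

In this toy corpus, "oil" appears only in "crude" documents and so receives a positive score, while "price" is spread evenly across both categories and scores zero, which is the behavior a mutual-information filter relies on to rank features.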

An enhanced text categorization method based on improved text frequency approach and mutual information algorithm / Pei, Zhili; Shi, Xiaohu; Marchese, Maurizio; Liang, Yanchun. - In: PROGRESS IN NATURAL SCIENCE. - ISSN 1002-0071. - PRINT. - 17:12(2007), pp. 1494-1500.

An enhanced text categorization method based on improved text frequency approach and mutual information algorithm

Pei, Zhili; Shi, Xiaohu; Marchese, Maurizio; Liang, Yanchun
2007-01-01


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/189054