Text Categorization Method Based on Improved Mutual Information and Characteristic Weights Evaluation Algorithms / Z., Pei; Marchese, Maurizio; X., Shi; Y., Liang. - 4:(2007), pp. 87-91. (Paper presented at the Fourth International Conference on Fuzzy Systems and Knowledge Discovery, held in Haikou, China, 24-27 Aug. 2007) [10.1109/FSKD.2007.559].
Text Categorization Method Based on Improved Mutual Information and Characteristic Weights Evaluation Algorithms
Marchese, Maurizio
2007-01-01
Abstract
Statistical text categorization can be improved along two main directions: feature selection and the evaluation of characteristic weights. In this paper, we propose an enhanced text categorization method that improves both, based on a modified mutual information algorithm and a modified algorithm for evaluating characteristic weights. The proposed method is applied to the Reuters-21578 Top10 benchmark test set to examine its effectiveness. Numerical results show that the precision, recall, and F1 value of the proposed method are all superior to those of existing conventional methods.
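As background for the quantities named in the abstract, here is a minimal sketch of the conventional (unmodified) mutual information score used for feature selection, together with per-category precision, recall, and F1. It is not the modified algorithm proposed in the paper, whose details are not given on this page. The function names, the representation of documents as sets of tokens, and the toy data are all assumptions made for illustration.

```python
import math


def mutual_information(docs, labels, term, category):
    """Conventional pointwise MI: log( P(t, c) / (P(t) * P(c)) ).

    docs is a list of token sets and labels the matching category names
    (both hypothetical representations chosen for this sketch).
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)          # docs containing the term
    n_c = sum(1 for y in labels if y == category)    # docs in the category
    n_tc = sum(1 for d, y in zip(docs, labels) if term in d and y == category)
    if n_tc == 0 or n_t == 0 or n_c == 0:
        return float("-inf")  # term and category never co-occur
    return math.log((n_tc * n) / (n_t * n_c))


def precision_recall_f1(y_true, y_pred, category):
    """Per-category precision, recall, and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == category and p == category)
    fp = sum(1 for t, p in pairs if t != category and p == category)
    fn = sum(1 for t, p in pairs if t == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy corpus (invented for illustration, not taken from Reuters-21578).
toy_docs = [{"oil", "price"}, {"oil", "barrel"}, {"wheat", "price"}, {"wheat", "crop"}]
toy_labels = ["crude", "crude", "grain", "grain"]

# Rank candidate terms for one category by their MI score, highest first.
vocabulary = sorted(set().union(*toy_docs))
ranked = sorted(
    vocabulary,
    key=lambda t: mutual_information(toy_docs, toy_labels, t, "crude"),
    reverse=True,
)
```

In this toy corpus, terms that appear only in "crude" documents score highest, while a term spread evenly across both categories scores zero, which is the behaviour feature selection relies on when keeping the top-ranked terms.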