Data preprocessing in SVM-based keywords extraction from scientific documents / Wu, Chunguo; Marchese, Maurizio; Wang, Yufei; Krapivin, Mikalai; Wang, Chaoyong; Li, Xitong; Liang, Yanchun. - (2009), pp. 810-813. (Paper presented at the 2009 4th International Conference on Innovative Computing, Information and Control, ICICIC 2009, held in Kaohsiung, Taiwan, in 2009) [10.1109/ICICIC.2009.155].
Data preprocessing in SVM-based keywords extraction from scientific documents
Marchese, Maurizio; Krapivin, Mikalai; Liang, Yanchun
2009-01-01
Abstract
Scientific documents are unstructured data written in natural language, which makes them hard for scientists to read and manage. Keywords help scientists search for related documents and grasp their contents quickly. In this paper we investigate a data preprocessing technique used in SVM-based keyword extraction from scientific documents. We propose four definitions of regular scientific documents and analyze the experimental results on the basis of these definitions. The experimental results confirm the intuition that the abstract is important for keyword extraction. © 2009 IEEE.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
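To make the idea of preprocessing for SVM-based keyword extraction more concrete, here is a minimal sketch of how candidate terms might be turned into feature vectors before being passed to an SVM classifier. It is not the authors' method: the stopword list, the use of single words as candidates, and the four features (including a 0/1 flag for whether a term occurs in the abstract, echoing the finding above) are illustrative assumptions, and the SVM training step itself is omitted.

```python
import re
from collections import Counter

# Hypothetical stopword list for illustration; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "we", "with", "of", "and", "in", "to",
             "is", "for", "on", "are"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def candidate_features(title: str, abstract: str, body: str) -> dict[str, list[float]]:
    """Map each single-word candidate to a hypothetical feature vector:
    [relative frequency in the whole document,
     1.0 if it appears in the title else 0.0,
     1.0 if it appears in the abstract else 0.0,
     relative position of its first occurrence]."""
    tokens = tokenize(title) + tokenize(abstract) + tokenize(body)
    counts = Counter(tokens)
    title_set = set(tokenize(title))
    abstract_set = set(tokenize(abstract))
    n = len(tokens)
    features: dict[str, list[float]] = {}
    for term, freq in counts.items():
        first = tokens.index(term)  # index of the first occurrence
        features[term] = [
            freq / n,
            1.0 if term in title_set else 0.0,
            1.0 if term in abstract_set else 0.0,
            first / n,
        ]
    return features


if __name__ == "__main__":
    feats = candidate_features("Keyword extraction",
                               "We extract keywords with an SVM.",
                               "The SVM uses features.")
    print(feats["svm"])  # [0.25, 0.0, 1.0, 0.5]
```

In a full pipeline, vectors like these, labeled by whether each term is an author-assigned keyword, would form the training data for the classifier; how documents are filtered or normalized beforehand is exactly the kind of preprocessing choice the paper studies.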