Large-scale Structural Reranking for Hierarchical Text Categorization / Ju, Qi. - (2013), pp. 1-130.
Large-scale Structural Reranking for Hierarchical Text Categorization
JU, QI
2013-01-01
Abstract
Current hierarchical text categorization (HTC) methods fall into three main directions: (1) the flat one-vs.-all approach, which flattens the hierarchy into independent nodes and trains a binary one-vs.-all classifier for each node; (2) the top-down method, which uses the hierarchical structure to decompose the overall problem into a set of smaller sub-problems and deals with them in top-down fashion along the hierarchy; and (3) the big-bang approach, which learns a single (but generally complex) global model for the class hierarchy as a whole, with a single run of the learning algorithm. These methods have shown relatively high performance in previous evaluations, but they still suffer from two main drawbacks: (1) relatively low accuracy, when they disregard category dependencies, or (2) low computational efficiency, when they take such dependencies into account.

To build a model that is both accurate and efficient, we adopt the following strategy. First, we design global reranking models (GR) that exploit structural dependencies in hierarchical multi-label text classification (TC). They are based on two algorithms: (1) one that generates the k-best classification hypotheses from the decision probabilities of the flat one-vs.-all and top-down methods; and (2) one that encodes dependencies in the reranker by (i) modeling hypotheses as trees derived from the hierarchy itself and (ii) applying tree kernels (TK) to them. Such a TK-based reranker selects the best hierarchical test hypothesis, which is naturally represented as a labeled tree. Additionally, to better investigate the role of category relationships, we consider two interesting cases: (i) traditional schemes, in which parent nodes include all the documents of their child categories; and (ii) more general schemes, in which children can include documents that do not belong to their parents.

Second, we propose an efficient local incremental reranking model (LIR), which combines the top-down method with a local reranking model for each sub-problem. These local rerankers improve accuracy by capturing the local category dependencies of their sub-problems, which mitigates the errors that the top-down method makes at the higher levels of the hierarchy. LIR deals with the sub-problems recursively, applying the corresponding local rerankers in top-down fashion, which makes it highly efficient. In addition, we further optimize LIR by (i) improving the top-down method with local dictionaries for each sub-problem; (ii) using LIBLINEAR instead of LIBSVM; and (iii) adopting a compact representation of the hypotheses for learning the local reranking model. This makes LIR applicable to large-scale hierarchical text categorization. Experiments on several hierarchical datasets show promising improvements obtained by exploiting structural dependencies in large-scale hierarchical text categorization.
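As a concrete illustration of the hypothesis-generation step, the sketch below enumerates the k most probable label assignments from per-category decision probabilities. It is a minimal reconstruction under the simplifying assumption that the binary one-vs.-all decisions are independent; the function name, interface, and the subset-sum enumeration scheme are illustrative, not the thesis' actual implementation.

```python
import heapq
import math

def k_best_hypotheses(probs, k):
    """Enumerate up to k label assignments in decreasing joint probability,
    assuming independent one-vs.-all decisions.
    probs: dict mapping category -> P(category | document).
    Returns (log_probability, set_of_assigned_labels) pairs, best first."""
    cats = list(probs)
    argmax, cost, best_logp = [], [], 0.0
    for c in cats:
        p = min(max(probs[c], 1e-9), 1 - 1e-9)  # clip for numerical safety
        keep = p >= 0.5
        argmax.append(keep)
        best_logp += math.log(p if keep else 1 - p)
        # Log-probability penalty paid for flipping this greedy decision.
        cost.append(abs(math.log(p) - math.log(1 - p)))
    order = sorted(range(len(cats)), key=cost.__getitem__)
    # Enumerate flip subsets in nondecreasing total cost with the classic
    # two-successor scheme for the k smallest subset sums.
    heap = [(0.0, -1, frozenset())]  # (flip cost, last sorted index, flips)
    hypotheses = []
    while heap and len(hypotheses) < k:
        c, idx, flips = heapq.heappop(heap)
        labels = {cats[i] for i in range(len(cats))
                  if argmax[i] != (i in flips)}
        hypotheses.append((best_logp - c, labels))
        if idx + 1 < len(order):
            nxt = order[idx + 1]
            heapq.heappush(heap, (c + cost[nxt], idx + 1, flips | {nxt}))
            if idx >= 0:
                cur = order[idx]
                heapq.heappush(heap, (c - cost[cur] + cost[nxt], idx + 1,
                                      flips - {cur} | {nxt}))
    return hypotheses

# Toy usage: the best hypothesis is {Football, Sports}; later ones flip the
# least confident decisions first (0.504, 0.216, 0.126, 0.056).
for logp, labels in k_best_hypotheses(
        {"Sports": 0.9, "Football": 0.7, "Politics": 0.2}, 4):
    print(f"{math.exp(logp):.3f}", sorted(labels))
```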
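The reranker scores such hypotheses, represented as labeled trees, with tree kernels. Below is a minimal sketch of the classic syntactic tree kernel of Collins and Duffy (2002), which counts the tree fragments shared by two labeled trees; the `Node` class and the toy hierarchy are assumptions for illustration, and the thesis' exact kernel variant and hypothesis-tree encoding may differ.

```python
class Node:
    """A labeled tree node; hypothesis trees mirror the category hierarchy."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def production(n):
    return (n.label, tuple(c.label for c in n.children))

def nodes(t):
    out, stack = [], [t]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out

def tree_kernel(t1, t2, lam=0.4):
    """Syntactic tree kernel (Collins & Duffy, 2002): sums, over all node
    pairs, the (decayed) number of shared fragments rooted at them; the
    factor lam < 1 penalizes the contribution of larger fragments."""
    memo = {}
    def delta(a, b):
        key = (id(a), id(b))
        if key not in memo:
            if production(a) != production(b):
                memo[key] = 0.0
            else:
                prod = lam
                for ca, cb in zip(a.children, b.children):
                    prod *= 1.0 + delta(ca, cb)
                memo[key] = prod
        return memo[key]
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

# Two toy hypothesis trees over the same hierarchy: they share only the
# fragment rooted at ROOT -> Sports, so the kernel value is lam.
h1 = Node("ROOT", [Node("Sports", [Node("Football")])])
h2 = Node("ROOT", [Node("Sports", [Node("Tennis")])])
print(tree_kernel(h1, h2))  # 0.4
```

A preference reranker trained with such a kernel compares pairs of hypothesis trees and promotes the one whose fragments better match those of correct classifications seen in training.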
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| Qi_Thesis.pdf | Open access | Doctoral Thesis | All rights reserved | 2.68 MB | Adobe PDF |