Tree Kernel functions are powerful tools for solving different classes of problems requiring large amounts of structured information. Combined with accurate learning algorithms, such as Support Vector Machines, they allow us to directly encode rich syntactic data in our learning problems without requiring an explicit feature mapping function or deep specific domain knowledge. However, as other very high dimensional kernel families, they come with two major drawbacks: first, the computational complexity induced by the dual representation makes them unpractical for very large datasets or for situations where very fast classifiers are necessary, e.g. real time systems or web applications; second, their implicit nature somehow limits their scientific appeal, as the implicit models that we learn cannot cast new light on the studied problems. As a possible solution to these two problems, this Thesis presents an approach to feature selection for tree kernel functions in the context of Support Vector learning, based on a greedy exploration of the fragment space. Features are selected according to a gradient norm preservation criterion, i.e. we select the heaviest features that account for a large percentage of the gradient norm, and are explicitly modeled and represented. The result of the feature extraction process is a data structure that can be used to decode the input structured data, i.e. to explicitly describe a tree in terms of its more relevant fragments. We present theoretical insights that justify the adopted strategy and detail the algorithms and data structures used to explore the feature space and store the most relevant features. Experiments on three different multi-class NLP tasks and data sets, namely question classification, relation extraction and semantic role labeling, confirm the theoretical findings and show that the decoding process can produce very fast and accurate linear classifiers, along with the explicit representation of the most relevant structured features identified for each class.

Greedy Feature Selection in Tree Kernel Spaces / Pighin, Daniele. - (2010), pp. 1-203.

Greedy Feature Selection in Tree Kernel Spaces

Pighin, Daniele
2010-01-01

Abstract

Tree Kernel functions are powerful tools for solving different classes of problems requiring large amounts of structured information. Combined with accurate learning algorithms, such as Support Vector Machines, they allow us to directly encode rich syntactic data in our learning problems without requiring an explicit feature mapping function or deep specific domain knowledge. However, as other very high dimensional kernel families, they come with two major drawbacks: first, the computational complexity induced by the dual representation makes them unpractical for very large datasets or for situations where very fast classifiers are necessary, e.g. real time systems or web applications; second, their implicit nature somehow limits their scientific appeal, as the implicit models that we learn cannot cast new light on the studied problems. As a possible solution to these two problems, this Thesis presents an approach to feature selection for tree kernel functions in the context of Support Vector learning, based on a greedy exploration of the fragment space. Features are selected according to a gradient norm preservation criterion, i.e. we select the heaviest features that account for a large percentage of the gradient norm, and are explicitly modeled and represented. The result of the feature extraction process is a data structure that can be used to decode the input structured data, i.e. to explicitly describe a tree in terms of its more relevant fragments. We present theoretical insights that justify the adopted strategy and detail the algorithms and data structures used to explore the feature space and store the most relevant features. Experiments on three different multi-class NLP tasks and data sets, namely question classification, relation extraction and semantic role labeling, confirm the theoretical findings and show that the decoding process can produce very fast and accurate linear classifiers, along with the explicit representation of the most relevant structured features identified for each class.
2010
XXII
2009-2010
Ingegneria e Scienza dell'Informaz (cess.4/11/12)
Information and Communication Technology
Federico, Marcello
Moschitti, Alessandro
no
Inglese
Settore INF/01 - Informatica
Settore ING-INF/05 - Sistemi di Elaborazione delle Informazioni
File in questo prodotto:
File Dimensione Formato  
thesis.pdf

accesso aperto

Tipologia: Tesi di dottorato (Doctoral Thesis)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 6.17 MB
Formato Adobe PDF
6.17 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/368779
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact