A General Framework for Exploiting Background Knowledge in Natural Language Processing / Tymoshenko, Kateryna. - (2012), pp. 1-151.

A General Framework for Exploiting Background Knowledge in Natural Language Processing

Tymoshenko, Kateryna
2012-01-01

Abstract

The two key aspects of natural language processing (NLP) applications based on machine learning techniques are the learning algorithm and the feature representation of the documents, entities, or words to be processed. Until now, most approaches have exploited syntactic features, while semantic feature extraction has suffered from the low coverage of the available knowledge resources and the difficulty of matching text to ontology elements. The Semantic Web has now made available a large amount of logically encoded world knowledge called Linked Open Data (LOD). However, extending state-of-the-art natural language applications to use LOD resources is not a trivial task, for reasons that include the ambiguity of natural language and the heterogeneity and ambiguity of the schemas adopted by different LOD resources. In this thesis we define a general framework for supporting NLP with semantic features extracted from LOD. The main idea behind the framework is to (i) map terms in text to the uniform resource identifiers (URIs) of LOD concepts through Wikipedia mediation; (ii) use the URIs to obtain background knowledge from LOD; and (iii) integrate the obtained knowledge as semantic features into machine learning algorithms. We evaluate the framework by means of case studies on coreference resolution and relation extraction. Additionally, we propose an approach for increasing the accuracy of the mapping step based on the "one sense per discourse" hypothesis. Finally, we present an open-source Java tool for extracting LOD knowledge through SPARQL endpoints and converting it to NLP features.
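The three steps named in the abstract can be illustrated with a minimal sketch. It is not the thesis's Java tool: it assumes DBpedia as the LOD dataset, assumes the term has already been disambiguated to a Wikipedia page title, and uses hypothetical function names and a hypothetical `lod_type=` feature prefix. It only builds the SPARQL query text and does not contact any endpoint.

```python
from urllib.parse import quote

# Assumption: DBpedia is the LOD dataset; it names resources after Wikipedia titles.
DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_title_to_uri(title: str) -> str:
    """Step (i): turn an already-disambiguated Wikipedia title into a resource URI."""
    return DBPEDIA_RESOURCE_PREFIX + quote(title.strip().replace(" ", "_"), safe="_(),")


def build_types_query(uri: str) -> str:
    """Step (ii): build a SPARQL query asking for the rdf:type values of a URI."""
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        f"SELECT DISTINCT ?type WHERE {{ <{uri}> rdf:type ?type }}"
    )


def types_to_features(type_uris):
    """Step (iii): encode each distinct retrieved type as a binary feature."""
    return {f"lod_type={t}": 1 for t in sorted(set(type_uris))}
```

In use, the query string would be sent to a SPARQL endpoint, and the type URIs in the response would be passed to `types_to_features` and merged into the feature vector of the mention being classified.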
2012
XXIV
2011-2012
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Giuliano, Claudio
no
English
Sector INF/01 - Computer Science
Files in this item:
tymoshenko-thesis-submitted.pdf
Access: open access
Type: Doctoral Thesis
License: All rights reserved
Size: 1.18 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/368094