Automatic Population of Structured Knowledge Bases via Natural Language Processing / Fossati, Marco. - (2017), pp. 1-204.

Automatic Population of Structured Knowledge Bases via Natural Language Processing

Fossati, Marco
2017-01-01

Abstract

The Web has evolved into a huge mine of knowledge carved in different forms, the predominant one still being the free-text document. This motivates the need for Intelligent Web-reading Agents: hypothetically, they would skim through disparate Web corpora and generate meaningful structured assertions to fuel Knowledge Bases (KBs). Ultimately, comprehensive KBs, like Wikidata and DBpedia, play a fundamental role in coping with information overload. In line with this vision, this thesis describes a set of systems based on Natural Language Processing (NLP), which take unstructured or semi-structured information sources as input and produce machine-readable statements for a target KB. We implement four main research contributions: (1) a one-step methodology for crowdsourcing Frame Semantics annotation; (2) an NLP technique implementing the above contribution to perform N-ary Relation Extraction from Wikipedia, thus enriching the target KB with properties; (3) a taxonomy learning strategy to produce an intuitive and exhaustive class hierarchy from the Wikipedia category graph, thus augmenting the target KB with classes; (4) a recommender system that leverages a KB network to yield atypical suggestions with detailed explanations, serving as a proof of work for real-world end users. The outcomes are incorporated into the Italian DBpedia chapter and can be queried through its public endpoint or downloaded as standalone data dumps.
2017
XXVII
2015-2016
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Tummarello, Giovanni
no
English
Sector INF/01 - Computer Science
Files in this item:

File: DECLARATORIA_ENG.pdf
Access: Archive managers only
Type: Doctoral Thesis
License: All rights reserved
Size: 297.29 kB
Format: Adobe PDF

File: marco_fossati_phd_thesis.pdf
Access: Open access
Type: Doctoral Thesis
License: All rights reserved
Size: 3.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/367787