Enabling access to and exploration of information graphs

Lissandrini, Matteo

doi:10.15168/11572_368600

Exploratory search is the new frontier of information consumption, as it goes well beyond simple \emph{lookups}. Information repositories are ubiquitous and grow larger every day, and automated search systems help users find information in such collections. To extract knowledge from these repositories, the common ``query lookup'' retrieval paradigm accepts a set of specifications (the query) that describes the objects of interest and then retrieves those objects. Yet the query lookup paradigms commonly in use are no longer sufficient to support complex information needs: they can only provide candidate starting points and do not help users expand their knowledge. To ease access to and consumption of rich information repositories, we address the crucial problem of data exploration. Exploratory tasks match the natural need to find answers to open-ended information needs within an unfamiliar environment. In particular, in this dissertation, we focus on enabling access to and exploration of rich information graphs. Within businesses, organizations, and research communities, data is produced in many forms, in large volumes, and in different contexts. Because of this heterogeneity, many applications find it more useful to model their datasets as graphs, where information is represented by entities (nodes) and relationships (edges). These are variously called data graphs, graph databases, knowledge graphs, or, more generally, information graphs. The richness of their schema and content makes it challenging for users to express appropriate queries and retrieve the desired results. Hence, to allow effective exploration of a graph, we require: (i) an expressive \emph{query paradigm}, (ii) an intuitive \emph{query mechanism}, and (iii) an appropriate \emph{storage and query processing system}. In this work, we address these three requirements.
An exploratory query should be simple enough to spare the user complicated declarative languages (such as SQL or SPARQL), while retaining the flexibility and expressiveness of such languages. For this reason, with respect to the query paradigm, we introduce the notion of \emph{exemplar queries} and propose extensions to handle multiple incomplete examples. An exemplar query is a query method in which the user, or the analyst, circumvents query languages by providing examples as input. In particular, the solution we design allows flexible matching when examples are incomplete or only partially specified. Moreover, this query paradigm requires interactive systems that implement an incremental query-construction mechanism and support interactive exploration. To address this need, we study algorithms and implementations based on pseudo-relevance feedback for \emph{exemplar query suggestion} and evaluate their effectiveness in depth. Finally, because many graph databases exist, their functionality and performance vary widely. We provide an exhaustive evaluation methodology and a comprehensive study of the existing systems that clarifies their capabilities and limitations. In particular, we design a novel micro-benchmarking framework to assess the functionality of several of the most prominent graph databases and provide detailed insights into their performance.
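To make the exemplar-query and query-suggestion ideas above concrete, here is a minimal Python sketch. It is not the algorithm developed in the thesis, and it rests on simplifying assumptions. The graph is a set of (subject, label, object) triples. Every entity in the example is treated as a placeholder, and an answer is an injective node mapping that reproduces the example's labelled edges. The `min_coverage` threshold is a hypothetical stand-in for flexible matching of incomplete examples. `suggest_edges` illustrates pseudo-relevance feedback: it assumes the top-k answers are relevant and proposes edge patterns they share. The toy graph and all function names are illustrative, and the brute-force enumeration is exponential, so it is only suitable for toy inputs.

```python
from collections import Counter
from itertools import product

# Toy information graph: a set of (subject, label, object) triples.
# Entities and labels are illustrative only.
GRAPH = {
    ("Rome", "capital_of", "Italy"),
    ("Italy", "member_of", "EU"),
    ("Paris", "capital_of", "France"),
    ("France", "member_of", "EU"),
    ("Berlin", "capital_of", "Germany"),
    ("Germany", "member_of", "EU"),
    ("Oslo", "capital_of", "Norway"),
    ("Italy", "borders", "France"),
    ("France", "borders", "Germany"),
}


def nodes_of(edges):
    """All entities that appear as subject or object, in sorted order."""
    return sorted({s for s, _, _ in edges} | {o for _, _, o in edges})


def exemplar_matches(graph, example, min_coverage=1.0):
    """Return (coverage, mapping) pairs, best first.

    Each example entity is a placeholder; a mapping assigns distinct graph
    nodes to the placeholders. Coverage is the fraction of example edges the
    mapping reproduces in the graph. min_coverage < 1.0 is a hypothetical
    relaxation for incomplete examples. Brute force: toy inputs only.
    """
    placeholders = nodes_of(example)
    results = []
    for image in product(nodes_of(graph), repeat=len(placeholders)):
        if len(set(image)) < len(placeholders):  # keep mappings injective
            continue
        mapping = dict(zip(placeholders, image))
        hits = sum((mapping[s], label, mapping[o]) in graph for s, label, o in example)
        coverage = hits / len(example)
        if coverage >= min_coverage:
            results.append((coverage, mapping))
    # Highest coverage first; ties broken deterministically by the mapping.
    results.sort(key=lambda r: (-r[0], sorted(r[1].items())))
    return results


def suggest_edges(graph, example, matches, top_k=3, n_suggestions=2):
    """Pseudo-relevance feedback: treat the top_k matches as relevant and rank
    (placeholder, label, direction) patterns that many of them share but that
    the example does not yet contain."""
    present = {(s, label, "out") for s, label, _ in example}
    present |= {(o, label, "in") for _, label, o in example}
    support = Counter()
    for _, mapping in matches[:top_k]:
        inverse = {node: var for var, node in mapping.items()}
        seen = set()
        for s, label, o in graph:
            if s in inverse:
                seen.add((inverse[s], label, "out"))
            if o in inverse:
                seen.add((inverse[o], label, "in"))
        support.update(seen - present)  # each pattern counted once per match
    ranked = sorted(support.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n_suggestions]


if __name__ == "__main__":
    example = {("Rome", "capital_of", "Italy"), ("Italy", "member_of", "EU")}
    exact = exemplar_matches(GRAPH, example)
    for coverage, mapping in exact:
        print(coverage, mapping)
    print(suggest_edges(GRAPH, example, exact))
```

On this toy graph, the example "Rome is the capital of Italy, which is a member of the EU" yields three exact matches (the Paris/France, Berlin/Germany and Rome/Italy structures). The suggestion step then proposes a `borders` edge, in either direction, on the country placeholder. Lowering `min_coverage` to 0.5 also admits the Oslo/Norway pair, which reproduces only the `capital_of` edge.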

Enabling access to and exploration of information graphs / Lissandrini, Matteo. - (2018), pp. 1-170. [10.15168/11572_368600]

2018-01-01

Defended on: 2018
Cycle: XXVIII
Academic year: 2018-2019
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Informatica e telecomunicazioni (Computer Science and Telecommunications; until academic year 2020-21, 36th cycle)
External supervisor: Velegrakis, Yannis
Bi-nationally supervised doctoral thesis: no
DOI: https://dx.doi.org/10.15168/11572_368600
Language: English
Reference SSD (valid until 24/06/2024): INF/01 - Informatica (Computer Science)
Appears in collections: 08.1 Doctoral Thesis

Files in this record:

File	Access	Type	License	Size	Format
matteo.lissandrini-thesis.pdf	Open access	Doctoral Thesis	All rights reserved	8.68 MB	Adobe PDF
DECLARATORIA_ENG.pdf	Archive administrators only	Doctoral Thesis	All rights reserved	274.36 kB	Adobe PDF