Defining appropriate distance functions is a crucial aspect of effective and efficient similarity-based prediction and retrieval. Relational data are especially challenging in this regard. By viewing relational data as multi-relational graphs, one can easily see that a distance between a pair of nodes can be defined in terms of a virtually unlimited class of features, including node attributes, attributes of node neighbors, structural aspects of the node neighborhood and arbitrary combinations of these properties. In this paper we propose a rich and flexible class of metrics on graph entities based on earth mover’s distance applied to a hierarchy of complex counts-of-counts statistics. We further propose an approximate version of the distance using sums of marginal earth mover’s distances. We show that the approximation is correct for many cases of practical interest and allows efficient nearest-neighbor retrieval when combined with a simple metric tree data structure. An experimental evaluation on two real-world scenarios highlights the flexibility of our framework for designing metrics representing different notions of similarity. Substantial improvements in similarity-based prediction are reported when compared to solutions based on state-of-the-art graph kernels.

Counts-of-counts similarity for prediction and search in relational data / Jaeger, M.; Lippi, M.; Pellegrini, G.; Passerini, A.. - In: DATA MINING AND KNOWLEDGE DISCOVERY. - ISSN 1384-5810. - 2019, 33:5(2019), pp. 1254-1297. [10.1007/s10618-019-00621-7]

Counts-of-counts similarity for prediction and search in relational data

Pellegrini G.;Passerini A.
2019

Abstract

Defining appropriate distance functions is a crucial aspect of effective and efficient similarity-based prediction and retrieval. Relational data are especially challenging in this regard. By viewing relational data as multi-relational graphs, one can easily see that a distance between a pair of nodes can be defined in terms of a virtually unlimited class of features, including node attributes, attributes of node neighbors, structural aspects of the node neighborhood and arbitrary combinations of these properties. In this paper we propose a rich and flexible class of metrics on graph entities based on earth mover’s distance applied to a hierarchy of complex counts-of-counts statistics. We further propose an approximate version of the distance using sums of marginal earth mover’s distances. We show that the approximation is correct for many cases of practical interest and allows efficient nearest-neighbor retrieval when combined with a simple metric tree data structure. An experimental evaluation on two real-world scenarios highlights the flexibility of our framework for designing metrics representing different notions of similarity. Substantial improvements in similarity-based prediction are reported when compared to solutions based on state-of-the-art graph kernels.
5
Jaeger, M.; Lippi, M.; Pellegrini, G.; Passerini, A.
Counts-of-counts similarity for prediction and search in relational data / Jaeger, M.; Lippi, M.; Pellegrini, G.; Passerini, A.. - In: DATA MINING AND KNOWLEDGE DISCOVERY. - ISSN 1384-5810. - 2019, 33:5(2019), pp. 1254-1297. [10.1007/s10618-019-00621-7]
File in questo prodotto:
File Dimensione Formato  
tethisto.pdf

accesso aperto

Tipologia: Pre-print non referato (Non-refereed preprint)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.39 MB
Formato Adobe PDF
1.39 MB Adobe PDF Visualizza/Apri
Jaeger2019_Article_Counts-of-countsSimilarityForP.pdf

Solo gestori archivio

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.46 MB
Formato Adobe PDF
1.46 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11572/251223
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 0
  • ???jsp.display-item.citation.isi??? 0
social impact