Top-k item identification on dynamic and distributed datasets

IRIS

The problem of identifying the most frequent items across multiple datasets has received considerable attention over the last few years. When storage is a scarce resource, the topic is already a challenge; yet, its complexity may be further exacerbated not only by the many independent data sources, but also by the dynamism of the data, i.e., the fact that new items may appear and old ones disappear at any time. In this work, we provide a novel approach to the problem by using an existing gossip-based algorithm for identifying the k most frequent items over a distributed collection of datasets, in ways that deal with the dynamic nature of the data. The algorithm has been thoroughly analyzed through trace-based simulations and compared to state-of-the-art decentralized solutions, showing better precision at reduced communication overhead.
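To make the problem statement concrete, here is a minimal Python sketch of the exact answer that a decentralized algorithm tries to approximate. It returns the k items with the highest total count across all local datasets. It also includes a simple precision measure for comparing an approximate answer against that exact answer. This is not the gossip-based algorithm used in the paper. The function names `global_top_k` and `precision_at_k`, the tie-breaking rule, and the precision definition are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, Sequence


def global_top_k(datasets: Iterable[Sequence[str]], k: int) -> list[str]:
    """Exact top-k items over the union of all datasets (centralized baseline).

    Hypothetical helper: each element of `datasets` stands for the items
    observed at one node of the distributed system.
    """
    total: Counter = Counter()
    for data in datasets:
        total.update(data)  # add this node's local occurrence counts
    # Sort by descending global count; break ties by item so the result is deterministic
    ranked = sorted(total.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def precision_at_k(estimate: Sequence[str], truth: Sequence[str]) -> float:
    """Fraction of reported items that belong to the true top-k (assumed metric)."""
    if not estimate:
        return 0.0
    return len(set(estimate) & set(truth)) / len(estimate)


if __name__ == "__main__":
    nodes = [["a", "b", "a", "c"], ["b", "b", "d"], ["a", "e", "b"]]
    truth = global_top_k(nodes, 2)
    print(truth)                              # ['b', 'a']
    print(precision_at_k(["b", "d"], truth))  # 0.5
```

Because the data are dynamic, with items appearing and disappearing over time, this exact answer changes as the datasets change. A centralized baseline like this one also needs every node to ship its full counts to one place. Avoiding that cost is the motivation for a decentralized, gossip-based approach.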

Guerrieri, A.; Montresor, Alberto; Velegrakis, Ioannis

2014-01-01

Date of publication: 2014
Proceedings title: Proceedings of the 20th International Conference on Parallel Processing, Euro-Par 2014
Book author(s): Various authors
Place of publication: Berlin
Publisher: Springer Verlag
ISBN: 9783319098722
Scopus identifier: 2-s2.0-84906346219
WOS identifier: WOS:000371297400023
All authors: Guerrieri, A.; Montresor, Alberto; Velegrakis, Ioannis
Item type: 04.1 Paper in Proceedings

Files in this item: none. No files are associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/100373

Warning: the data shown here have not been validated by the university.
