Identifying frequent items in distributed data sets

Sacha, J.; Montresor, Alberto

doi:10.1007/s00607-012-0220-1

Many practical problems in computer science require knowledge of the most frequently occurring items in a data set. Current state-of-the-art algorithms for frequent-item discovery are either fully centralized or rely on node hierarchies, which are inflexible and prone to failures in massively distributed systems. In this paper, we describe a family of gossip-based algorithms that efficiently approximate the most frequent items in large-scale distributed data sets. We show, both analytically and using real-world data sets, that our algorithms are fast, highly scalable, and resilient to node failures.

2013-01-01

Date of publication: 2013
Journal title: COMPUTING
Issue number and part: 4
DOI: https://dx.doi.org/10.1007/s00607-012-0220-1
Scopus identifier: 2-s2.0-84876460058
WOS identifier: WOS:000316820400002
All authors: Sacha, J.; Montresor, Alberto
Appears in types: 03.1 Journal article

Files in this record:

File	Size	Format
frequent-item.pdf (access: archive managers only; type: publisher’s layout; license: all rights reserved)	1.02 MB	Adobe PDF