Many practical problems in computer science require the knowledge of the most frequently occurring items in a data set. Current state-of-the-art algorithms for frequent items discovery are either fully centralized or rely on node hierarchies which are inflexible and prone to failures in massively distributed systems. In this paper we describe a family of gossip-based algorithms that efficiently approximate the most frequent items in large-scale distributed datasets. We show, both analytically and using real-world datasets, that our algorithms are fast, highly scalable, and resilient to node failures.

Identifying frequent items in distributed data sets

Montresor, Alberto
2013-01-01

Abstract

Many practical problems in computer science require the knowledge of the most frequently occurring items in a data set. Current state-of-the-art algorithms for frequent items discovery are either fully centralized or rely on node hierarchies which are inflexible and prone to failures in massively distributed systems. In this paper we describe a family of gossip-based algorithms that efficiently approximate the most frequent items in large-scale distributed datasets. We show, both analytically and using real-world datasets, that our algorithms are fast, highly scalable, and resilient to node failures.
2013
4
J., Sacha; Montresor, Alberto
File in questo prodotto:
File Dimensione Formato  
frequent-item.pdf

Solo gestori archivio

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.02 MB
Formato Adobe PDF
1.02 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/95527
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 7
  • ???jsp.display-item.citation.isi??? 6
social impact