ProbGraph: high-performance and high-accuracy graph mining with probabilistic set representations

IRIS

Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50\texttimes{} on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community.

ProbGraph: high-performance and high-accuracy graph mining with probabilistic set representations / Besta, M., Miglioli, C., Sylos Labini, P., Tětek, J., Iff, P., Kanakagiri, R., Ashkboos, S., Janda, K., Podstawski, M., Kwasniewski, G., Gleinig, N., Vella, F., Mutlu, O., Hoefler, T.. - ELETTRONICO. - 2022-:(2022), pp. 1-17. (2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022 Dallas, USA 13th -18th November 2022) [10.1109/SC41404.2022.00048].

ProbGraph: high-performance and high-accuracy graph mining with probabilistic set representations

Miglioli, Cesare;Sylos Labini, Paolo;Tětek, Jakub;Iff, Patrick;Kanakagiri, Raghavendra;Ashkboos, Saleh;Janda, Kacper^Penultimo;Podstawski, Michal;Kwasniewski, Grzegorz;Gleinig, Niels;Vella, Flavio^Co-ultimo;Mutlu, Onur^Co-ultimo;Hoefler, Torsten^Co-ultimo

2022-01-01

Abstract

Important graph mining problems such as Clustering are computationally demanding. To significantly accelerate these problems, we propose ProbGraph: a graph representation that enables simple and fast approximate parallel graph mining with strong theoretical guarantees on work, depth, and result accuracy. The key idea is to represent sets of vertices using probabilistic set representations such as Bloom filters. These representations are much faster to process than the original vertex sets thanks to vectorizability and small size. We use these representations as building blocks in important parallel graph mining algorithms such as Clique Counting or Clustering. When enhanced with ProbGraph, these algorithms significantly outperform tuned parallel exact baselines (up to nearly 50\texttimes{} on 32 cores) while ensuring accuracy of more than 90% for many input graph datasets. Our novel bounds and algorithms based on probabilistic set representations with desirable statistical properties are of separate interest for the data analytics community.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione (Date of publication)
	
				2022
			
	Titolo del volume (Proceedings title)
	
				Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
			
	Luogo di edizione (Place of publication)
	
				Piscataway, NJ USA
			
	Casa editrice (Publisher)
	
				IEEE Computer Society
			
	ISBN
	
				978-1-6654-5444-5
978-1-6654-5445-2
			
	Codice Scopus (Scopus Identifier)
	
				2-s2.0-85143573874
			
	Codice WOS (WOS identifier)
	
				WOS:000981934800074
			
	Tutti gli autori
	
						Besta, Maciej; Miglioli, Cesare; Sylos Labini, Paolo; Tětek, Jakub; Iff, Patrick; Kanakagiri, Raghavendra; Ashkboos, Saleh; Janda, Kacper; Podstawski,...espandi
						
	Citazione
	
				ProbGraph: high-performance and high-accuracy graph mining with probabilistic set representations / Besta, M., Miglioli, C., Sylos Labini, P., Tětek, J., Iff, P., Kanakagiri, R., Ashkboos, S., Janda, K., Podstawski, M., Kwasniewski, G., Gleinig, N., Vella, F., Mutlu, O., Hoefler, T.. - ELETTRONICO. - 2022-:(2022), pp. 1-17. (2022 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2022 Dallas, USA 13th -18th November 2022) [10.1109/SC41404.2022.00048].
			
	Appare nelle tipologie:
	
				04.1 Saggio in atti di convegno (Paper in Proceedings)

File in questo prodotto:

File	Dimensione	Formato
sc22.pdf Solo gestori archivio Tipologia: Versione editoriale (Publisher’s layout) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 1.87 MB Formato Adobe PDF Visualizza/Apri	1.87 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/360842

Citazioni

ND

6

1

9

social impact