Triangle count and local clustering coefficient are two core metrics for graph analysis. They find broad application in analyses such as community detection and link recommen-dation. To cope with the computational and memory demands that stem from the size of today's graph datasets, distributed-memory algorithms have to be developed. Current state-of-the-art solutions suffer from synchronization overheads or expensive pre-computations needed to distribute the graph, achieving limited scaling capabilities. We propose a fully asynchronous implementation for triangle counting and local clustering coef-ficient based on 1D partitioning, using remote memory accesses for transferring data and avoid synchronization. Additionally, we show how these algorithms present data reuse on remote memory accesses and how the overall communication time can be improved by caching these accesses. Finally, we extend CLaMPI, a software-layer caching system for MPI RMA, to include application-specific scores for cached entries and influence the eviction procedure to improve caching efficiency. Our results show improvements on shared memory, and we achieve 14x speedup from 4 to 64 nodes for the LiveJoumal 1 graph on distributed memory. Moreover, we demonstrate how caching remote accesses reduces total running time by up to 73 % with respect to a non-cached version. Finally, we compare our implementation to TriC, the 2020 graph champion paper, and achieve up to 100x faster results for scale-free graphs.

Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching / Strausz, A.; Vella, F.; Di Girolamo, S.; Besta, M.; Hoefler, T.. - ELETTRONICO. - (2022), pp. 291-301. (Intervento presentato al convegno 36th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022 tenutosi a Lyon France / Virtual nel 2022) [10.1109/IPDPS53621.2022.00036].

Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching

Vella F.;
2022-01-01

Abstract

Triangle count and local clustering coefficient are two core metrics for graph analysis. They find broad application in analyses such as community detection and link recommen-dation. To cope with the computational and memory demands that stem from the size of today's graph datasets, distributed-memory algorithms have to be developed. Current state-of-the-art solutions suffer from synchronization overheads or expensive pre-computations needed to distribute the graph, achieving limited scaling capabilities. We propose a fully asynchronous implementation for triangle counting and local clustering coef-ficient based on 1D partitioning, using remote memory accesses for transferring data and avoid synchronization. Additionally, we show how these algorithms present data reuse on remote memory accesses and how the overall communication time can be improved by caching these accesses. Finally, we extend CLaMPI, a software-layer caching system for MPI RMA, to include application-specific scores for cached entries and influence the eviction procedure to improve caching efficiency. Our results show improvements on shared memory, and we achieve 14x speedup from 4 to 64 nodes for the LiveJoumal 1 graph on distributed memory. Moreover, we demonstrate how caching remote accesses reduces total running time by up to 73 % with respect to a non-cached version. Finally, we compare our implementation to TriC, the 2020 graph champion paper, and achieve up to 100x faster results for scale-free graphs.
2022
Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium, IPDPS 2022
New York City
Institute of Electrical and Electronics Engineers Inc.
978-1-6654-8106-9
Strausz, A.; Vella, F.; Di Girolamo, S.; Besta, M.; Hoefler, T.
Asynchronous Distributed-Memory Triangle Counting and LCC with RMA Caching / Strausz, A.; Vella, F.; Di Girolamo, S.; Besta, M.; Hoefler, T.. - ELETTRONICO. - (2022), pp. 291-301. (Intervento presentato al convegno 36th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2022 tenutosi a Lyon France / Virtual nel 2022) [10.1109/IPDPS53621.2022.00036].
File in questo prodotto:
File Dimensione Formato  
Asynchronous_Distributed-Memory_Triangle_Counting_and_LCC_with_RMA_Caching.pdf

Solo gestori archivio

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 868.93 kB
Formato Adobe PDF
868.93 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/352981
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 1
social impact