Transparent Caching for RMA Systems / Girolamo, S. D.; Vella, F.; Hoefler, T. - Electronic. - (2017), pp. 1018-1027. (Paper presented at the 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017, held in Orlando, FL, USA, 29 May - 2 June 2017) [10.1109/IPDPS.2017.92].

Transparent Caching for RMA Systems

Vella F.;
2017-01-01

Abstract

The constantly increasing gap between communication and computation performance emphasizes the importance of communication-avoidance techniques. Caching is a well-known concept used to reduce accesses to slow local memories. In this work, we extend the caching idea to MPI-3 Remote Memory Access (RMA) operations. Here, caching can avoid inter-node communication and give irregular applications benefits similar to those that communication-avoiding algorithms give structured applications. We propose CLaMPI, a caching library layered on top of MPI-3 RMA, to automatically optimize code with minimal user intervention. We demonstrate that cached RMA improves the performance of a Barnes-Hut simulation and a Local Clustering Coefficient computation by up to 1.8x and 5x, respectively. Due to the low overheads in the cache miss case and the potential benefits, we expect that our ideas around transparent RMA caching will soon be an integral part of many MPI libraries.
Year: 2017
Published in: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium, IPDPS 2017
Place of publication: New York, USA
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN: 978-1-5386-3914-6
Authors: Girolamo, S. D.; Vella, F.; Hoefler, T.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/332856