
Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects / De Sensi, Daniele; Pichetti, Lorenzo; Vella, Flavio; De Matteis, Tiziano; Ren, Zebin; Fusco, Luigi; Turisini, Matteo; Cesarini, Daniele; Lust, Kurt; Trivedi, Animesh; Roweth, Duncan; Spiga, Filippo; Di Girolamo, Salvatore. (2024), pp. 1-15. (2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024, Georgia World Congress Center, USA, November 17-22, 2024) [10.1109/SC41406.2024.00039].

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects

Flavio Vella; 2024-01-01

Abstract

Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks, with bandwidths of up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency are challenging because of the variety of technologies, design options, and software layers involved. This paper comprehensively characterizes three supercomputers, Alps, Leonardo, and LUMI, each with a unique architecture and design. We focus on the performance of intra-node and inter-node interconnects on up to 4,096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing their limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers working on multi-GPU supercomputing. Our results show that there is untapped bandwidth and that many opportunities for optimization remain, ranging from network to software ...
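The abstract's point about untapped bandwidth comes down to simple arithmetic: the throughput a transfer actually achieves compared with the nominal capacity of the link. The Python sketch below illustrates that calculation under stated assumptions: decimal units (1 Gb = 10^9 bits, so 1 Tb/s = 125 GB/s) and hypothetical helper names (`achieved_bandwidth_gbps`, `untapped_fraction`, `tbps_to_gbytes_per_s`). It is not the paper's benchmarking methodology, and the numbers in the usage block are illustrative, not measured results.

```python
# Hypothetical helpers illustrating bandwidth arithmetic; not the paper's benchmark code.
# Assumes decimal units: 1 Gb = 1e9 bits, 1 GB = 1e9 bytes.

BITS_PER_BYTE = 8


def achieved_bandwidth_gbps(message_bytes: int, seconds: float) -> float:
    """Achieved bandwidth in gigabits per second for one timed transfer."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return message_bytes * BITS_PER_BYTE / seconds / 1e9


def untapped_fraction(achieved_gbps: float, nominal_gbps: float) -> float:
    """Share of the nominal link bandwidth left unused (clamped at 0)."""
    if nominal_gbps <= 0:
        raise ValueError("nominal bandwidth must be positive")
    return max(0.0, 1.0 - achieved_gbps / nominal_gbps)


def tbps_to_gbytes_per_s(tbps: float) -> float:
    """Convert terabits per second to gigabytes per second."""
    return tbps * 1e12 / BITS_PER_BYTE / 1e9


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    bw = achieved_bandwidth_gbps(10**9, 0.01)
    print(f"achieved: {bw:.1f} Gb/s")
    print(f"untapped vs. a hypothetical 1000 Gb/s link: {untapped_fraction(bw, 1000.0):.0%}")
    print(f"1 Tb/s = {tbps_to_gbytes_per_s(1.0):.1f} GB/s")
```

For example, moving 10^9 bytes in 10 ms corresponds to 800 Gb/s; against a hypothetical 1,000 Gb/s link, about 20% of the nominal bandwidth would remain unused. Expressing throughput in bits per second keeps such figures directly comparable with link ratings quoted in terabits per second.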
Year: 2024
Conference: SC24: International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: IEEE Computer Society, 345 E 47th St, New York, NY 10017, USA
ISBN: 9798350352917
Academic sector INF/01 - Computer Science
Academic sector ING-INF/05 - Information Processing Systems
Academic sector INFO-01/A - Computer Science
Academic sector IINF-05/A - Information Processing Systems
File in this record: Exploring_GPU-to-GPU_Communication_Insights_into_Supercomputer_Interconnects.pdf
Access: archive administrators only
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 994.15 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/445352
Citations
  • PubMed Central: not available
  • Scopus: 6
  • Web of Science (ISI): 3
  • OpenAlex: not available