Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects

De Sensi, Daniele; Pichetti, Lorenzo; Vella, Flavio; De Matteis, Tiziano; Ren, Zebin; Fusco, Luigi; Turisini, Matteo; Cesarini, Daniele; Lust, Kurt; Trivedi, Animesh; Roweth, Duncan; Spiga, Filippo; Di Girolamo, Salvatore

doi:10.1109/SC41406.2024.00039

Multi-GPU nodes are increasingly common in the rapidly evolving landscape of exascale supercomputers. On these systems, GPUs on the same node are connected through dedicated networks, with bandwidths up to a few terabits per second. However, gauging performance expectations and maximizing system efficiency are challenging because of the differing technologies, design options, and software layers. This paper comprehensively characterizes three supercomputers - Alps, Leonardo, and LUMI - each with a unique architecture and design. We focus on performance evaluation of intra-node and inter-node interconnects on up to 4,096 GPUs, using a mix of intra-node and inter-node benchmarks. By analyzing their limitations and opportunities, we aim to offer practical guidance to researchers, system architects, and software developers dealing with multi-GPU supercomputing. Our results show that there is untapped bandwidth, and there are still many opportunities for optimization, ranging from network to software ...

Exploring GPU-to-GPU Communication: Insights into Supercomputer Interconnects / De Sensi, D., Pichetti, L., Vella, F., De Matteis, T., Ren, Z., Fusco, L., Turisini, M., Cesarini, D., Lust, K., Trivedi, A., Roweth, D., Spiga, F., Di Girolamo, S. - (2024), pp. 1-15. (2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024, Georgia World Congress Center, USA, November 17-22, 2024) [10.1109/SC41406.2024.00039].

2024-01-01

Date of publication: 2024
Proceedings title: SC24: International Conference for High Performance Computing, Networking, Storage and Analysis
Place of publication: 345 E 47TH ST, NEW YORK, NY 10017 USA
Publisher: IEEE Computer Society
ISBN: 9798350352917
Reference SSD (valid until 24/06/2024): INF/01 - Informatica (Computer Science); ING-INF/05 - Sistemi di Elaborazione delle Informazioni (Information Processing Systems)
Reference SSD (valid from 09/05/2024): INFO-01/A - Informatica (Computer Science); IINF-05/A - Sistemi di elaborazione delle informazioni (Information Processing Systems)
Scopus identifier: 2-s2.0-85214982770
WOS identifier: WOS:001414891300058
Document type: 04.1 Paper in Proceedings (Saggio in atti di convegno)

Files in this record:

File	Size	Format
Exploring_GPU-to-GPU_Communication_Insights_into_Supercomputer_Interconnects.pdf (publisher's layout; all rights reserved; access restricted to archive managers)	994.15 kB	Adobe PDF