High Performance Unstructured SpMM Computation Using Tensor Cores / Okanovic, Patrik; Kwasniewski, Grzegorz; Sylos Labini, Paolo; Besta, Maciej; Vella, Flavio; Hoefler, Torsten. - Electronic. - (2024), pp. 1-14. (2024 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2024, Georgia World Congress Center, Atlanta, Georgia, USA, 2024) [10.1109/SC41406.2024.00060].
High Performance Unstructured SpMM Computation Using Tensor Cores
Flavio Vella
2024-01-01
Abstract
High-performance sparse matrix-matrix multiplication (SpMM) is paramount for science and industry, as the ever-increasing sizes of data prohibit using dense data structures. Yet, existing hardware, such as Tensor Cores (TC), is ill-suited for SpMM, as it imposes strict constraints on data structures that cannot be met by the unstructured sparsity found in many applications. To address this, we introduce (S)parse (Ma)trix Matrix (T)ensor Core-accelerated (SMaT): a novel SpMM library that utilizes TCs for unstructured sparse matrices. Our block-sparse library leverages the low-level CUDA MMA (matrix-matrix-accumulate) API, maximizing the performance offered by modern GPUs. Algorithmic optimizations, such as sparse matrix permutation, further improve performance by minimizing the number of non-zero blocks. The evaluation on NVIDIA A100 shows that SMaT outperforms state-of-the-art libraries (DASP, cuSPARSE, and Magicube) by up to 125x (2.6x on average). SMaT can be used to accelerate many workloads in scientific computing, large model training, inference, and others.
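To illustrate the block-sparse idea described in the abstract, the sketch below shows how non-zero blocks of a BSR-stored (block compressed sparse row) matrix can be fed to the Tensor Cores. This is not SMaT's implementation: the paper builds on the raw CUDA MMA instructions, whereas this sketch uses the related, higher-level nvcuda::wmma API, and the 16x16 FP16 blocks, the array names (blockRowPtr, blockColIdx, blockVals), and the one-warp-per-output-tile mapping are assumptions made only for this example.

```cuda
// Minimal BSR x dense SpMM sketch on Tensor Cores (illustrative, not SMaT's kernel).
// Assumes: all matrix dimensions are multiples of 16, blockDim.x is a multiple of 32,
// A's non-zero 16x16 blocks are stored contiguously, row-major, in blockVals.
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

constexpr int BS = 16;  // block size, matching the 16x16x16 WMMA tile shape

__global__ void bsr_spmm_wmma(const int* blockRowPtr, const int* blockColIdx,
                              const half* blockVals, const half* B, float* C,
                              int numBlockRows, int N) {
    // One warp computes one 16x16 tile of C: (blockRow, tileCol).
    int warpId   = (blockIdx.x * blockDim.x + threadIdx.x) / 32;
    int blockRow = warpId % numBlockRows;
    int tileCol  = warpId / numBlockRows;
    if (tileCol * BS >= N) return;

    wmma::fragment<wmma::accumulator, BS, BS, BS, float> acc;
    wmma::fill_fragment(acc, 0.0f);

    // Walk the non-zero 16x16 blocks in this block row of A and accumulate on the TCs.
    for (int b = blockRowPtr[blockRow]; b < blockRowPtr[blockRow + 1]; ++b) {
        wmma::fragment<wmma::matrix_a, BS, BS, BS, half, wmma::row_major> aFrag;
        wmma::fragment<wmma::matrix_b, BS, BS, BS, half, wmma::row_major> bFrag;
        wmma::load_matrix_sync(aFrag, blockVals + (size_t)b * BS * BS, BS);
        const half* bTile = B + (size_t)blockColIdx[b] * BS * N + (size_t)tileCol * BS;
        wmma::load_matrix_sync(bFrag, bTile, N);
        wmma::mma_sync(acc, aFrag, bFrag, acc);
    }

    float* cTile = C + (size_t)blockRow * BS * N + (size_t)tileCol * BS;
    wmma::store_matrix_sync(cTile, acc, N, wmma::mem_row_major);
}
```

A launch would need numBlockRows * ceil(N/16) warps in total, e.g. a 1D grid with 128-thread blocks (4 warps each). The sparse-matrix permutation mentioned in the abstract would act before this kernel, reordering rows and columns so that non-zeros cluster into as few 16x16 blocks as possible, which directly shortens the inner loop above.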
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| High_Performance_Unstructured_SpMM_Computation_Using_Tensor_Cores.pdf | Repository managers only | Publisher's layout (editorial version) | All rights reserved | 1.73 MB | Adobe PDF |