Research on accelerating matrix multiplication, driven by the growing computational demands of deep learning, has produced many efficient architectural solutions, such as NVIDIA's Tensor Cores. These accelerators are designed to efficiently process a high volume of small dense matrix products in parallel. However, it is not obvious how to leverage them for sparse matrix multiplication. A natural way to adapt the accelerators to this problem is to divide the matrix into small blocks and then multiply only the nonzero blocks. In this paper, we investigate ways to reorder the rows of a sparse matrix to reduce the number of nonzero blocks and cluster the nonzero elements into a few dense blocks. While this pre-processing can be computationally expensive, we show that the high speed-up provided by the accelerators can easily repay the cost, especially when several multiplications follow one reordering.
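The blocking idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's algorithm: the function names (`count_nonzero_blocks`, `blocked_matmul`, `reorder_rows_by_pattern`) are hypothetical, and the reordering shown is a simple assumed heuristic that groups rows sharing the same block-column pattern. A dense NumPy product stands in for the accelerator's small dense kernel.

```python
import numpy as np

def count_nonzero_blocks(A, bs):
    """Count the bs x bs blocks of A that contain at least one nonzero."""
    count = 0
    for i in range(0, A.shape[0], bs):
        for j in range(0, A.shape[1], bs):
            if np.any(A[i:i + bs, j:j + bs]):
                count += 1
    return count

def blocked_matmul(A, B, bs):
    """Compute A @ B while skipping every all-zero block of A."""
    C = np.zeros((A.shape[0], B.shape[1]), dtype=np.result_type(A, B))
    for i in range(0, A.shape[0], bs):
        for k in range(0, A.shape[1], bs):
            blk = A[i:i + bs, k:k + bs]
            if np.any(blk):  # only nonzero blocks reach the dense kernel
                C[i:i + bs] += blk @ B[k:k + bs]
    return C

def reorder_rows_by_pattern(A, bs):
    """Assumed heuristic: sort rows by which block-columns they touch."""
    starts = np.arange(0, A.shape[1], bs)
    # pattern[r][j] is True when row r has a nonzero in block-column j
    pattern = np.add.reduceat((A != 0).astype(np.int64), starts, axis=1) > 0
    patterns = pattern.tolist()
    order = sorted(range(A.shape[0]), key=lambda r: patterns[r])
    return np.array(order)

# Small example: rows 0 and 2 touch block-column 0, rows 1 and 3 touch block-column 1.
A = np.array([[1, 0, 0, 0],
              [0, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 1]], dtype=float)
B = np.arange(12, dtype=float).reshape(4, 3)

perm = reorder_rows_by_pattern(A, 2)
print(count_nonzero_blocks(A, 2), count_nonzero_blocks(A[perm], 2))
```

In this example the number of nonzero 2x2 blocks drops from 4 to 2 after reordering. Permuting the rows of the sparse matrix only permutes the rows of the product, so `blocked_matmul(A[perm], B, 2)` equals `(A @ B)[perm]` and the original row order can be restored afterwards.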

Blocking Sparse Matrices to Leverage Dense-Specific Multiplication / Labini, P. S.; Bernaschi, M.; Nutt, W.; Silvestri, F.; Vella, F. - Electronic. - (2022), pp. 19-24. (Paper presented at the 2022 Workshop on Irregular Applications: Architectures and Algorithms, IA3 2022, held in Dallas, TX, USA, 13-18 November 2022) [10.1109/IA356718.2022.00009].

Blocking Sparse Matrices to Leverage Dense-Specific Multiplication

Vella F.
Last author
2022-01-01

2022
Proceedings of IA3 2022: Workshop on Irregular Applications: Architectures and Algorithms, Held in conjunction with SC 2022: The International Conference for High Performance Computing, Networking, Storage and Analysis
10662 LOS VAQUEROS CIRCLE, PO BOX 3014, LOS ALAMITOS, CA 90720-1264 USA
Institute of Electrical and Electronics Engineers Inc.
978-1-6654-7506-8
Labini, P. S.; Bernaschi, M.; Nutt, W.; Silvestri, F.; Vella, F.
Files in this item:
File: Blocking_Sparse_Matrices_to_Leverage_Dense-Specific_Multiplication.pdf
Access: Archive managers only
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 847.12 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/372887