Research to accelerate matrix multiplication, pushed by the growing computational demands of deep learning, has produced many efficient architectural solutions, such as NVIDIA's Tensor Cores. These accelerators are designed to efficiently process a high volume of small dense matrix products in parallel. However, it is not obvious how to leverage these accelerators for sparse matrix multiplication. A natural way to adapt the accelerators to this problem is to divide the matrix into small blocks, and then multiply only the nonzero blocks. In this paper, we investigate ways to reorder the rows of a sparse matrix to reduce the number of nonzero blocks and cluster the nonzero elements into a few dense blocks. While this pre-processing can be computationally expensive, we show that the high speed-up provided by the accelerators can easily repay the cost, especially when several multiplications follow one reordering.
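The blocking idea described in the abstract can be illustrated with a small NumPy sketch. It is not the paper's algorithm: the function names are hypothetical, the matrix is stored densely for simplicity, and the row reordering is an assumed toy heuristic (sorting rows by the set of column blocks they touch) chosen only to show how a permutation can reduce the number of nonzero blocks.

```python
import numpy as np


def count_nonzero_blocks(a, block):
    """Count block x block tiles of `a` that contain at least one nonzero."""
    rows, cols = a.shape
    count = 0
    for i in range(0, rows, block):
        for j in range(0, cols, block):
            if np.any(a[i:i + block, j:j + block]):
                count += 1
    return count


def reorder_rows_by_pattern(a, block):
    """Return a row permutation grouping rows that touch the same column blocks.

    Toy heuristic (an assumption, not the paper's method): rows are sorted
    by the boolean pattern of column blocks in which they have a nonzero.
    """
    rows, cols = a.shape
    n_col_blocks = -(-cols // block)  # ceiling division
    pattern = np.zeros((rows, n_col_blocks), dtype=bool)
    for jb in range(n_col_blocks):
        pattern[:, jb] = np.any(a[:, jb * block:(jb + 1) * block], axis=1)
    keys = [tuple(p.tolist()) for p in pattern]
    return np.array(sorted(range(rows), key=lambda r: keys[r]))


def blocked_matmul(a, b, block):
    """Compute a @ b tile by tile, skipping all-zero tiles of `a`."""
    rows, inner = a.shape
    out = np.zeros((rows, b.shape[1]), dtype=np.result_type(a, b))
    for i in range(0, rows, block):
        for k in range(0, inner, block):
            tile = a[i:i + block, k:k + block]
            if np.any(tile):  # only nonzero tiles cost a dense product
                out[i:i + block] += tile @ b[k:k + block]
    return out


# Rows 0 and 2 touch column block 0; rows 1 and 3 touch column block 1.
a = np.zeros((4, 4))
a[0, 0] = a[2, 1] = 1.0
a[1, 2] = a[3, 3] = 1.0
b = np.arange(12, dtype=float).reshape(4, 3)

perm = reorder_rows_by_pattern(a, 2)
before = count_nonzero_blocks(a, 2)        # 4 nonzero 2x2 blocks
after = count_nonzero_blocks(a[perm], 2)   # 2 nonzero 2x2 blocks

# Multiply the reordered matrix, then undo the row permutation.
product_permuted = blocked_matmul(a[perm], b, 2)
product = np.empty_like(product_permuted)
product[perm] = product_permuted
```

Because permuting the rows of the sparse matrix only permutes the rows of the product, the reordering can be computed once and reused for every later multiplication, which is the situation in which the abstract expects the pre-processing cost to be repaid.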
Blocking Sparse Matrices to Leverage Dense-Specific Multiplication / Labini, P. S.; Bernaschi, M.; Nutt, W.; Silvestri, F.; Vella, F. - Electronic. - (2022), pp. 19-24. (Paper presented at the 2022 Workshop on Irregular Applications: Architectures and Algorithms, IA3 2022, held in Dallas, TX, USA, 13-18 November 2022) [10.1109/IA356718.2022.00009].
Blocking Sparse Matrices to Leverage Dense-Specific Multiplication
Vella F. (last author)
2022-01-01
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Blocking_Sparse_Matrices_to_Leverage_Dense-Specific_Multiplication.pdf | Archive managers only | Publisher's version (publisher's layout) | All rights reserved | 847.12 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.