On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and beyond

Efficient HPC libraries often expose multiple tunable parameters, algorithmic implementations, or a combination of both to provide optimized routines. The optimal parameters and algorithmic choices may depend on input properties such as the shapes of the matrices involved in the operation. Traditionally, these parameters are tuned manually or set by auto-tuners. In emerging applications such as deep learning, this approach is not effective across the wide range of inputs and architectures used in practice. In this work, we analyze different machine learning techniques and predictive models to accelerate the convolution operator and GEMM. Moreover, we address the problem of dataset generation, and we study the performance, accuracy, and generalization ability of the models. Our insights allow us to improve the performance of computationally expensive deep learning primitives on high-end GPUs as well as on low-power embedded GPU architectures, across three different libraries. Experimental results show that simple decision tree- and MLP-based models improve performance in the target applications by 50% up to 300% compared to auto-tuned and highly optimized vendor-based heuristics.
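The general idea of selecting a kernel configuration from input shapes can be illustrated with a minimal sketch. It is not the authors' implementation: the configuration names, the cost model `simulated_time`, and the single feature `m * n * k` are all assumptions made for illustration, and a real dataset would be built from measured kernel run times. The sketch labels each GEMM shape with its fastest configuration and then fits a depth-1 decision tree by exhaustive threshold search.

```python
from itertools import product

# Hypothetical kernel configurations (illustrative names, not from the paper).
CONFIGS = ["small_tile", "large_tile"]
SIZES = [8, 16, 32, 64, 128]


def simulated_time(config, m, n, k):
    """Toy cost model standing in for a real benchmark run (an assumption)."""
    work = m * n * k
    if config == "small_tile":
        return 5.0 + work / 1.0e3   # low fixed overhead, lower throughput
    return 50.0 + work / 2.0e3      # higher fixed overhead, higher throughput


def best_config(m, n, k):
    """Label a shape with its fastest configuration, as exhaustive tuning would."""
    return min(CONFIGS, key=lambda c: simulated_time(c, m, n, k))


def make_dataset():
    """Dataset generation: one (shape, best configuration) pair per shape."""
    return [((m, n, k), best_config(m, n, k))
            for m, n, k in product(SIZES, repeat=3)]


def fit_stump(data):
    """Fit a depth-1 decision tree on the feature m*n*k by threshold search."""
    thresholds = sorted({m * n * k for (m, n, k), _ in data})
    best = None
    for t in thresholds:
        for low, high in (CONFIGS, CONFIGS[::-1]):
            errors = sum(
                (low if m * n * k <= t else high) != label
                for (m, n, k), label in data
            )
            if best is None or errors < best[0]:
                best = (errors, t, low, high)
    return best[1:]  # (threshold, label at or below it, label above it)


def predict(stump, m, n, k):
    """Choose a configuration for a shape without running any benchmark."""
    threshold, low, high = stump
    return low if m * n * k <= threshold else high


def accuracy(stump, data):
    hits = sum(predict(stump, *shape) == label for shape, label in data)
    return hits / len(data)


dataset = make_dataset()
stump = fit_stump(dataset)
print(stump, accuracy(stump, dataset))
```

At run time only `predict` is called, so the cost of choosing a configuration is a single comparison. Accuracy on shapes outside the training grid depends on how well the sampled shapes cover the inputs seen in practice, which is why dataset generation matters.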

On the Anatomy of Predictive Models for Accelerating GPU Convolution Kernels and beyond / Sylos Labini, P.; Cianfriglia, M.; Perri, D.; Gervasi, O.; Fursin, G.; Lokhmotov, A.; Nugteren, C.; Carpentieri, B.; Zollo, F.; Vella, F. - In: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION. - ISSN 1544-3566. - ELECTRONIC. - 18:1(2021), pp. 1-24. [10.1145/3434402]



Date of publication: 2021
Journal title: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
Issue number and part: 1
DOI: https://dx.doi.org/10.1145/3434402
Scopus identifier: 2-s2.0-85099790155
WOS identifier: WOS:000612575500016
All authors: Sylos Labini, P.; Cianfriglia, M.; Perri, D.; Gervasi, O.; Fursin, G.; Lokhmotov, A.; Nugteren, C.; Carpentieri, B.; Zollo, F.; Vella, F.
Type: 03.1 Journal article

Files in this record:

File: 3434402.pdf
Access: open access
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 3.03 MB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/332635
