Assessing the Impact of Compiler Optimizations on GPUs Reliability


Compilers for Graphics Processing Units (GPUs) have evolved to support general-purpose programming languages on multiple architectures. The NVIDIA CUDA Compiler (NVCC) passes through many compilation levels before generating machine code and applies complex optimizations to improve performance. These optimizations modify how the software is mapped onto the underlying hardware; thus, as we show in this article, they can also affect GPU reliability. We evaluate how the optimization flags applied at the NVCC Parallel Thread Execution (PTX) compilation phase affect the GPU error rate by analyzing two NVIDIA GPU architectures (Kepler and Volta) and two compiler versions (NVCC 10.2 and 11.3). We compare and combine fault propagation analysis based on software fault injection, hardware utilization distribution obtained with application-level profiling, and the radiation-induced error rate of machine instructions measured with beam experiments. We consider eight different workloads and 144 combinations of compilation flags, and we show that optimizations can change the GPU error rate by up to an order of magnitude. Additionally, through accelerated neutron beam experiments on an NVIDIA Kepler GPU, we show that the error rate of the unoptimized GEMM (-O0 flag) is lower than that of the optimized GEMM (-O3 flag). When performance is evaluated together with the error rate, we show that the most optimized versions (-O1 and -O3) always produce a higher amount of correct data than the unoptimized code (-O0).

Assessing the Impact of Compiler Optimizations on GPUs Reliability / Santos, Fernando Fernandes Dos; Carro, Luigi; Vella, Flavio; Rech, Paolo. - In: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION. - ISSN 1544-3566. - 21:2(2024), pp. 2601-2622. [10.1145/3638249]


Santos, Fernando Fernandes Dos; Carro, Luigi; Vella, Flavio; Rech, Paolo

2024-01-01



Date of publication: 2024
Journal title: ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION
Issue number and part: 2
DOI: https://dx.doi.org/10.1145/3638249
Scopus identifier: 2-s2.0-85192626836
WOS identifier: WOS:001242588100006
All authors: Santos, Fernando Fernandes Dos; Carro, Luigi; Vella, Flavio; Rech, Paolo
Publication type: 03.1 Journal article

Files in this record:

File	Size	Format
TACO24.pdf (open access; type: publisher's layout; license: Creative Commons)	4.37 MB	Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403702
