A Comprehensive Evaluation of the Effects of Input Data on the Resilience of GPU Applications / Previlon, F. G.; Kalra, C.; Kaeli, D. R.; Rech, P.. - (2019), pp. 1-6. ( 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT) Noordwijk, Netherlands Oct. 2 2019 to Oct. 4 2019) [10.1109/DFT.2019.8875269].

A Comprehensive Evaluation of the Effects of Input Data on the Resilience of GPU Applications

Rech P.
2019-01-01

Abstract

While GPUs are being aggressively deployed in a growing number of computing domains, their resilience to transient faults remains a subject of concern. To gain a better understanding of the inherent vulnerability of GPU applications to transient faults, researchers perform extensive fault injection experiments. However, the conclusions drawn from these experiments tend to depend on the specific input used. The dependence of program resilience on changes in program input has not been thoroughly studied for GPU workloads. This paper addresses this issue, presenting an extensive analysis of how changes in program input affect GPU reliability. Our work extends and challenges previous studies which reported that input data values do not affect reliability. Our analysis demonstrates that input sizes, as well as biased input values (inputs with a small set of dominant values), can have a significant impact on application reliability. For the applications studied, we can expect a change of as much as 30% in the probability that a fault causes a failure. Furthermore, we provide guidance on how to predict changes in resilience without repeating exhaustive fault injection experiments.
2019
IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems, DFT 2019
Los Alamitos, CA, USA
IEEE Computer Society - Digital Library
978-1-7281-2260-1
Previlon, F. G.; Kalra, C.; Kaeli, D. R.; Rech, P.
Files in this item:
File: A_Comprehensive_Evaluation_of_the_Effects_of_Input_Data_on_the_Resilience_of_GPU_Applications.pdf
Access: open access
Description: IEEE Computer Society proceedings paper
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 241.8 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403739
Warning: the data displayed have not been validated by the university.

Citations
  • PubMed Central: N/A
  • Scopus: 11
  • Web of Science (ISI): 9
  • OpenAlex: N/A