While GPUs are being aggressively deployed in a growing number of computing domains, their resilience to transient faults remains a subject of concern. To gain a better understanding of the inherent vulnerability of GPU applications to transient faults, researchers perform extensive fault injection experiments. However, the conclusions reached based on the results of these fault injection experiments tend to be dependent on the specific input used during the experiments. The dependence of program resilience on changes in program input has not been thoroughly studied for GPU workloads. This paper addresses this issue, presenting extensive analysis on the effects of changes in program input and the resulting GPU reliability. Our work extends and challenges previous studies which reported that input data values do not affect reliability. Our analysis demonstrates that input sizes, as well as biased input values (input with a small set of dominant values) can have a significant impact on application reliability. For applications studied, we can expect a change of as much as 30% in the probability for a fault to cause a failure. Furthermore, we provide guidance on how to predict changes in resilience without repeating exhaustive fault injection experiments,
A Comprehensive Evaluation of the Effects of Input Data on the Resilience of GPU Applications / Previlon, F. G.; Kalra, C.; Kaeli, D. R.; Rech, P.. - (2019), pp. 1-6. ( 2019 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT) Noordwijk, Netherlands Oct. 2 2019 to Oct. 4 2019) [10.1109/DFT.2019.8875269].
A Comprehensive Evaluation of the Effects of Input Data on the Resilience of GPU Applications
Rech P.
2019-01-01
Abstract
While GPUs are being aggressively deployed in a growing number of computing domains, their resilience to transient faults remains a subject of concern. To gain a better understanding of the inherent vulnerability of GPU applications to transient faults, researchers perform extensive fault injection experiments. However, the conclusions reached based on the results of these fault injection experiments tend to be dependent on the specific input used during the experiments. The dependence of program resilience on changes in program input has not been thoroughly studied for GPU workloads. This paper addresses this issue, presenting extensive analysis on the effects of changes in program input and the resulting GPU reliability. Our work extends and challenges previous studies which reported that input data values do not affect reliability. Our analysis demonstrates that input sizes, as well as biased input values (input with a small set of dominant values) can have a significant impact on application reliability. For applications studied, we can expect a change of as much as 30% in the probability for a fault to cause a failure. Furthermore, we provide guidance on how to predict changes in resilience without repeating exhaustive fault injection experiments,| File | Dimensione | Formato | |
|---|---|---|---|
|
A_Comprehensive_Evaluation_of_the_Effects_of_Input_Data_on_the_Resilience_of_GPU_Applications.pdf
accesso aperto
Descrizione: IEEE Computer Society proceedings paper
Tipologia:
Versione editoriale (Publisher’s layout)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
241.8 kB
Formato
Adobe PDF
|
241.8 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione



