The complexity of both hardware and software makes GPUs reliability evaluation extremely challenging. A low level fault injection on a GPU model, despite being accurate, would take a prohibitively long time (months to years), while software fault injection, despite being quick, cannot access critical resources for GPUs and typically uses synthetic fault models (e.g., single bit-flips) that could result in unrealistic evaluations. This paper proposes to combine the accuracy of Register- Transfer Level (RTL) fault injection with the efficiency of software fault injection. First, on an RTL GPU model (FlexGripPlus), we inject over 1.5 million faults in low-level resources that are unprotected and hidden to the programmer, and characterize their effects on the output of common instructions. We create a pool of possible fault effects on the operation output based on the instruction opcode and input characteristics. We then inject these fault effects, at the application level, using an updated version of a software framework (NVBitFI). Our strategy reduces the fault injection time from the tens of years an RTL evaluation would need to tens of hours, thus allowing, for the first time on GPUs, to track the fault propagation from the hardware to the output of complex applications. Additionally, we provide a more realistic fault model and show that single bit-flip injection would underestimate the error rate of six HPC applications and two convolutional neural networks by up to 48parcent (18parcent on average). The RTL fault models and the injection framework we developed are made available in a public repository to enable third-party evaluations and ease results reproducibility.
Revealing GPUs Vulnerabilities by Combining Register-Transfer and Software-Level Fault Injection / dos Santos, Fernando F.; Condia, Josie E. Rodriguez; Carro, Luigi; Reorda, Matteo Sonza; Rech, Paolo. - (2021), pp. 292-304. (Intervento presentato al convegno DSN 2021 tenutosi a Virtual Event nel 21st-24th June 2021) [10.1109/DSN48987.2021.00042].
Revealing GPUs Vulnerabilities by Combining Register-Transfer and Software-Level Fault Injection
Rech, PaoloUltimo
2021-01-01
Abstract
The complexity of both hardware and software makes GPUs reliability evaluation extremely challenging. A low level fault injection on a GPU model, despite being accurate, would take a prohibitively long time (months to years), while software fault injection, despite being quick, cannot access critical resources for GPUs and typically uses synthetic fault models (e.g., single bit-flips) that could result in unrealistic evaluations. This paper proposes to combine the accuracy of Register- Transfer Level (RTL) fault injection with the efficiency of software fault injection. First, on an RTL GPU model (FlexGripPlus), we inject over 1.5 million faults in low-level resources that are unprotected and hidden to the programmer, and characterize their effects on the output of common instructions. We create a pool of possible fault effects on the operation output based on the instruction opcode and input characteristics. We then inject these fault effects, at the application level, using an updated version of a software framework (NVBitFI). Our strategy reduces the fault injection time from the tens of years an RTL evaluation would need to tens of hours, thus allowing, for the first time on GPUs, to track the fault propagation from the hardware to the output of complex applications. Additionally, we provide a more realistic fault model and show that single bit-flip injection would underestimate the error rate of six HPC applications and two convolutional neural networks by up to 48parcent (18parcent on average). The RTL fault models and the injection framework we developed are made available in a public repository to enable third-party evaluations and ease results reproducibility.File | Dimensione | Formato | |
---|---|---|---|
DSN_Revealing_GPUs_Vulnerabilities_by_Combining_Register-Transfer_and_Software-Level_Fault_Injection.pdf
Solo gestori archivio
Tipologia:
Versione editoriale (Publisher’s layout)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
1.34 MB
Formato
Adobe PDF
|
1.34 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione