Reduced Precision DWC: An Efficient Hardening Strategy for Mixed-Precision Architectures / dos Santos, Fernando; Brandalero, Marcelo; Sullivan, Michael B.; Basso, Pedro M.; Hübner, Michael; Carro, Luigi; Rech, Paolo. - In: IEEE TRANSACTIONS ON COMPUTERS. - ISSN 0018-9340. - 71:3(2022), pp. 573-586. [10.1109/TC.2021.3058872]

Reduced Precision DWC: An Efficient Hardening Strategy for Mixed-Precision Architectures

Rech, Paolo
Last
2022-01-01

Abstract

Duplication with Comparison (DWC) is an effective software-level solution for improving the reliability of computing devices. However, it introduces performance and energy consumption overheads that can be unacceptable for high-performance computing or real-time safety-critical applications. In this article, we present Reduced-Precision Duplication with Comparison (RP-DWC) as a means to lower the overhead of DWC by executing the redundant copy in reduced precision. RP-DWC is particularly suitable for modern mixed-precision architectures, such as NVIDIA GPUs, that feature dedicated functional units for computing with programmable accuracy. We discuss the benefits and challenges associated with RP-DWC and show that the intrinsic difference between the mixed-precision copies allows for detecting most, but not all, errors. However, because the undetected faults are those that fall within the difference between precisions, they have a much smaller impact on the application output and might therefore be tolerated. We investigate the impact of RP-DWC on fault detection, performance, and energy consumption on Volta GPUs. Through fault injection and beam experiments on three microbenchmarks and four real applications, we show that RP-DWC achieves excellent coverage (up to 86 percent) with minimal overheads (as low as 0.1 percent time overhead and 24 percent energy consumption overhead).
2022
3
dos Santos, Fernando; Brandalero, Marcelo; Sullivan, Michael B.; Basso, Pedro M.; Hübner, Michael; Carro, Luigi; Rech, Paolo
Files in this item:
File: TC_Reduced_Precision_DWC_An_Efficient_Hardening_Strategy_for_Mixed-Precision_Architectures.pdf
Access: Archive managers only
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 2.32 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346699
Citations
  • PMC: not available
  • Scopus 13
  • Web of Science (ISI) 10