We investigate the sources of detected unrecoverable errors (DUEs) in graphics processing units (GPUs) exposed to a neutron beam. Illegal memory accesses and interface errors are among the more likely sources of DUEs. Enabling error-correcting code (ECC) increases the number of launch-failure events. Our test procedure shows that ECC can reduce the DUEs caused by illegal address accesses by up to 92% on Kepler and by up to 98% on Volta. In addition, we analyze whether compiler optimizations affect the distribution of DUE sources for matrix multiplication. We found that the machine code generated at different optimization levels changes the DUE source distribution by no more than 24% on average.

Experimental Findings on the Sources of Detected Unrecoverable Errors in GPUs / dos Santos, Fernando Fernandes; Malde, Sujit; Cazzaniga, Carlo; Frost, Christopher; Carro, Luigi; Rech, Paolo. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 69:3(2022), pp. 436-443. [10.1109/TNS.2022.3141341]

Experimental Findings on the Sources of Detected Unrecoverable Errors in GPUs

Rech, Paolo
Last author
2022-01-01

Year: 2022
Issue: 3
dos Santos, Fernando Fernandes; Malde, Sujit; Cazzaniga, Carlo; Frost, Christopher; Carro, Luigi; Rech, Paolo
Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346651