Modern GPUs Radiation Sensitivity Evaluation and Mitigation through Duplication with Comparison / Oliveira, D. A. G.; Rech, P.; Quinn, H. M.; Fairbanks, T. D.; Monroe, L.; Michalak, S. E.; Anderson-Cook, C.; Navaux, P. O. A.; Carro, L. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 61:6(2014), pp. 3115-3122. [10.1109/TNS.2014.2362014]

Modern GPUs Radiation Sensitivity Evaluation and Mitigation through Duplication with Comparison

Rech, P.
2014-01-01

Abstract

Graphics processing units (GPUs) are increasingly common in both safety-critical and high-performance computing (HPC) applications. Some current supercomputers are composed of thousands of GPUs, so the probability of device corruption becomes very high. Moreover, the GPU's parallel capabilities are very attractive for the automotive and aerospace markets, where reliability is a serious concern. In this paper, the neutron sensitivity of modern GPU caches and internal resources is experimentally evaluated. Various Duplication With Comparison (DWC) strategies to reduce GPU radiation sensitivity are then presented and validated through radiation experiments. Threads should be carefully duplicated to avoid undesired errors on shared resources and to avoid the exacerbation of errors in critical resources such as the scheduler.
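As a rough illustration of the Duplication With Comparison idea named in the abstract, the following CPU-side Python sketch runs the same computation twice on separate copies of the input and compares the two outputs, treating any mismatch as a detected error. It is not the authors' implementation: the paper evaluates DWC strategies for GPU threads, while the names `vector_scale`, `flip_bit`, `inject_fault`, and `dwc_run`, as well as the software fault injection, are hypothetical and chosen only for this example.

```python
import struct
from typing import Callable, List, Optional, Tuple

Kernel = Callable[[List[float]], List[float]]


def vector_scale(data: List[float]) -> List[float]:
    """Toy workload standing in for a GPU kernel (hypothetical)."""
    return [x * 2.0 for x in data]


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a 64-bit float to mimic a single-bit upset."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def inject_fault(output: List[float]) -> None:
    """Corrupt the second element in place (assumed fault model)."""
    output[1] = flip_bit(output[1], 52)


def dwc_run(
    kernel: Kernel,
    data: List[float],
    fault: Optional[Callable[[List[float]], None]] = None,
) -> Tuple[List[float], bool]:
    """Run `kernel` twice and compare the results.

    Each execution receives its own copy of the input, so corrupted
    data seen by one copy is not silently shared with the other.
    Returns the first output and a flag that is True when both agree.
    """
    out_a = kernel(list(data))
    out_b = kernel(list(data))
    if fault is not None:
        fault(out_b)  # simulate an error striking only the duplicate
    return out_a, out_a == out_b


if __name__ == "__main__":
    result, ok = dwc_run(vector_scale, [1.0, 2.0, 3.0])
    print("fault-free run agrees:", ok)
    result, ok = dwc_run(vector_scale, [1.0, 2.0, 3.0], inject_fault)
    print("faulty run agrees:", ok)
```

The sketch only detects a mismatch; it cannot tell which copy is wrong. It also does not model the abstract's concern that duplicated GPU threads can share resources or stress the scheduler, which is why the placement of the duplicates matters on real hardware.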
Files in this item:
File: TNS_Modern_GPUs_Radiation_Sensitivity_Evaluation_and_Mitigation_Through_Duplication_With_Comparison.pdf
Access: Archive managers only
Description: IEEE Transactions on Nuclear Science, Vol. 61, No. 6, December 2014
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 727.15 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403766