Combining architectural fault-injection and neutron beam testing approaches toward better understanding of GPU soft-error resilience / Previlon, F. G.; Egbantan, B.; Tiwari, D.; Rech, P.; Kaeli, D. R. - 2017-:(2017), pp. 898-901. (Paper presented at the 60th IEEE International Midwest Symposium on Circuits and Systems, MWSCAS 2017, held in the USA in 2017) [10.1109/MWSCAS.2017.8053069].

Combining architectural fault-injection and neutron beam testing approaches toward better understanding of GPU soft-error resilience

Rech P.;
2017-01-01

Abstract

Transient faults continue to be a critical concern in a range of computing domains, including high-performance computing (HPC), scientific computing, and the automotive industry. While radiation-induced faults have been well studied and understood in microprocessors, their impact on computations on graphics processing units (GPUs) has received less attention. GPUs are now widely used in HPC and automotive markets. Mitigating the effects of transient faults requires a thorough understanding of the interaction between applications, system software, and the underlying hardware. Developing this understanding is challenging, mainly because of our limited ability to capture and study cross-layer reliability interactions. In this paper, we consider combining neutron beam testing experiments with architectural fault-injection experiments to gain a deeper understanding of the relationship between the vulnerability of GPUs and the underlying workload characteristics of applications targeted at GPU devices.
2017
Midwest Symposium on Circuits and Systems
USA
Institute of Electrical and Electronics Engineers Inc.
978-1-5090-6389-5
Previlon, F. G.; Egbantan, B.; Tiwari, D.; Rech, P.; Kaeli, D. R.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346635
Citations
  • PubMed Central: not available
  • Scopus: 9
  • Web of Science (ISI): 8
  • OpenAlex: not available