Using error-correcting codes (ECCs) is considered one of the most effective ways to mask the effects of radiation-induced faults in memory and computing devices. Unfortunately, with the increased complexity of modern processors, there is a growing amount of hidden logic and memory resources, such as flip-flops in internal pipelines and queues, that cannot be easily protected by ECC. In this paper, we experimentally investigate the efficacy of using ECC to mask neutron-induced faults in modern graphics processing units (GPUs). In our analysis, we consider GPUs fabricated in CMOS and FinFET technologies. We show that changes in transistor technology can be as beneficial as using ECC for reducing silent data corruption rates. Finally, we compare fault-injection results, as carried out both on internal registers and at an instruction level, to better understand the effectiveness of ECC.

On the efficacy of ECC and the benefits of FinFET transistor layout for GPU reliability / Lunardi, C.; Previlon, F.; Kaeli, D.; Rech, P. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 65:8(2018), pp. 1843-1850. [10.1109/TNS.2018.2823786]

On the efficacy of ECC and the benefits of FinFET transistor layout for GPU reliability

Rech P.
2018-01-01

Lunardi, C.; Previlon, F.; Kaeli, D.; Rech, P.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346637