EdgeAI is an emerging artificial intelligence (AI) accelerator technology that delivers improved AI performance at lower cost and lower power. Because these devices are intended for deployment in large quantities and in safety-critical environments, it is imperative to understand how single-event effects (SEEs) affect the reliability of this new family of devices and to propose efficient hardening solutions. Through neutron beam experiments and fault-injection analysis of a commercial-off-the-shelf (COTS) EdgeAI device, we identify the device's SEE failure modes, separate the error-rate contributions of the device's different resources, and characterize the device's SEE reliability. This analysis shows that the vast majority of single-bit flips have no appreciable effect on the output. Based on these results, we propose a hardening solution that implements triple-modular redundancy (TMR) in the device without changing its physical architecture. We experimentally validate this solution and show that it corrects 96% of the misclassifications (critical errors) with nearly zero overhead.
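As a rough illustration of the two ideas the abstract relies on, a single-bit flip and a TMR majority vote, the following Python sketch may help. It is not the authors' implementation: this record does not describe how TMR is mapped onto the device, and the names `flip_bit`, `tmr_vote`, and `vote_labels` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def flip_bit(value: int, bit: int) -> int:
    """Return `value` with one bit inverted (a simple model of a single-bit upset)."""
    return value ^ (1 << bit)


def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three redundant copies of the same word."""
    return (a & b) | (a & c) | (b & c)


def vote_labels(labels):
    """Majority vote over three predicted class labels.

    Returns the label agreed on by at least two replicas,
    or None when all three replicas disagree.
    """
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


# Hypothetical 8-bit word; the value is chosen only for illustration.
golden = 0b10110010
corrupted = flip_bit(golden, 5)                   # one replica suffers a bit flip
recovered = tmr_vote(golden, corrupted, golden)   # the other two replicas outvote it
```

Under this assumption of three independent replicas, a fault confined to one replica is masked by the vote. A vote over class labels rather than raw bits, as in `vote_labels`, would only mask upsets that actually change a classification.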

Evaluating and Mitigating Neutrons Effects on COTS EdgeAI Accelerators / Blower, S.; Rech, P.; Cazzaniga, C.; Kastriotou, M.; Frost, C. D.. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 68:8(2021), pp. 1719-1726. [10.1109/TNS.2021.3086686]

Evaluating and Mitigating Neutrons Effects on COTS EdgeAI Accelerators

Rech P.;
2021-01-01


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346721