Evaluating and Mitigating Neutrons Effects on COTS EdgeAI Accelerators / Blower, Sebastian; Rech, Paolo; Cazzaniga, Carlo; Kastriotou, Maria; Frost, Christopher D.. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 68:8(2021), pp. 1719-1726. [10.1109/TNS.2021.3086686]

Evaluating and Mitigating Neutrons Effects on COTS EdgeAI Accelerators

Rech, Paolo (second author); 2021-01-01

Abstract

EdgeAI is an emerging artificial intelligence (AI) accelerator technology capable of delivering improved AI performance at both a lower cost and a lower power level. Because these devices are intended for deployment in large quantities and in safety-critical environments, it is imperative to understand how single-event effects (SEEs) affect the reliability of this new family of devices and to propose efficient hardening solutions. Through neutron beam experiments and fault-injection analysis of a commercial-off-the-shelf (COTS) EdgeAI device, we identify the device's SEE failure modes, separate the error-rate contributions of the device's different resources, and characterize the device's SEE reliability. During this analysis, we discovered that the vast majority of single-bit flips have no appreciable effect on the output. Based on this analysis, we propose a hardening solution that implements triple modular redundancy (TMR) in the device without changing its physical architecture. We experimentally validate this solution and show that it corrects 96% of the misclassifications (critical errors) with nearly zero overhead.
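As an illustration of the TMR principle mentioned in the abstract, the following minimal Python sketch shows majority voting over three replica classifications. It is not the authors' implementation: the abstract does not say how the replicas are mapped onto the device, and the names `tmr_vote`, `classify_with_tmr`, and `run_inference` are hypothetical. Running the model three times in sequence, as done here, would also not reproduce the near-zero overhead reported in the paper. The sketch only demonstrates the voting step.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def tmr_vote(labels: Sequence[int]) -> Optional[int]:
    """Return the label that at least two of three replicas agree on.

    Returns None when all three replicas disagree, i.e. the error is
    detected but cannot be corrected by voting.
    """
    if len(labels) != 3:
        raise ValueError("TMR expects exactly three replica outputs")
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


def classify_with_tmr(run_inference: Callable[[int], int], sample: int) -> Optional[int]:
    """Run a (hypothetical) inference callable three times and vote.

    `run_inference` stands in for whatever produces one class label per
    replica; it is an assumption for this sketch, not a real device API.
    """
    replica_outputs = [run_inference(sample) for _ in range(3)]
    return tmr_vote(replica_outputs)
```

With this sketch, a single corrupted replica is outvoted by the other two, while three mutually different outputs are reported as an uncorrectable disagreement.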
File in this product:
File: TNS_Evaluating_and_Mitigating_Neutrons_Effects_on_COTS_EdgeAI_Accelerators.pdf
Access: Archive managers only
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 1.08 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346721
Citations
  • PubMed Central: N/A
  • Scopus: 18
  • Web of Science: 18
  • OpenAlex: N/A