Soft errors in DNN accelerators: A comprehensive review / Ibrahim, Y.; Wang, H.; Liu, J.; Wei, J.; Chen, L.; Rech, P.; Adam, K.; Guo, G. - In: Microelectronics Reliability. - ISSN 0026-2714. - vol. 115 (2020), p. 113969. [DOI: 10.1016/j.microrel.2020.113969]
Soft errors in DNN accelerators: A comprehensive review
Chen, L.; Rech, P.
2020-01-01
Abstract
Deep learning tasks cover a broad range of domains and an even broader range of applications, from entertainment to extremely safety-critical fields. Thus, Deep Neural Network (DNN) algorithms are implemented on a wide variety of systems, from small embedded devices to data centers. DNN accelerators have proven to be key to efficiency, as they are far more efficient than general-purpose CPUs, and they have therefore become the main hardware for executing DNN algorithms. However, these accelerators are susceptible to several types of faults. Soft errors pose a particular threat because the high degree of parallelism in these accelerators can propagate a single fault into multiple errors in subsequent stages, until the model's prediction is affected. This article presents a comprehensive review of the reliability of DNN accelerators. The study begins by reviewing the widely assumed claim that DNNs are inherently tolerant to faults. Then, the available DNN accelerators are systematically classified into several categories; each category is analyzed individually, and the most commonly used accelerators are compared in an attempt to answer the question: which accelerator is the most reliable against transient faults? The concluding part of this review highlights the gray areas of DNN reliability and identifies future research directions that will enhance the applicability of DNNs. This study is expected to benefit researchers in the areas of deep learning, DNN accelerators, and the reliability of this efficient paradigm.
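To make the propagation mechanism mentioned in the abstract concrete, the sketch below (not taken from the reviewed article; the network sizes, random seed, faulted weight, and flipped bit are all hypothetical choices) injects a single bit flip into one weight of a tiny NumPy network and counts how many downstream values are corrupted and whether the prediction changes.

# Minimal illustrative sketch of soft-error propagation in a DNN:
# a single bit flip in one weight corrupts one hidden activation,
# which in turn corrupts several output logits and possibly the prediction.
# All sizes, the seed, the faulted weight, and the flipped bit are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected network: 8 inputs -> 16 hidden units -> 4 outputs.
W1 = rng.standard_normal((8, 16)).astype(np.float32)
W2 = rng.standard_normal((16, 4)).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)

def forward(w1, w2, x):
    h = np.maximum(w1.T @ x, 0.0)   # ReLU hidden layer
    return h, w2.T @ h              # hidden activations, output logits

def flip_bit(value, bit):
    # Flip one bit of a float32 value (models a single-event upset).
    as_int = np.float32(value).view(np.uint32)
    return np.uint32(as_int ^ np.uint32(1 << bit)).view(np.float32)

h_ref, y_ref = forward(W1, W2, x)

# Inject a single-bit upset in one first-layer weight. Bit 30 is the most
# significant exponent bit, so the perturbed value becomes very large
# (possibly Inf/NaN) -- a commonly studied worst case.
W1_faulty = W1.copy()
W1_faulty[3, 5] = flip_bit(W1_faulty[3, 5], 30)

h_faulty, y_faulty = forward(W1_faulty, W2, x)

print("corrupted hidden activations:", int(np.sum(h_ref != h_faulty)))
print("corrupted output logits:     ", int(np.sum(y_ref != y_faulty)))
print("prediction changed:          ", int(np.argmax(y_ref) != np.argmax(y_faulty)))

Running the sketch shows how a fault confined to a single weight spreads to every output logit through the dense connections, which is the propagation effect the abstract attributes to the accelerators' high degree of parallelism.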