Soft errors in DNN accelerators: A comprehensive review / Ibrahim, Y.; Wang, H.; Liu, J.; Wei, J.; Chen, L.; Rech, P.; Adam, K.; Guo, G. - In: Microelectronics Reliability. - ISSN 0026-2714. - vol. 115 (2020), p. 113969. [DOI: 10.1016/j.microrel.2020.113969]
Soft errors in DNN accelerators: A comprehensive review
Chen, L.; Rech, P.
2020-01-01
Abstract
Deep learning tasks cover a broad range of domains and an even broader range of applications, from entertainment to extremely safety-critical fields. Thus, Deep Neural Network (DNN) algorithms are implemented on a wide variety of systems, from small embedded devices to data centers. DNN accelerators have proven to be key to efficiency, as they are far more efficient than general-purpose CPUs, and they have therefore become the main hardware for executing DNN algorithms. However, these accelerators are susceptible to several types of faults. Soft errors pose a particular threat because the high degree of parallelism in these accelerators can propagate a single fault into multiple errors in subsequent stages, until the model's prediction is affected. This article presents a comprehensive review of the reliability of DNN accelerators. The study begins by reviewing the widely assumed claim that DNNs are inherently tolerant to faults. Then, the available DNN accelerators are systematically classified into several categories; each category is analyzed individually, and the most commonly used accelerators are compared in an attempt to answer the question: which accelerator is the most reliable against transient faults? The concluding part of this review highlights the gray areas of DNN reliability and identifies future research directions that will enhance the applicability of DNNs. This study is expected to benefit researchers in the areas of deep learning, DNN accelerators, and the reliability of this efficient paradigm.
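To make the propagation mechanism mentioned in the abstract concrete, the sketch below (not taken from the reviewed article; the network sizes, random seed, faulted weight, and flipped bit are all hypothetical choices) injects a single bit flip into one weight of a tiny NumPy network and counts how many downstream values are corrupted and whether the prediction changes.

# Minimal illustrative sketch of soft-error propagation in a DNN:
# a single bit flip in one weight corrupts one hidden activation,
# which in turn corrupts several output logits and possibly the prediction.
# All sizes, the seed, the faulted weight, and the flipped bit are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected network: 8 inputs -> 16 hidden units -> 4 outputs.
W1 = rng.standard_normal((8, 16)).astype(np.float32)
W2 = rng.standard_normal((16, 4)).astype(np.float32)
x = rng.standard_normal(8).astype(np.float32)

def forward(w1, w2, x):
    h = np.maximum(w1.T @ x, 0.0)   # ReLU hidden layer
    return h, w2.T @ h              # hidden activations, output logits

def flip_bit(value, bit):
    # Flip one bit of a float32 value (models a single-event upset).
    as_int = np.float32(value).view(np.uint32)
    return np.uint32(as_int ^ np.uint32(1 << bit)).view(np.float32)

h_ref, y_ref = forward(W1, W2, x)

# Inject a single-bit upset in one first-layer weight. Bit 30 is the most
# significant exponent bit, so the perturbed value becomes very large
# (possibly Inf/NaN) -- a commonly studied worst case.
W1_faulty = W1.copy()
W1_faulty[3, 5] = flip_bit(W1_faulty[3, 5], 30)

h_faulty, y_faulty = forward(W1_faulty, W2, x)

print("corrupted hidden activations:", int(np.sum(h_ref != h_faulty)))
print("corrupted output logits:     ", int(np.sum(y_ref != y_faulty)))
print("prediction changed:          ", int(np.argmax(y_ref) != np.argmax(y_faulty)))

Running the sketch shows how a fault confined to a single weight spreads to every output logit through the dense connections, which is the propagation effect the abstract attributes to the accelerators' high degree of parallelism.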