Characterizing a Neutron-Induced Fault Model for Deep Neural Networks


Evaluating the reliability of deep neural networks (DNNs) executed on graphics processing units (GPUs) is challenging because the hardware architecture is highly complex and the software frameworks comprise many layers of abstraction. Software-level fault injection is a common and fast way to evaluate the reliability of complex applications, but it may produce unrealistic results: it has limited access to hardware resources, and the adopted fault models may be too naive (i.e., single and double bit flips). Conversely, physical fault injection with a neutron beam provides realistic error rates but lacks visibility into fault propagation. This article proposes a characterization of the DNN fault model that combines neutron beam experiments with software-level fault injection. We exposed GPUs running general matrix multiplication (GEMM) and DNNs to beam neutrons to measure their error rate. On DNNs, we observe that the percentage of critical errors can be up to 61% and show that error correction code (ECC) is ineffective in reducing critical errors. We then performed a complementary software-level fault injection using fault models derived from register-transfer level (RTL) simulations. Our results show that, when complex fault models are injected, the misdetection rate of You Only Look Once version 3 (YOLOv3) is very close to the rate measured in beam experiments, which is $8.66\times$ higher than the rate measured with fault injection using only single-bit flips.
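For readers unfamiliar with the single and double bit-flip fault models mentioned in the abstract, the following is a minimal sketch of what such an injection does to one 32-bit floating-point value. It is not the authors' injector or their RTL-derived fault models; the function names `flip_bit` and `flip_bits` are hypothetical, and the choice of IEEE 754 float32 as the data type is an assumption made for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 float32 encoding inverted.

    Hypothetical helper for illustration; bit 0 is the least significant
    mantissa bit and bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the selected bit and reinterpret the result as a float32.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def flip_bits(value: float, bits: list[int]) -> float:
    """Apply several single-bit flips in turn (e.g. a double bit flip)."""
    for bit in bits:
        value = flip_bit(value, bit)
    return value


if __name__ == "__main__":
    print(flip_bit(1.0, 31))         # sign bit flipped: -1.0
    print(flip_bit(1.0, 23))         # lowest exponent bit flipped: 0.5
    print(flip_bit(1.0, 30))         # highest exponent bit flipped: inf
    print(flip_bits(1.0, [31, 23]))  # double bit flip: -0.5
```

The effect of one flip depends strongly on its position: a flipped exponent bit can turn 1.0 into infinity, whereas a flipped low mantissa bit barely changes the value. The abstract's point is that flips of this kind alone gave a YOLOv3 misdetection rate well below the one measured under the beam.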

Characterizing a Neutron-Induced Fault Model for Deep Neural Networks / Santos, Fernando Fernandes dos; Kritikakou, Angeliki; Condia, Josie E. Rodriguez; Guerrero-Balaguera, Juan-David; Reorda, Matteo Sonza; Sentieys, Olivier; Rech, Paolo. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 70:4(2023), pp. 370-380. [10.1109/tns.2022.3224538]


Santos, Fernando Fernandes dos; Kritikakou, Angeliki; Condia, Josie E. Rodriguez; Guerrero-Balaguera, Juan-David; Reorda, Matteo Sonza; Sentieys, Olivier; Rech, Paolo

2023-01-01



Date of publication: 2023
Journal title: IEEE TRANSACTIONS ON NUCLEAR SCIENCE
Issue number and part: 4
DOI: https://dx.doi.org/10.1109/tns.2022.3224538
Scopus identifier: 2-s2.0-85144063600
WOS identifier: WOS:000975399300012
All authors: Santos, Fernando Fernandes dos; Kritikakou, Angeliki; Condia, Josie E. Rodriguez; Guerrero-Balaguera, Juan-David; Reorda, Matteo Sonza; Sentieys, Oliv...
Appears in types: 03.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
2211.13094.pdf	Open access	Non-refereed preprint	All rights reserved	840.14 kB	Adobe PDF
Characterizing_a_Neutron-Induced_Fault_Model_for_Deep_Neural_Networks.pdf	Archive managers only	Publisher's layout	All rights reserved	1.44 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403697
