Analyzing and Increasing the Reliability of Convolutional Neural Networks on GPUs

Santos, F. F. D.; Pimenta, P. F.; Lunardi, C.; Draghetti, L.; Carro, L.; Kaeli, D.; Rech, P.

2019-01-01

Abstract

Graphics processing units (GPUs) are playing a critical role in convolutional neural networks (CNNs) for image detection. As GPU-enabled CNNs move into safety-critical environments, reliability is becoming a growing concern. In this paper, we evaluate and propose strategies to improve the reliability of object detection algorithms, as run on three NVIDIA GPU architectures. We consider three algorithms: 1) You Only Look Once (YOLO); 2) Faster Region-based CNN (Faster R-CNN); and 3) Residual Network (ResNet), and we evaluate them by exposing live hardware to neutron beams. We complement our beam experiments with fault injection to better characterize fault propagation in CNNs. We show that a single fault occurring in a GPU tends to propagate to multiple active threads, significantly reducing the reliability of a CNN. Moreover, relying on error-correcting codes (ECC) dramatically reduces the number of silent data corruptions (SDCs), but does not reduce the number of critical errors (i.e., errors that could potentially impact safety-critical applications). Based on observations of how faults propagate on GPU architectures, we propose effective strategies to improve CNN reliability. We also consider the benefits of using an algorithm-based fault-tolerance (ABFT) technique for matrix multiplication, which can correct more than 87% of the critical SDCs in a CNN, while redesigning maxpool layers of the CNN to detect up to 98% of critical SDCs.
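
The abstract names an algorithm-based fault-tolerance technique for matrix multiplication but does not describe it. As a rough illustration only, the sketch below assumes the classic checksum scheme: a column-checksum row is appended to A and a row-checksum column to B, so the product carries checksums that can locate and correct a single corrupted element. The function names, the tolerance, and the injected fault are hypothetical and are not taken from the paper or its GPU implementation.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def abft_encode(a, b):
    """Append a column-checksum row to A and a row-checksum column to B."""
    a_c = [row[:] for row in a] + [[sum(col) for col in zip(*a)]]
    b_r = [row[:] + [sum(row)] for row in b]
    return a_c, b_r


def abft_check_and_correct(c_full, tol=1e-9):
    """Verify the checksums of an encoded product; fix one bad data element.

    Returns (data, status), where status is one of
    "ok", "corrected", "checksum_fault", or "uncorrectable".
    """
    m, n = len(c_full) - 1, len(c_full[0]) - 1
    data = [row[:n] for row in c_full[:m]]  # copy without the checksums
    # Residual of each data row against its stored checksum (last column).
    row_err = [sum(data[i]) - c_full[i][n] for i in range(m)]
    # Residual of each data column against its stored checksum (last row).
    col_err = [sum(data[i][j] for i in range(m)) - c_full[m][j]
               for j in range(n)]
    bad_rows = [i for i, e in enumerate(row_err) if abs(e) > tol]
    bad_cols = [j for j, e in enumerate(col_err) if abs(e) > tol]
    if not bad_rows and not bad_cols:
        return data, "ok"
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        # One row and one column disagree: the fault is at their crossing.
        i, j = bad_rows[0], bad_cols[0]
        data[i][j] -= row_err[i]
        return data, "corrected"
    if len(bad_rows) + len(bad_cols) == 1:
        # Only one checksum disagrees: the checksum itself was hit.
        return data, "checksum_fault"
    return data, "uncorrectable"


# Hypothetical example: corrupt one element of the encoded product.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
a_c, b_r = abft_encode(a, b)
c_full = matmul(a_c, b_r)
c_full[1][0] += 1000  # simulated silent data corruption
fixed, status = abft_check_and_correct(c_full)
print(status, fixed)  # corrected [[19, 22], [43, 50]]
```

In this sketch a fault that touches several elements of the product is only detected, not corrected, which is consistent with the abstract's observation that a single GPU fault tends to spread to multiple threads.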

Date of publication: 2019
Journal title: IEEE TRANSACTIONS ON RELIABILITY
Issue number and part: 2
DOI: https://dx.doi.org/10.1109/TR.2018.2878387
Scopus identifier: 2-s2.0-85056590112
WOS identifier: WOS:000470826100020
All authors: Santos, F. F. D.; Pimenta, P. F.; Lunardi, C.; Draghetti, L.; Carro, L.; Kaeli, D.; Rech, P.
Citation: Analyzing and Increasing the Reliability of Convolutional Neural Networks on GPUs / Santos, F.F.D., Pimenta, P.F., Lunardi, C., Draghetti, L., Carro, L., Kaeli, D., Rech, P. - In: IEEE TRANSACTIONS ON RELIABILITY. - ISSN 0018-9529. - 2019, 68:2(2019), pp. 663-677. [10.1109/TR.2018.2878387]
Appears in types: 03.1 Journal article

Files in this item:

File	Size	Format
TR_Analyzing_and_Increasing_the_Reliability_of_Convolutional_Neural_Networks_on_GPUs-2.pdf (Access: archive managers only. Description: IEEE Xplore - conference paper. Type: publisher's layout. License: all rights reserved.)	3.21 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346709
