Graphics processing units (GPUs) are playing a critical role in convolutional neural networks (CNNs) for image detection. As GPU-enabled CNNs move into safety-critical environments, reliability is becoming a growing concern. In this paper, we evaluate and propose strategies to improve the reliability of object detection algorithms, as run on three NVIDIA GPU architectures. We consider three algorithms: 1) You Only Look Once (YOLO); 2) a faster region-based CNN (Faster R-CNN); and 3) a residual network (ResNet). We evaluate them by exposing live hardware to neutron beams, and we complement the beam experiments with fault injection to better characterize fault propagation in CNNs. We show that a single fault occurring in a GPU tends to propagate to multiple active threads, significantly reducing the reliability of a CNN. Relying on error-correcting codes dramatically reduces the number of silent data corruptions (SDCs), but does not reduce the number of critical errors (i.e., errors that could potentially impact safety-critical applications). Based on observations of how faults propagate on GPU architectures, we propose effective strategies to improve CNN reliability. We also consider the benefits of using an algorithm-based fault-tolerance technique for matrix multiplication, which can correct more than 87% of the critical SDCs in a CNN, and of redesigning the maxpool layers of the CNN, which can detect up to 98% of critical SDCs.
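The algorithm-based fault-tolerance idea mentioned in the abstract can be illustrated with the classic checksum scheme for matrix multiplication. The sketch below is not the authors' GPU implementation: it is a minimal pure-Python illustration, and the function names (matmul, abft_encode, abft_check_and_correct) and the tolerance parameter are hypothetical. It assumes at most one corrupted data element in the product; faults that spread to several elements, which the abstract reports as common on GPUs, are only flagged as uncorrectable here.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def abft_encode(a, b):
    """Append a column-checksum row to A and a row-checksum column to B."""
    a_c = [row[:] for row in a] + [[sum(col) for col in zip(*a)]]
    b_r = [row + [sum(row)] for row in b]
    return a_c, b_r


def abft_check_and_correct(c_full, tol=1e-9):
    """Check the checksummed product and repair a single faulty data element.

    Returns (status, C), where status is "ok", "corrected" or "uncorrectable"
    and C is the product without its checksum row and column.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    # A row (column) is inconsistent if its data elements do not add up
    # to the checksum stored in the last column (row).
    bad_rows = [i for i in range(n)
                if abs(sum(c_full[i][:m]) - c_full[i][m]) > tol]
    bad_cols = [j for j in range(m)
                if abs(sum(c_full[i][j] for i in range(n)) - c_full[n][j]) > tol]
    data = [row[:m] for row in c_full[:n]]
    if not bad_rows and not bad_cols:
        return "ok", data
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        i, j = bad_rows[0], bad_cols[0]
        # Rebuild the faulty element from the row checksum and its neighbours.
        data[i][j] = c_full[i][m] - sum(data[i][k] for k in range(m) if k != j)
        return "corrected", data
    return "uncorrectable", data


# Usage: inject a hypothetical single-element fault and recover from it.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
a_c, b_r = abft_encode(a, b)
c_full = matmul(a_c, b_r)
c_full[0][1] += 100  # simulated silent data corruption
status, c = abft_check_and_correct(c_full)
```

The intersection of the one inconsistent row and the one inconsistent column locates the error, and the row checksum supplies the correct value. A fault that hits a checksum entry itself, or more than one data element, falls into the "uncorrectable" branch of this sketch.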
Analyzing and increasing the reliability of convolutional neural networks on GPUs / Santos, F. F. D.; Pimenta, P. F.; Lunardi, C.; Draghetti, L.; Carro, L.; Kaeli, D.; Rech, P.. - In: IEEE TRANSACTIONS ON RELIABILITY. - ISSN 0018-9529. - 68:2(2019), pp. 663-677. [10.1109/TR.2018.2878387]
Title: | Analyzing and increasing the reliability of convolutional neural networks on GPUs | |
Authors: | Santos, F. F. D.; Pimenta, P. F.; Lunardi, C.; Draghetti, L.; Carro, L.; Kaeli, D.; Rech, P. | |
UniTrento authors: | ||
Journal title: | IEEE TRANSACTIONS ON RELIABILITY | |
Year of publication: | 2019 | |
Issue number and part: | 2 | |
Scopus identifier: | 2-s2.0-85056590112 | |
Digital Object Identifier (DOI): | http://dx.doi.org/10.1109/TR.2018.2878387 | |
Handle: | http://hdl.handle.net/11572/346709 | |
Citation: | Analyzing and increasing the reliability of convolutional neural networks on GPUs / Santos, F. F. D.; Pimenta, P. F.; Lunardi, C.; Draghetti, L.; Carro, L.; Kaeli, D.; Rech, P.. - In: IEEE TRANSACTIONS ON RELIABILITY. - ISSN 0018-9529. - 68:2(2019), pp. 663-677. [10.1109/TR.2018.2878387] | |
Appears in types: | 03.1 Journal article |