In this paper, we evaluate the effects of reducing the average memory access time (AMAT) on the performance and reliability of graphics processing units (GPUs), based on data obtained at the Los Alamos Neutron Science Center (LANSCE). We also measure the effects of input size changes on the neutron radiation sensitivity of the GPU running different applications. Results show an increase in the silent data corruption (SDC) cross section with AMAT optimizations, caused by a higher usage of unprotected registers and SRAM memory resources, and an increase in the single event functional interruption (SEFI) cross section for applications that did not saturate the scheduling resources of the GPU. Based on the reported execution time changes and cross section increases, we extend the reliability analysis of parallel processors by proposing the mean workload between failures (MWBF) metric, which evaluates the amount of data correctly computed before experiencing a failure. The use of optimizations leads to more stable MWBF values, which indicate better reliability than nonoptimized codes when processing large inputs.

Memory Access Time and Input Size Effects on Parallel Processors Reliability / Pilla, L. L.; Oliveira, D. A. G.; Lunardi, C.; Navaux, P. O. A.; Carro, L.; Rech, P.. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 62:6(2015), pp. 2627-2634. [10.1109/TNS.2015.2496381]

Memory Access Time and Input Size Effects on Parallel Processors Reliability

Rech P.
2015-01-01



Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403749