In this paper, we evaluate the effects of reducing the average memory access time (AMAT) on the performance and reliability of graphics processing units (GPUs), based on data obtained at the Los Alamos Neutron Science Center (LANSCE). We also measure the effects of input size changes on the neutron radiation sensitivity of the GPU running different applications. Results show an increase in the silent data corruption (SDC) cross section with AMAT optimizations, caused by a higher usage of unprotected registers and SRAM memory resources, and an increase in the single event functional interruption (SEFI) cross section for applications that did not saturate the scheduling resources of the GPU. Based on the reported changes in execution time and cross section, we extend the reliability analysis of parallel processors by proposing the mean workload between failures (MWBF) metric, which evaluates the amount of data correctly computed before a failure occurs. The use of optimizations leads to more stable MWBF values, indicating better reliability than nonoptimized codes when processing large inputs.
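The trade-off described in the abstract (shorter execution time against a larger cross section) can be made concrete with a small sketch. The AMAT formula below is the standard textbook one. The MWBF formula is an assumption inferred from the abstract's wording (workload per run multiplied by the number of runs that fit in one mean time between failures); the abstract does not state the authors' exact definition. All function names and numeric values are hypothetical and chosen only for illustration.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Textbook average memory access time: hit time plus expected miss cost."""
    return hit_time + miss_rate * miss_penalty


def mtbf(cross_section_cm2, flux_n_per_cm2_s):
    """Mean time between failures in seconds: 1 / (cross section * flux)."""
    return 1.0 / (cross_section_cm2 * flux_n_per_cm2_s)


def mwbf(workload_per_run, exec_time_s, cross_section_cm2, flux_n_per_cm2_s):
    """ASSUMED definition, not taken from the paper:
    workload per run times the number of runs completed in one MTBF."""
    runs_between_failures = mtbf(cross_section_cm2, flux_n_per_cm2_s) / exec_time_s
    return workload_per_run * runs_between_failures


if __name__ == "__main__":
    FLUX = 1.0e3  # hypothetical neutron flux, n/(cm^2 * s)

    # Hypothetical baseline: slower code, smaller cross section.
    baseline = mwbf(1.0e6, exec_time_s=2.0, cross_section_cm2=1.0e-6,
                    flux_n_per_cm2_s=FLUX)
    # Hypothetical optimized code: half the run time, 1.5x the cross section.
    optimized = mwbf(1.0e6, exec_time_s=1.0, cross_section_cm2=1.5e-6,
                     flux_n_per_cm2_s=FLUX)

    print("baseline MWBF: ", baseline)
    print("optimized MWBF:", optimized)
    print("optimized is better:", optimized > baseline)
```

Under this assumed definition, an optimization improves MWBF whenever its speedup exceeds the factor by which it increases the cross section, which matches the abstract's point that execution time and cross section should be weighed together.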

Memory Access Time and Input Size Effects on Parallel Processors Reliability / Pilla, L. L.; Oliveira, D. A. G.; Lunardi, C.; Navaux, P. O. A.; Carro, L.; Rech, P. - In: IEEE TRANSACTIONS ON NUCLEAR SCIENCE. - ISSN 0018-9499. - 62:6(2015), pp. 2627-2634. [10.1109/TNS.2015.2496381]

Memory Access Time and Input Size Effects on Parallel Processors Reliability

Rech P.
2015-01-01

Pilla, L. L.; Oliveira, D. A. G.; Lunardi, C.; Navaux, P. O. A.; Carro, L.; Rech, P.
Files in this item:
File: Memory_Access_Time_and_Input_Size_Effects_on_Parallel_Processors_Reliability.pdf
Access: Archive managers only
Description: IEEE Transactions on Nuclear Science - conference paper
Type: Published version (Publisher's layout)
License: All rights reserved
Size: 1.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403749