Extensive research efforts are being carried out to evaluate and improve the reliability of computing devices, either through beam experiments or simulation-based fault injection. Unfortunately, it is still largely unclear to what extent fault injection can provide an accurate error rate estimation at early stages and whether beam experiments can be used to identify the weakest resources in a device. The challenges associated with reliability evaluation grow with the increasing complexity of the hardware and the software. In this paper, we combine and analyze data gathered with extensive beam experiments (on the final physical CPU hardware) and microarchitectural fault injections (on early microarchitectural CPU models). We target a standalone Arm Cortex-A5 and an Arm Cortex-A9 integrated in an SoC and evaluate their reliability in bare-metal and Linux-based configurations. We find that both the SoC integration and the OS presence increase the system Detected Unrecoverable Error (DUE) rate (for different reasons) but do not significantly impact the Silent Data Corruption (SDC) rate, which is attributed solely to the CPU core. Our reliability analysis demonstrates that, even considering SoC integration and OS inclusion, early, pre-silicon microarchitecture-level fault injection delivers accurate SDC rate estimations and lower bounds for the DUE rates.

Soft Error Effects on Arm Microprocessors: Early Estimations vs. Chip Measurements / Bodmann, Pablo R.; Papadimitriou, George; Rech Junior, Rubens L.; Gizopoulos, Dimitris; Rech, Paolo. - In: IEEE TRANSACTIONS ON COMPUTERS. - ISSN 0018-9340. - 71:10(2022), pp. 2358-2369. [10.1109/TC.2021.3128501]

Soft Error Effects on Arm Microprocessors: Early Estimations vs. Chip Measurements

Rech, Paolo
Last author
2022-01-01

Files in this item:
File Size Format
Soft_Error_Effects_on_Arm_Microprocessors_Early_Estimations_versus_Chip_Measurements.pdf

Archive managers only

Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 2.38 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346717