Extensive research efforts are being carried out to evaluate and improve the reliability of computing devices, through either beam experiments or simulation-based fault injection. Unfortunately, it is still largely unclear to what extent fault injection can provide an accurate error rate estimation at early stages, and whether beam experiments can be used to identify the weakest resources in a device. The challenges associated with reliability evaluation grow as hardware and software complexity increases. In this paper, we combine and analyze data gathered with extensive beam experiments (on the final physical CPU hardware) and microarchitectural fault injections (on early microarchitectural CPU models). We target a standalone Arm Cortex-A5 and an Arm Cortex-A9 integrated in an SoC, and evaluate their reliability in bare-metal and Linux-based configurations. We find that both SoC integration and the presence of an OS increase the system DUE (Detected Unrecoverable Error) rate, for different reasons, but do not significantly impact the SDC (Silent Data Corruption) rate, which is attributed solely to the CPU core. Our reliability analysis demonstrates that, even when SoC integration and OS inclusion are considered, early, pre-silicon microarchitecture-level fault injection delivers accurate SDC rate estimates and lower bounds for the DUE rates.

Soft Error Effects on Arm Microprocessors: Early Estimations vs. Chip Measurements / Bodmann, P.; Papadimitriou, G.; Rech Junior, R. L.; Gizopoulos, D.; Rech, P. - In: IEEE TRANSACTIONS ON COMPUTERS. - ISSN 0018-9340. - 2021:(2021), pp. 1-1. [10.1109/TC.2021.3128501]

Soft Error Effects on Arm Microprocessors: Early Estimations vs. Chip Measurements

Rech P.
2021-01-01

Bodmann, P.; Papadimitriou, G.; Rech Junior, R. L.; Gizopoulos, D.; Rech, P.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/346717