Reliability evaluation of mixed-precision architectures / Fernandes Dos Santos, F.; Lunardi, C.; Oliveira, D.; Libano, F.; Rech, P. - (2019), pp. 238-249. (Paper presented at the 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019, held in the USA in 2019) [10.1109/HPCA.2019.00041].
Reliability evaluation of mixed-precision architectures
Rech P.
2019-01-01
Abstract
Novel computing architectures offer the possibility to execute floating-point operations with different precisions. The execution of reduced-precision operations is likely to reduce both execution time and power consumption. However, the application's error rate and the device's reliability can also be impacted by these precision changes. In this paper, we study the impact of data and operation precision changes on the reliability of modern architectures. We consider Xilinx Field-Programmable Gate Arrays (FPGAs), Intel Xeon Phis, and NVIDIA Graphics Processing Units (GPUs) executing a set of codes implemented in double-, single-, and half-precision IEEE 754-compliant floating-point data. On FPGAs, the reduced area and the performance improvements brought by reduced-precision operations increase reliability. On Xeon Phis, the compiler significantly biases the execution of double- and single-precision instructions. This has the drawback of increasing single-precision error rates compared to double-precision. NVIDIA GPUs make use of dedicated mixed-precision cores, which have nontrivial effects on the device reliability. In general, on GPUs, half-precision allows a higher number of executions to be correctly completed before experiencing a failure. Finally, we also evaluate how transient faults impact the output correctness. Our study shows that, for most applications, faults in single- or half-precision data or operations are more likely to significantly modify the output value than errors in double-precision data.
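The abstract's last claim can be illustrated with a small Python sketch that flips each bit of one stored value and counts how many flips change it by more than a tolerance. This is only an assumed single-bit-flip model on a single value, not the paper's experimental method. The helper names (`flip_bit`, `critical_bit_fraction`) and the 1% tolerance are hypothetical choices made for this example. It also ignores how likely a fault is to occur in the first place and only looks at its consequence.

```python
import numpy as np

# Illustrative mapping (an assumption of this sketch, not from the paper):
# precision name -> (float type, same-width unsigned type, bit width).
FORMATS = {
    "half": (np.float16, np.uint16, 16),
    "single": (np.float32, np.uint32, 32),
    "double": (np.float64, np.uint64, 64),
}


def flip_bit(value, bit, precision):
    """Return `value` stored in `precision` with one bit inverted (bit 0 = LSB)."""
    float_t, uint_t, width = FORMATS[precision]
    if not 0 <= bit < width:
        raise ValueError(f"bit must be in [0, {width})")
    # Reinterpret the float's storage as an unsigned integer of the same width.
    word = np.array([value], dtype=float_t).view(uint_t)
    mask = uint_t(1 << bit)
    return float((word ^ mask).view(float_t)[0])


def critical_bit_fraction(value, precision, tolerance=0.01):
    """Fraction of bit positions whose flip changes a nonzero `value` by more
    than `tolerance` (relative error). NaN and infinity count as critical."""
    float_t, _, width = FORMATS[precision]
    # The value as actually stored in the target precision is the reference.
    reference = float(float_t(value))
    critical = 0
    for bit in range(width):
        faulty = flip_bit(value, bit, precision)
        rel_err = abs(faulty - reference) / abs(reference)
        if not rel_err <= tolerance:  # also true when rel_err is NaN
            critical += 1
    return critical / width


if __name__ == "__main__":
    for name in ("half", "single", "double"):
        print(name, critical_bit_fraction(1.0, name))
```

For the value 1.0 and a 1% tolerance, the critical positions are the sign bit, every exponent bit, and the six most significant mantissa bits: 12 of 16 bits in half precision, 15 of 32 in single, and 18 of 64 in double. The narrower the format, the larger the share of bits whose corruption visibly changes the value, which is consistent with the trend the abstract reports.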