The constant need of higher performances and reduced power consumption has lead vendors to design heterogeneous devices that embed traditional CPU and an accelerator, like a GPU or FPGA. When the CPU and the accelerator are used collaboratively the device computational performances reach their peak. However, the higher amount of resources employed for computation has, potentially, the side effect of increasing soft error rate. In this paper we evaluate the reliability behavior of AMD Kaveri Accelerated Processing Units executing a set of heterogeneous applications. We distribute the workload between the CPU and GPU and evaluate which configuration provides the lowest error rate or allows the computation of the highest amount of data before experiencing a failure. We show that, in most cases, the most reliable workload distribution is the one that delivers the highest performances. As experimentally proven, by choosing the correct workload distribution the device reliability can increase of up to 9x.
Identifying the Most Reliable Collaborative Workload Distribution in Heterogeneous Devices / Davila, G. P.; Oliveira, D.; Navaux, P.; Rech, P.. - (2019), pp. 1325-1330. (Intervento presentato al convegno 22nd Design, Automation and Test in Europe Conference and Exhibition, DATE 2019 tenutosi a Firenze Fiera, ita nel 2019) [10.23919/DATE.2019.8715107].
Identifying the Most Reliable Collaborative Workload Distribution in Heterogeneous Devices
Rech P.
2019-01-01
Abstract
The constant need of higher performances and reduced power consumption has lead vendors to design heterogeneous devices that embed traditional CPU and an accelerator, like a GPU or FPGA. When the CPU and the accelerator are used collaboratively the device computational performances reach their peak. However, the higher amount of resources employed for computation has, potentially, the side effect of increasing soft error rate. In this paper we evaluate the reliability behavior of AMD Kaveri Accelerated Processing Units executing a set of heterogeneous applications. We distribute the workload between the CPU and GPU and evaluate which configuration provides the lowest error rate or allows the computation of the highest amount of data before experiencing a failure. We show that, in most cases, the most reliable workload distribution is the one that delivers the highest performances. As experimentally proven, by choosing the correct workload distribution the device reliability can increase of up to 9x.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione