Since the demand for computing power increases, new architectures arise to obtain better performance. An important class of integrated devices is heterogeneous architectures, which join different specialized hardware into a single chip, composing a System on Chip - SoC. Within this context, effectively splitting tasks between the different architectures is primal to obtain efficiency and performance. In this work, we evaluate two heterogeneous architectures: one composed of a general-purpose CPU and a graphics processing unit (GPU) integrated into a single chip (AMD Kaveri SoC), and another composed by a general-purpose CPU and a Field Programmable Gate Array (FPGA) integrated into a single chip (Intel Arria 10 SoC). We investigate how data partitioning affects the performance of each device in a collaborative execution through the decomposition of the data domain. As a case study, we apply the technique in the well-known Lattice Boltzmann Method (LBM), analyzing the performance of five kernels in both architectures. Our experimental results show that non-uniform partitioning improves LBM kernels performance by up to 11.40% and 15.15% on AMD Kaveri and Intel Arria 10, respectively.
Non-uniform partitioning for collaborative execution on heterogeneous architectures / Freytag, G.; Serpa, M. S.; Lima, J. V. F.; Rech, P.; Navaux, P. O. A.. - 2019-:(2019), pp. 128-135. (Intervento presentato al convegno 31st International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2019 tenutosi a bra nel 2019) [10.1109/SBAC-PAD.2019.00031].
Non-uniform partitioning for collaborative execution on heterogeneous architectures
Rech P.;
2019-01-01
Abstract
Since the demand for computing power increases, new architectures arise to obtain better performance. An important class of integrated devices is heterogeneous architectures, which join different specialized hardware into a single chip, composing a System on Chip - SoC. Within this context, effectively splitting tasks between the different architectures is primal to obtain efficiency and performance. In this work, we evaluate two heterogeneous architectures: one composed of a general-purpose CPU and a graphics processing unit (GPU) integrated into a single chip (AMD Kaveri SoC), and another composed by a general-purpose CPU and a Field Programmable Gate Array (FPGA) integrated into a single chip (Intel Arria 10 SoC). We investigate how data partitioning affects the performance of each device in a collaborative execution through the decomposition of the data domain. As a case study, we apply the technique in the well-known Lattice Boltzmann Method (LBM), analyzing the performance of five kernels in both architectures. Our experimental results show that non-uniform partitioning improves LBM kernels performance by up to 11.40% and 15.15% on AMD Kaveri and Intel Arria 10, respectively.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione