As the demand for computing power increases, new architectures arise to deliver better performance. An important class of integrated devices is heterogeneous architectures, which combine different specialized hardware on a single chip, forming a System on Chip (SoC). In this context, effectively splitting tasks between the different architectures is essential for efficiency and performance. In this work, we evaluate two heterogeneous architectures: one composed of a general-purpose CPU and a graphics processing unit (GPU) integrated into a single chip (AMD Kaveri SoC), and another composed of a general-purpose CPU and a Field Programmable Gate Array (FPGA) integrated into a single chip (Intel Arria 10 SoC). We investigate how data partitioning affects the performance of each device in a collaborative execution through the decomposition of the data domain. As a case study, we apply the technique to the well-known Lattice Boltzmann Method (LBM), analyzing the performance of five kernels on both architectures. Our experimental results show that non-uniform partitioning improves LBM kernel performance by up to 11.40% and 15.15% on AMD Kaveri and Intel Arria 10, respectively.
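The idea of non-uniform partitioning of a data domain can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the row-wise decomposition, and the throughput figures are all hypothetical, chosen only to show why giving each device a share proportional to its speed can shorten a collaborative execution compared with an even split.

```python
def partition_rows(total_rows, cpu_rate, accel_rate):
    """Split lattice rows between a CPU and an accelerator (GPU or FPGA).

    cpu_rate and accel_rate are hypothetical measured throughputs
    (for example, rows processed per second). Each device receives a
    share of the rows proportional to its throughput.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if cpu_rate <= 0 or accel_rate <= 0:
        raise ValueError("throughputs must be positive")
    cpu_rows = round(total_rows * cpu_rate / (cpu_rate + accel_rate))
    return cpu_rows, total_rows - cpu_rows


def makespan(total_rows, cpu_rows, cpu_rate, accel_rate):
    """Time until both devices finish when they run concurrently."""
    accel_rows = total_rows - cpu_rows
    return max(cpu_rows / cpu_rate, accel_rows / accel_rate)


if __name__ == "__main__":
    rows = 1000
    cpu_rate, accel_rate = 1.0, 3.0  # assumed values, not measurements

    uniform = makespan(rows, rows // 2, cpu_rate, accel_rate)
    cpu_share, accel_share = partition_rows(rows, cpu_rate, accel_rate)
    non_uniform = makespan(rows, cpu_share, cpu_rate, accel_rate)

    print("uniform split:    ", uniform)
    print("non-uniform split:", cpu_share, accel_share, non_uniform)
```

With these assumed rates, the even split is bounded by the slower device, while the proportional split lets both devices finish at about the same time. Real speedups depend on communication and synchronization costs that this sketch ignores.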

Non-uniform partitioning for collaborative execution on heterogeneous architectures / Freytag, G.; Serpa, M. S.; Lima, J. V. F.; Rech, P.; Navaux, P. O. A. - 2019-:(2019), pp. 128-135. (Paper presented at the 31st International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2019, held in bra in 2019) [10.1109/SBAC-PAD.2019.00031].

Non-uniform partitioning for collaborative execution on heterogeneous architectures

Rech P.;
2019-01-01

2019
Proceedings - Symposium on Computer Architecture and High Performance Computing
United States
IEEE Computer Society
978-1-7281-4194-7
Freytag, G.; Serpa, M. S.; Lima, J. V. F.; Rech, P.; Navaux, P. O. A.


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/403753
