Multi-objective autotuning of MobileNets across the full software/hardware stack

Flavio Vella
2018-01-01

Abstract

We present a customizable Collective Knowledge workflow to study the execution time vs. accuracy trade-offs for the MobileNets CNN family. We use this workflow to evaluate MobileNets on Arm Cortex CPUs using TensorFlow and on Arm Mali GPUs using several versions of the Arm Compute Library. Our optimizations for the Arm Bifrost GPU architecture reduce the execution time by 2-3 times while remaining on the Pareto-optimal frontier. We also highlight the challenge of maintaining accuracy when deploying CNN models across diverse platforms. We make all the workflow components (models, programs, scripts, etc.) publicly available to encourage further exploration by the community.
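
The abstract frames the evaluation as a multi-objective exploration (execution time vs. accuracy) whose best configurations lie on a Pareto-optimal frontier. As an illustration only (this is not the paper's actual Collective Knowledge workflow code, and the sample measurements below are hypothetical), the short Python sketch shows how such a frontier can be extracted from measured (time, accuracy) points, treating lower time and higher accuracy as better.

    # Minimal sketch (not the paper's CK workflow): extract the Pareto-optimal
    # frontier from measured (execution_time, accuracy) points, where lower
    # time and higher accuracy are both better. Sample data is hypothetical.

    def pareto_frontier(points):
        """Return the points not dominated by any other point.

        A point (t, a) is dominated if some other point has time <= t and
        accuracy >= a, with at least one strict inequality.
        """
        frontier = []
        for t, a in points:
            dominated = any(
                (t2 <= t and a2 >= a) and (t2 < t or a2 > a)
                for t2, a2 in points
            )
            if not dominated:
                frontier.append((t, a))
        # Sort by execution time for reporting/plotting.
        return sorted(frontier)

    if __name__ == "__main__":
        # Hypothetical measurements: (execution time in ms, top-1 accuracy).
        measurements = [
            (120.0, 0.70), (95.0, 0.68), (60.0, 0.63),
            (110.0, 0.66), (55.0, 0.59), (140.0, 0.71),
        ]
        for time_ms, top1 in pareto_frontier(measurements):
            print(f"{time_ms:7.1f} ms  top-1 = {top1:.2f}")

Running the sketch on the sample data keeps only the non-dominated configurations, which is the same filtering step used to report Pareto-efficient MobileNets variants in a time/accuracy trade-off study.
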
Year: 2018
Published in: ReQuEST '18: Proceedings of the 1st on Reproducible Quality-Efficient Systems Tournament on Co-designing Pareto-efficient Deep Learning
Place: New York, USA
Publisher: Association for Computing Machinery, Inc.
ISBN: 9781450359238
Authors: Lokhmotov, Anton; Chunosov, Nikolay; Vella, Flavio; Fursin, Grigori
Citation: Multi-objective autotuning of MobileNets across the full software/hardware stack / Lokhmotov, Anton; Chunosov, Nikolay; Vella, Flavio; Fursin, Grigori. - Electronic. - 6 (2018), p. 1. (1st ACM ReQuEST Workshop/Tournament on Reproducible Software/Hardware Co-Design of Pareto-Efficient Deep Learning, ReQuEST 2018, Williamsburg, VA, USA, March 24-28, 2018) [10.1145/3229762.3229767].
Files in this item:

File: ASPLOS-3229762.3229767.pdf
Access: Archive administrators only
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 2.49 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/332800
Citations
  • PubMed Central: n/a
  • Scopus: 9
  • Web of Science: 3
  • OpenAlex: n/a