A COMPARISON OF DEEP LEARNING INFERENCE ENGINES FOR EMBEDDED REAL-TIME AUDIO CLASSIFICATION

IRIS

Recent advancements in deep learning have shown great potential for audio applications, improving the accuracy of previous solutions for tasks such as music transcription, beat detection, and real-time audio processing. In addition, the availability of increasingly powerful embedded computers has led many deep learning framework developers to devise software optimized to run pre-trained models in resource-constrained contexts. As a result, the use of deep learning on embedded devices and audio plugins has become more widespread. However, confusion has been rising around deep learning inference engines, regarding which of these can run in real-time and which are less resource-hungry. In this paper, we present a comparison of four available deep learning inference engines for real-time audio classification on the CPU of an embedded single-board computer: TensorFlow Lite, TorchScript, ONNX Runtime, and RTNeural. Results show that all inference engines can execute neural network models in real-time with appropriate code practices, but execution time varies between engines and models. Most importantly, we found that most of the less-specialized engines offer great flexibility and can be used effectively for real-time audio classification, with slightly better results than a real-time-specific approach. In contrast, more specialized solutions can offer a lightweight and minimalist alternative where less flexibility is needed.

A COMPARISON OF DEEP LEARNING INFERENCE ENGINES FOR EMBEDDED REAL-TIME AUDIO CLASSIFICATION / Stefani, D.; Peroni, S.; Turchet, L. - 3:(2022), pp. 256-263. (Paper presented at the 25th International Conference on Digital Audio Effects, DAFx 2022, held in Vienna, 6-10 September 2022).

Stefani D.; Peroni S.; Turchet L.

2022-01-01

Date of publication: 2022
Proceedings title: Proceedings of the International Conference on Digital Audio Effects, DAFx
Place of publication: Basel, Switzerland
Publisher: DAFx
ISBN: 978-3-200-08599-2
Scopus identifier: 2-s2.0-85138793583
All authors: Stefani, D.; Peroni, S.; Turchet, L.
Appears in categories: 04.1 Paper in Proceedings

Files in this record:

File	Size	Format
A_Comparison_of_Deep_Learning_Inference_Engines_for_Embedded_Real-Time_Audio_Classification.pdf (open access; description: article; type: publisher's version (publisher's layout); license: Creative Commons)	916.36 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/364734
