Speech Emotion Recognition and Deep Learning: An Extensive Validation Using Convolutional Neural Networks

IRIS

Speech Emotion Recognition (SER) has been transformed by the advent of deep learning, which, as in many other research areas, has significantly improved model accuracy. SER is a branch of Human-Computer Interaction (HCI) that deals with recognizing emotional states from human speech. Although it is a thriving field of research, SER still poses several non-trivial challenges, mainly due to the lack of shared best practices and of high-quality datasets that would make the developed models suitable for use in real environments. In this paper, we implement a CNN-based model combined with a Convolutional Attention Block and conduct a series of experiments on four English datasets commonly used for SER: RAVDESS, TESS, CREMA-D, and IEMOCAP. After testing the proposed pipeline on the individual datasets, achieving mean accuracies of 83%, 100%, 68%, and 63%, respectively, we perform an extensive cross-validation between the emotional classes shared by single datasets or combinations of them, with the aim of investigating the generalization ability of the extracted features.

Speech Emotion Recognition and Deep Learning: An Extensive Validation Using Convolutional Neural Networks / Dal Rì, Francesco; Ciardi, Fabio Cifariello; Conci, Nicola. - In: IEEE ACCESS. - ISSN 2169-3536. - 11:(2023), pp. 116638-116649. [10.1109/ACCESS.2023.3326071]

Dal Rì, Francesco; Ciardi, Fabio Cifariello; Conci, Nicola

2023-01-01

Date of publication: 2023
Journal title: IEEE ACCESS
DOI: https://dx.doi.org/10.1109/ACCESS.2023.3326071
Scopus identifier: 2-s2.0-85174849797
WOS identifier: WOS:001095970700001
All authors: Dal Rì, Francesco; Ciardi, Fabio Cifariello; Conci, Nicola
Appears in types: 03.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
Speech_Emotion_Recognition_and_Deep_Learning_An_Extensive_Validation_Using_Convolutional_Neural_Networks.pdf	Open access	Publisher’s version (Publisher’s layout)	Creative Commons	3.26 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/406850
