Automatic Classification of Speech Overlaps: Feature Representation and Algorithms

Automatic Classification of Speech Overlaps: Feature Representation and Algorithms / Chowdhury, S. A.; Stepanov, E. A.; Danieli, M.; Riccardi, G. - In: COMPUTER SPEECH AND LANGUAGE. - ISSN 0885-2308. - 55:(2019), pp. 145-167. [10.1016/j.csl.2018.12.001]

Chowdhury, S. A.; Stepanov, E. A.; Danieli, M.; Riccardi, G.

2019-01-01

Abstract

Overlapping speech is a natural, frequent, and purposeful phenomenon in human–human conversation. Overlap events can be categorized as competitive or non-competitive: the former are attempts to take the floor, whereas the latter are attempts to help the current speaker continue the turn. The presence and distribution of these categories are indicative of the speakers’ states during the conversation, so understanding them is crucial for conversational analysis and for modeling human–machine dialogs. The goal of this study is to design computational models that classify overlapping speech segments of dyadic conversations as competitive or non-competitive acts using lexical and acoustic cues, as well as their surrounding context. The designed overlap representations are evaluated in both linear models, namely Support Vector Machines (SVM), and non-linear models, namely feed-forward (FFNN), convolutional (CNN), and long short-term memory (LSTM) neural networks. We experiment with lexical and acoustic representations and with their combinations from both speaker channels, in feature space and in hidden space. We observe that lexical word-embedding features significantly increase the overall F1-measure compared to both acoustic and bag-of-ngrams lexical representations, suggesting that lexical information is a powerful cue for overlap classification. Our comparative study shows that the best computational architecture is an FFNN combined with word-embedding and acoustic features. © 2018 Elsevier Ltd. All rights reserved.
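As a rough illustration of two ideas mentioned in the abstract, combining lexical and acoustic representations in feature space and scoring a classifier with the F1-measure, the following Python sketch may help. It is not the authors' implementation: the function names, the toy vectors, and the choice to average per-class F1 without weighting are all assumptions made for this example, and the article's exact evaluation protocol may differ. Fusion in hidden space, which requires a neural network, is not shown.

```python
from typing import Dict, List, Sequence

# The two overlap categories named in the abstract.
LABELS = ("competitive", "non-competitive")


def fuse_features(lexical: Sequence[float], acoustic: Sequence[float]) -> List[float]:
    """Feature-level fusion (hypothetical helper): concatenate both vectors."""
    return list(lexical) + list(acoustic)


def f1_per_class(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Compute the F1-measure separately for each label."""
    scores = {}
    for label in LABELS:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (an assumption, see the text above)."""
    scores = f1_per_class(gold, pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy vectors standing in for a word embedding and acoustic features.
    fused = fuse_features([0.1, 0.2], [3.0])
    print(len(fused))

    gold = ["competitive", "competitive", "non-competitive", "non-competitive"]
    pred = ["competitive", "non-competitive", "non-competitive", "non-competitive"]
    print(f1_per_class(gold, pred))
    print(round(macro_f1(gold, pred), 4))
```

In this sketch the fused vector would be the input to a single classifier, whereas hidden-space fusion would first pass each representation through its own layers and merge the resulting activations.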

Date of publication: 2019
Journal title: COMPUTER SPEECH AND LANGUAGE
DOI: https://dx.doi.org/10.1016/j.csl.2018.12.001
Scopus identifier: 2-s2.0-85059132607
WOS identifier: WOS:000456592100008
All authors: Chowdhury, S. A.; Stepanov, E. A.; Danieli, M.; Riccardi, G.
Citation: Automatic Classification of Speech Overlaps: Feature Representation and Algorithms / Chowdhury, S. A.; Stepanov, E. A.; Danieli, M.; Riccardi, G. - In: COMPUTER SPEECH AND LANGUAGE. - ISSN 0885-2308. - 55:(2019), pp. 145-167. [10.1016/j.csl.2018.12.001]
Appears in categories: 03.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
CSL19-SpeechOverlapCategorization.pdf	Archive managers only	Publisher’s layout	All rights reserved	3.12 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/250226