Automatic Classification of Speech Overlaps: Feature Representation and Algorithms / Chowdhury, S. A.; Stepanov, E. A.; Danieli, M.; Riccardi, G. - In: COMPUTER SPEECH AND LANGUAGE. - ISSN 0885-2308. - 55:(2019), pp. 145-167. [10.1016/j.csl.2018.12.001]

Automatic Classification of Speech Overlaps: Feature Representation and Algorithms

Chowdhury, S. A.; Stepanov, E. A.; Danieli, M.; Riccardi, G.
2019-01-01

Abstract

Overlapping speech is a natural and frequently occurring phenomenon in human–human conversations with an underlying purpose. Speech overlap events may be categorized as competitive and non-competitive. While the former is an attempt to grab the floor, the latter is an attempt to assist the speaker to continue the turn. The presence and distribution of these categories are indicative of the speakers’ states during the conversation. Therefore, understanding these manifestations is crucial for conversational analysis and for modeling human–machine dialogs. The goal of this study is to design computational models to classify overlapping speech segments of dyadic conversations into competitive vs. non-competitive acts using lexical and acoustic cues, as well as their surrounding context. The designed overlap representations are evaluated in both linear – Support Vector Machines (SVM) – and non-linear – feed-forward (FFNN), convolutional (CNN) and long short-term memory (LSTM) neural network – models. We experiment with lexical and acoustic representations and their combinations from both speaker channels in feature and hidden space. We observe that lexical word-embedding features significantly increase the overall F1-measure compared to both acoustic and bag-of-ngrams lexical representations, suggesting that lexical information can be utilized as a powerful cue for overlap classification. Our comparative study shows that the best computational architecture is an FFNN along with a combination of word embeddings and acoustic features. © 2018 Elsevier Ltd. All rights reserved.
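The abstract's distinction between combining lexical and acoustic representations "in feature and hidden space" can be illustrated with a minimal sketch. This is not the authors' implementation: the vector sizes, the single tanh hidden layer, the untrained random weights, and the reading of the output as a probability of the "competitive" class are all assumptions made only for illustration.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases; untrained, for illustration only."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, weights, bias):
    """One fully connected layer with tanh activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def score(h, out_layer):
    """Sigmoid output unit; assumed here to mean P(competitive)."""
    weights, bias = out_layer
    z = sum(w * v for w, v in zip(weights[0], h)) + bias[0]
    return 1.0 / (1.0 + math.exp(-z))

def feature_fusion(lexical, acoustic, hidden, out_layer):
    """Feature-space fusion: concatenate inputs, then one shared hidden layer."""
    return score(dense(lexical + acoustic, *hidden), out_layer)

def hidden_fusion(lexical, acoustic, lex_hidden, ac_hidden, out_layer):
    """Hidden-space fusion: one hidden layer per modality, then concatenate."""
    h = dense(lexical, *lex_hidden) + dense(acoustic, *ac_hidden)
    return score(h, out_layer)

# Hypothetical sizes: 8-dim word-embedding vector, 6-dim acoustic vector.
LEX_DIM, AC_DIM, HID = 8, 6, 4
rng = random.Random(0)
lexical = [rng.gauss(0, 1) for _ in range(LEX_DIM)]
acoustic = [rng.gauss(0, 1) for _ in range(AC_DIM)]

p_feature = feature_fusion(
    lexical, acoustic,
    init_layer(LEX_DIM + AC_DIM, HID, rng),
    init_layer(HID, 1, rng),
)
p_hidden = hidden_fusion(
    lexical, acoustic,
    init_layer(LEX_DIM, HID, rng),
    init_layer(AC_DIM, HID, rng),
    init_layer(2 * HID, 1, rng),
)
print(p_feature, p_hidden)
```

The two functions differ only in where the modalities meet: before any learned transformation (feature space) or after each modality has its own hidden layer (hidden space).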
Files in this item:
File: CSL19-SpeechOverlapCategorization.pdf
Access: Archive managers only
Type: Publisher's version (Publisher’s layout)
License: All rights reserved
Size: 3.12 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/250226