A Deep Learning approach to modeling competitiveness in spoken conversations / Chowdhury, Shammur Absar; Riccardi, Giuseppe. - (2017), pp. 5680-5684. (2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2017, Hilton New Orleans Riverside, USA, 2017) [10.1109/ICASSP.2017.7953244].
A Deep Learning approach to modeling competitiveness in spoken conversations
Chowdhury, Shammur Absar; Riccardi, Giuseppe
2017-01-01
Abstract
Research on overlapping speech has long been motivated chiefly by the need to model human-machine interaction for dialog systems and conversation analysis. To gain deeper insight into the interlocutors' intentions behind the interaction, we need to understand the type of overlap. Overlapping speech signals the interlocutor's intention to grab the floor. This act can be competitive or non-competitive, and it either signals a problem or indicates assistance in communication. In this paper, we present a Deep Learning approach to modeling competitiveness in overlapping speech using acoustic features, lexical features, and their combination. We compare a fully connected feed-forward neural network with Support Vector Machine (SVM) models on real human-human call center conversations. We observe that feature combination with the DNN significantly outperforms the SVM models, both the individual feature baselines and the feature combination model, by 4% and 2% respec...



