Towards end-to-end spoken dialogue systems with turn embeddings / Bayer, Ali Orkan; Stepanov, Evgeny A.; Riccardi, Giuseppe. - 2017-:(2017), pp. 2516-2520. (18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Sweden, 2017) [10.21437/Interspeech.2017-1577].
Towards end-to-end spoken dialogue systems with turn embeddings
Bayer, Ali Orkan; Stepanov, Evgeny A.; Riccardi, Giuseppe
2017-01-01
Abstract
Training task-oriented dialogue systems requires a significant amount of manual effort and the integration of many independently built components; moreover, the pipeline is prone to error propagation. End-to-end training has been proposed to overcome these problems by training the whole system over the utterances of both dialogue parties. In this paper we present an end-to-end spoken dialogue system architecture that is based on turn embeddings. Turn embeddings encode a robust representation of user turns with a local dialogue history, and they are trained using sequence-to-sequence models. Turn embeddings are trained by generating the previous and the next turns of the dialogue while additionally performing spoken language understanding. The end-to-end spoken dialogue system is trained using the pre-trained turn embeddings in a stateful architecture that considers the whole dialogue history. We observe that the proposed spoken dialogue system architecture outperforms the models based on local-only...
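As a rough illustration of the training setup the abstract describes, the sketch below shows how each user turn might be paired with its previous turn, its next turn, and its spoken language understanding labels as three training targets. This is only a data-preparation sketch under stated assumptions, not the authors' implementation: the names `Turn`, `TurnExample`, `build_turn_examples`, and the `slu_labels` field are hypothetical, and the sequence-to-sequence models themselves are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str                             # "user" or "system" (assumed convention)
    text: str                                # transcribed utterance
    slu_labels: Optional[List[str]] = None   # hypothetical SLU annotation


@dataclass
class TurnExample:
    source: str                      # user turn to be encoded as a turn embedding
    prev_target: Optional[str]       # previous turn, to be generated by a decoder
    next_target: Optional[str]       # next turn, to be generated by a decoder
    slu_target: Optional[List[str]]  # labels for the additional SLU task


def build_turn_examples(dialogue: List[Turn]) -> List[TurnExample]:
    """Pair every user turn with its neighbouring turns and SLU labels."""
    examples = []
    for i, turn in enumerate(dialogue):
        if turn.speaker != "user":
            continue
        # A turn at the start or end of the dialogue has no neighbour on that side.
        prev_text = dialogue[i - 1].text if i > 0 else None
        next_text = dialogue[i + 1].text if i + 1 < len(dialogue) else None
        examples.append(TurnExample(turn.text, prev_text, next_text, turn.slu_labels))
    return examples
```

Each resulting `TurnExample` supplies one source sequence and up to three targets, which reflects the local dialogue history mentioned in the abstract; the stateful model over the whole dialogue history would be trained separately on top of the resulting embeddings.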



