The Exploration/Exploitation Trade-off in Reinforcement Learning for Dialogue Management

Varges, Sebastian; Riccardi, Giuseppe; Quarteroni, Silvia Alessandra; Ivanou, Aliaksei
2009-01-01

Abstract

Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner's lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy. © 2009 IEEE.
Year: 2009
Conference: The 2009 IEEE Workshop on Automatic Speech Recognition & Understanding
Place of publication: Piscataway, NJ
Publisher: IEEE
ISBN: 9781424454792

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/85329