The Exploration/Exploitation Trade-off in Reinforcement Learning for Dialogue Management

Varges, Sebastian; Riccardi, Giuseppe; Quarteroni, Silvia Alessandra; Ivanou, Aliaksei

2009-01-01

Abstract

Conversational systems use deterministic rules that trigger actions such as requests for confirmation or clarification. More recently, Reinforcement Learning and (Partially Observable) Markov Decision Processes have been proposed for this task. In this paper, we investigate action selection strategies for dialogue management, in particular the exploration/exploitation trade-off and its impact on final reward (i.e. the session reward after optimization has ended) and lifetime reward (i.e. the overall reward accumulated over the learner's lifetime). We propose to use interleaved exploitation sessions as a learning methodology to assess the reward obtained from the current policy. The experiments show a statistically significant difference in final reward of exploitation-only sessions between a system that optimizes lifetime reward and one that maximizes the reward of the final policy. © 2009 IEEE.
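The distinction between lifetime reward and final reward, and the role of interleaved exploitation sessions, can be illustrated with a toy sketch. This is not the authors' system: it assumes a simple multi-armed bandit with epsilon-greedy action selection in place of a dialogue manager, and all names (`run_session`, `train`, `eval_every`, `eval_sessions`) and the Gaussian reward model are hypothetical choices made for illustration. It also assumes that exploitation sessions neither update the learner nor count toward lifetime reward, which the abstract does not specify.

```python
import random


def run_session(q, counts, true_means, epsilon, rng, learn=True):
    """Run one single-step toy 'session' and return its reward.

    q and counts are the learner's value estimates and visit counts;
    true_means is the (hidden) expected reward of each action.
    """
    n_actions = len(true_means)
    if rng.random() < epsilon:
        action = rng.randrange(n_actions)                    # explore
    else:
        action = max(range(n_actions), key=lambda a: q[a])   # exploit
    reward = rng.gauss(true_means[action], 1.0)              # assumed noise model
    if learn:
        counts[action] += 1
        q[action] += (reward - q[action]) / counts[action]   # incremental mean
    return reward


def train(true_means, epsilon, n_sessions=2000, eval_every=100,
          eval_sessions=50, seed=0):
    """Learn with epsilon-greedy, interleaving exploitation-only sessions."""
    rng = random.Random(seed)
    q = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    lifetime_reward = 0.0   # reward accumulated over all learning sessions
    eval_curve = []         # average reward of the current greedy policy
    for i in range(1, n_sessions + 1):
        lifetime_reward += run_session(q, counts, true_means, epsilon, rng)
        if i % eval_every == 0:
            # Interleaved exploitation sessions: epsilon = 0, no updates.
            total = sum(
                run_session(q, counts, true_means, 0.0, rng, learn=False)
                for _ in range(eval_sessions)
            )
            eval_curve.append(total / eval_sessions)
    return lifetime_reward, eval_curve, q


if __name__ == "__main__":
    for eps in (0.05, 0.5):
        lifetime, curve, _ = train([0.2, 0.5, 1.0], epsilon=eps)
        print(f"epsilon={eps}: lifetime={lifetime:.1f}, final={curve[-1]:.2f}")
```

In this sketch the last entry of `eval_curve` plays the role of final reward, while `lifetime_reward` reflects the cost of exploration during learning; comparing the two across values of `epsilon` is one simple way to see the trade-off the abstract describes.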

Date of publication: 2009
Proceedings title: The 2009 IEEE Workshop on Automatic Speech Recognition & Understanding
Place of publication: Piscataway, NJ
Publisher: IEEE
ISBN: 9781424454792
Scopus identifier: 2-s2.0-77949349622
WOS identifier: WOS:000291368500089
All authors: Varges, Sebastian; Riccardi, Giuseppe; Quarteroni, Silvia Alessandra; Ivanou, Aliaksei
Publication type: 04.1 Paper in Proceedings


Use this identifier to cite or link to this document: https://hdl.handle.net/11572/85329
