Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance



Action Noise in Off-Policy Deep Reinforcement Learning: Impact on Exploration and Performance / Hollenstein, Jakob; Auddy, Sayantan; Saveriano, Matteo; Renaudo, Erwan; Piater, Justus. - In: TRANSACTIONS ON MACHINE LEARNING RESEARCH. - ISSN 2835-8856. - 11:(2022), pp. 1-33.


Hollenstein, Jakob;Auddy, Sayantan;Saveriano, Matteo;Renaudo, Erwan;Piater, Justus

2022-01-01

Abstract

Many Deep Reinforcement Learning (D-RL) algorithms rely on simple forms of exploration, such as the additive action noise often used in continuous control domains. Typically, the scaling factor of this action noise is chosen as a hyper-parameter and kept constant during training. In this paper, we focus on action noise in off-policy deep reinforcement learning for continuous control. We analyze how the learned policy is impacted by the noise type, the noise scale, and the schedule used to reduce the scaling factor. We consider the two most prominent types of action noise, Gaussian and Ornstein-Uhlenbeck noise, and perform a vast experimental campaign by systematically varying the noise type and scale parameter, and by measuring variables of interest such as the expected return of the policy and the state-space coverage during exploration. For the latter, we propose a novel state-space coverage measure, XUrel, that is more robust than previously proposed measures to estimation artifacts caused by points close to the state-space boundary. Larger noise scales generally increase state-space coverage. However, we found that increasing the state-space coverage by using a larger noise scale is often not beneficial. On the contrary, reducing the noise scale over the training process reduces the variance and generally improves the learning performance. We conclude that the best noise type and scale are environment-dependent and, based on our observations, derive heuristic rules for guiding the choice of the action noise as a starting point for further optimization.
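To make the terms in the abstract concrete, the following is a minimal sketch of the two noise types and of a scale-reduction schedule. It is not the authors' implementation: the function names, the Ornstein-Uhlenbeck parameters `theta` and `dt`, the linear form of the schedule, and the action bounds of [-1, 1] are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_noise(rng, n_steps, dim, sigma):
    """Uncorrelated Gaussian action noise: one independent sample per step."""
    return sigma * rng.standard_normal((n_steps, dim))


def ou_noise(rng, n_steps, dim, sigma, theta=0.15, dt=0.01):
    """Temporally correlated Ornstein-Uhlenbeck noise (Euler-Maruyama steps).

    theta and dt are illustrative defaults, not values taken from the paper.
    """
    x = np.zeros(dim)
    out = np.empty((n_steps, dim))
    for t in range(n_steps):
        # Mean-reverting drift towards zero plus a scaled Gaussian increment.
        x = x - theta * x * dt + sigma * np.sqrt(dt) * rng.standard_normal(dim)
        out[t] = x
    return out


def linear_scale(step, total_steps, start=1.0, end=0.0):
    """Hypothetical linear schedule that shrinks the noise scale over training."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def noisy_action(policy_action, noise, scale, low=-1.0, high=1.0):
    """Add scaled noise to the policy's action and clip to the action bounds."""
    return np.clip(policy_action + scale * noise, low, high)
```

A call such as `noisy_action(a, eps[t], linear_scale(t, total_steps))` would then apply the decayed noise at training step `t`, where `a` is the action proposed by the policy and `eps` comes from either noise generator. Other schedule shapes would fit the same interface.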


Date of publication: 2022
Journal title: TRANSACTIONS ON MACHINE LEARNING RESEARCH
All authors: Hollenstein, Jakob; Auddy, Sayantan; Saveriano, Matteo; Renaudo, Erwan; Piater, Justus
Appears in types: 03.1 Journal article

Files in this item:

File	Access	Type	License	Size	Format
354_action_noise_in_off_policy_dee.pdf	Open access	Publisher's version (publisher's layout)	All rights reserved	1.8 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/363924
