
CACTO-SL: Using Sobolev Learning to Improve Continuous Actor-Critic with Trajectory Optimization / Alboni, Elisa; Grandesso, Gianluigi; Rosati Papini, Gastone Pietro; Carpentier, Justin; Del Prete, Andrea. - 242:(2024), pp. 1452-1463. (L4DC 2024, Oxford, UK, 15-17 July 2024) [10.48550/arXiv.2312.10666].

CACTO-SL: Using Sobolev Learning to Improve Continuous Actor-Critic with Trajectory Optimization

Elisa Alboni; Gianluigi Grandesso; Gastone Pietro Rosati Papini; Andrea Del Prete
2024-01-01

Abstract

Trajectory Optimization (TO) and Reinforcement Learning (RL) are powerful and complementary tools to solve optimal control problems. On the one hand, TO can efficiently compute locally optimal solutions, but it tends to get stuck in local minima if the problem is not convex. On the other hand, RL is typically less sensitive to non-convexity, but it requires a much higher computational effort. Recently, we have proposed CACTO (Continuous Actor-Critic with Trajectory Optimization), an algorithm that uses TO to guide the exploration of an actor-critic RL algorithm. In turn, the policy encoded by the actor is used to warm-start TO, closing the loop between TO and RL. In this work, we present CACTO-SL, an extension of CACTO exploiting the idea of Sobolev Learning. To make the training of the critic network faster and more data-efficient, we enrich it with the gradient of the Value function, computed via a backward pass of the Differential Dynamic Programming (DDP) algorithm. Our results show that the new algorithm is more efficient than the original CACTO, reducing the number of TO episodes by a factor ranging from 3 to 10, and consequently the computation time. Moreover, we show that CACTO-SL helps TO find better minima and produce more consistent results.
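The Sobolev Learning idea described in the abstract can be illustrated with a minimal sketch: the critic is fit not only to Value targets but also to Value-gradient targets (in CACTO-SL, these come from the DDP backward pass). The function and variable names below are hypothetical, and a linear critic stands in for the neural network purely for illustration.

```python
import numpy as np

def sobolev_loss(w, X, V_target, dV_target, lam=1.0):
    """Sobolev-style critic loss (illustrative sketch, not the paper's code).

    The critic here is linear, V(x) = w @ x, so its state gradient is w.

    w:         (n,)   critic parameters
    X:         (N, n) batch of states
    V_target:  (N,)   Value targets (e.g. TO costs-to-go)
    dV_target: (N, n) Value-gradient targets (e.g. from a DDP backward pass)
    lam:       weight of the gradient-matching term
    """
    V_pred = X @ w                            # predicted values
    dV_pred = np.tile(w, (len(X), 1))         # gradient of a linear critic is w
    value_err = np.mean((V_pred - V_target) ** 2)
    grad_err = np.mean(np.sum((dV_pred - dV_target) ** 2, axis=1))
    return value_err + lam * grad_err         # combined Sobolev objective

# Toy check: if the targets come from V*(x) = c @ x, then w = c zeroes the loss.
rng = np.random.default_rng(0)
c = np.array([1.0, -2.0])
X = rng.normal(size=(8, 2))
loss = sobolev_loss(c, X, X @ c, np.tile(c, (8, 1)))
```

With a neural-network critic, the same objective would be minimized by backpropagating through both the value prediction and its input gradient.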
2024
Proceedings of the 6th Annual Learning for Dynamics & Control Conference
University of Oxford, UK
Proceedings of Machine Learning Research (PMLR)
Alboni, Elisa; Grandesso, Gianluigi; Rosati Papini, Gastone Pietro; Carpentier, Justin; Del Prete, Andrea
Files in this record:

CACTO_Sobolev.pdf

Open access

Description: Robotics and mechatronics
Type: Editorial version (publisher's layout)
License: Creative Commons
Size: 1.32 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/418370
Citations
  • PMC: N/A
  • Scopus: 2
  • Web of Science: 1
  • OpenAlex: N/A