Reinforcement Learning and Trajectory Optimization for the Concurrent Design of high-performance robotic systems / Grandesso, Gianluigi. - (2023 Jul 05), pp. 1-122. [10.15168/11572_381949]
Reinforcement Learning and Trajectory Optimization for the Concurrent Design of high-performance robotic systems
Grandesso, Gianluigi
2023-07-05
Abstract
As progress pushes the boundaries of both the performance of new hardware components and the computational capacity of modern computers, the performance requirements on robotic systems are becoming increasingly demanding. The objective of this thesis is to demonstrate that concurrent design (Co-Design) is the approach to follow when designing hardware and control for such high-performance robots. In particular, this work proposes a co-design framework and an algorithm to tackle two main issues: i) how to use Co-Design to benchmark different robotic systems, and ii) how to effectively warm-start the trajectory optimization (TO) problem underlying the co-design problem, aiming at global optimality.

The first contribution of this thesis is a co-design framework for the energy-efficiency analysis of a redundant actuation architecture combining Quasi-Direct Drive (QDD) motors and Series Elastic Actuators (SEAs). The energy consumption of the redundant actuation system is compared to that of Geared Motors (GMs) and SEAs alone, considering two robotic systems performing different tasks. The results show that, using the redundant actuation, one can save up to 99% of the energy required by an SEA for sinusoidal movements. This efficiency is achieved by exploiting the coupled dynamics of the two actuators, resulting in a latching-like control strategy. The analysis also shows that these large energy savings do not straightforwardly extend to non-sinusoidal movements, but smaller savings (e.g., 7%) are nonetheless possible. These results highlight that the combination of complex hardware morphologies and advanced numerical Co-Design can lead to peak hardware performance that would be unattainable by human intuition alone. Moreover, it is also shown how Stochastic Programming (SP) can be leveraged to extend a similar co-design framework to design robots that are robust to disturbances, by combining TO, morphology and feedback control optimization.

The second contribution is a first step towards addressing the non-convexity of complex co-design optimization problems. To this end, an algorithm for the optimal control of dynamical systems is designed that combines TO and Reinforcement Learning (RL) in a single framework. The algorithm tackles the two main limitations of TO and RL when applied to continuous-space non-linear systems with a non-convex cost function: TO can get stuck in poor local minima when the search is not initialized close to a "good" minimum, whereas the RL training process may be excessively long and strongly dependent on the exploration strategy. The proposed algorithm therefore learns a "good" control policy via TO-guided RL policy search; using this policy to compute an initial guess for TO makes the trajectory optimization process less prone to converging to poor local optima. The method is validated on several reaching problems featuring non-convex obstacle avoidance with different dynamical systems. The results show the algorithm's strong capability to escape local minima, while being more computationally efficient than the state-of-the-art RL algorithms Deep Deterministic Policy Gradient and Proximal Policy Optimization. The current algorithm deals only with the control side of a co-design problem, but future work will extend it to also include hardware optimization.
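To make the warm-start mechanism concrete, the following is a minimal, self-contained sketch (not the thesis implementation) of how a rollout of a learned policy can seed a trajectory optimizer. The double-integrator model, the cost function, and the hand-written stand-in for the RL policy are illustrative assumptions; on this convex toy problem both initializations reach similar costs, and the benefit of the warm start emerges only on non-convex problems such as the obstacle-avoidance tasks mentioned above.

```python
# Illustrative sketch: warm-start a trajectory optimization with the rollout
# of a (learned) policy, instead of an all-zeros initial guess.
import numpy as np
from scipy.optimize import minimize

dt, horizon = 0.05, 60

def step(x, u):
    """Toy double-integrator dynamics: x = [position, velocity]."""
    return np.array([x[0] + dt * x[1], x[1] + dt * u])

def trajectory_cost(u_seq, x0, goal=1.0):
    """Running control effort plus terminal distance to the goal."""
    x = np.asarray(x0, dtype=float)
    cost = 0.0
    for u in u_seq:
        x = step(x, u)
        cost += 1e-3 * u ** 2
    return cost + (x[0] - goal) ** 2 + 0.1 * x[1] ** 2

def rollout(policy, x0):
    """Roll out the policy to build the warm-start control sequence."""
    x, us = np.asarray(x0, dtype=float), []
    for _ in range(horizon):
        u = policy(x)
        us.append(u)
        x = step(x, u)
    return np.array(us)

# Stand-in for the RL-learned policy (here just a proportional-derivative law).
learned_policy = lambda x: 2.0 * (1.0 - x[0]) - 1.5 * x[1]

x0 = np.array([0.0, 0.0])
warm_guess = rollout(learned_policy, x0)

cold = minimize(trajectory_cost, np.zeros(horizon), args=(x0,), method="L-BFGS-B")
warm = minimize(trajectory_cost, warm_guess, args=(x0,), method="L-BFGS-B")
print("cost with zero initial guess:", cold.fun)
print("cost with policy warm start :", warm.fun)
```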
All things considered, this work advances the state of the art on Co-Design, providing a framework and an algorithm to design both hardware and control for high-performance robots while aiming at global optimality.
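As a companion illustration of the co-design framework from the first contribution, the sketch below shows the bilevel structure that such a framework typically takes: an outer search over hardware parameters in which each candidate is scored by the energy of its own optimal trajectory. The 1-DoF elastic joint, the cost terms, the parameter grid, and all names are illustrative assumptions rather than the actuation models used in the thesis.

```python
# Illustrative sketch of a bilevel co-design loop: outer search over hardware
# parameters, inner trajectory optimization returning the task energy.
import numpy as np
from scipy.optimize import minimize

dt, horizon = 0.02, 100

def task_energy(hardware, x0=np.array([0.0, 0.0])):
    """Inner problem: for fixed hardware (gear ratio, spring stiffness),
    optimize the motor-current trajectory and return the energy spent to
    reach a unit set-point with a toy 1-DoF elastic joint."""
    gear_ratio, spring_k = hardware

    def cost(i_seq):
        q, dq, energy = x0[0], x0[1], 0.0
        for i in i_seq:
            tau = gear_ratio * i - spring_k * q   # motor plus spring torque
            dq += dt * (tau - 0.2 * dq)           # unit inertia, light damping
            q += dt * dq
            energy += dt * i ** 2                 # Joule-like losses
        return energy + 50.0 * (q - 1.0) ** 2 + 5.0 * dq ** 2  # reach the target

    return minimize(cost, np.zeros(horizon), method="L-BFGS-B").fun

# Outer problem: grid search over hardware parameters, each candidate scored
# by the energy of its own optimal trajectory.
candidates = [(g, k) for g in (5.0, 10.0, 20.0) for k in (0.0, 2.0, 8.0)]
best = min(candidates, key=task_energy)
print("lowest-energy (gear ratio, stiffness):", best)
```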
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| phd_unitn_Gianluigi_Grandesso.pdf | Open access | Doctoral thesis | All rights reserved | 7.52 MB | Adobe PDF |