Efficient Reinforcement Learning for 3D Jumping Monopods / Bussola, Riccardo; Focchi, Michele; Del Prete, Andrea; Fontanelli, Daniele; Palopoli, Luigi. - In: SENSORS. - ISSN 1424-8220. - 24:15(2024). [10.3390/s24154981]
Efficient Reinforcement Learning for 3D Jumping Monopods
Focchi, Michele (co-first author); Del Prete, Andrea (co-last author); Fontanelli, Daniele (co-last author); Palopoli, Luigi (co-last author)
2024-01-01
Abstract
We consider a complex control problem: making a monopod accurately reach a target with a single jump. The monopod can jump in any direction and onto terrain of different elevations. This problem is a paradigm for a much larger class of problems that are extremely challenging and computationally expensive to solve with standard optimisation-based techniques. Reinforcement Learning (RL) is an interesting alternative, but an end-to-end approach, in which the controller must learn everything from scratch, can be nontrivial for a sparse-reward task like jumping. Our solution is to guide the learning process within an RL framework by leveraging nature-inspired heuristic knowledge. This guidance brings widespread benefits, such as a drastic reduction of learning time and the ability to learn and compensate for possible errors in the low-level execution of the motion. Our simulation results show a clear advantage of our solution over both optimisation-based and end-to-end RL approaches.
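As a rough illustration of what "nature-inspired heuristic knowledge" can look like in this setting, the sketch below seeds the take-off velocity with a point-mass ballistic solution and lets a learned policy add a small residual correction on top of it. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the residual structure, and the fixed flight-time parameter are all hypothetical.

```python
# Minimal sketch (not the paper's code): a ballistic heuristic seeds the
# take-off velocity, and a learned policy corrects it with a residual.
# All names and the residual form are illustrative assumptions.
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]

def ballistic_takeoff_velocity(target, T):
    """Heuristic take-off velocity to reach `target` = (dx, dy, dz)
    after an assumed flight time T, from point-mass projectile motion:
    dz = vz*T - 0.5*G*T**2  =>  vz = dz/T + 0.5*G*T."""
    dx, dy, dz = target
    return np.array([dx / T, dy / T, dz / T + 0.5 * G * T])

class ResidualPolicy:
    """Stand-in for a trained RL policy; here it returns a zero
    correction, where a real policy would compensate execution errors."""
    def __call__(self, obs):
        return np.zeros(3)

policy = ResidualPolicy()
target = np.array([1.0, 0.0, 0.3])   # jump 1 m forward onto a 0.3 m ledge
T = 0.5                              # assumed flight time [s]
v_heur = ballistic_takeoff_velocity(target, T)
v_cmd = v_heur + policy(obs=target)  # guided action = heuristic + residual
print("commanded take-off velocity:", v_cmd)
```

The design intuition behind such a residual scheme is that the heuristic places the policy's initial behaviour close to feasible jumps, so the sparse jumping reward is reached early in training, while the learned residual absorbs model mismatch and low-level tracking errors.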