Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks / Crespi, Marco; Custode, Leonardo Lucio; Iacca, Giovanni. - 13627:(2022), pp. 262-276. (Paper presented at the 10th International Conference on Bioinspired Optimization Methods and Their Applications, BIOMA 2022, held in Maribor, 17th-18th November 2022) [10.1007/978-3-031-21094-5_19].
Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks
Crespi, Marco; Custode, Leonardo Lucio; Iacca, Giovanni
2022-01-01
Abstract
Deep Learning (DL) has enabled significant advances in Multi-Agent Reinforcement Learning (MARL), speeding up progress in the field. However, agents trained by means of DL in MARL settings have an important drawback: their policies are extremely hard to interpret, not only at the level of the individual agent, but also (and especially) because one has to take into account the interactions across the whole set of agents. In this work, we take a step towards interpretability in MARL tasks. To do so, we present an approach that combines evolutionary computation (namely, grammatical evolution) and reinforcement learning (Q-learning), which allows us to produce agents that are, at least to some extent, understandable. Moreover, unlike the typically centralized DL-based approaches (and thanks to the possibility of using a replay buffer), our method can easily employ Independent Q-learning to train a team of agents, which facilitates robustness and scalability. Evaluating our approach on the Battlefield task from the MAgent implementation in the PettingZoo library, we observe that the evolved team of agents coordinates its actions in a distributed fashion, solving the task effectively.
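As a concrete illustration of the combination described in the abstract, the sketch below pairs a tabular Q-learning leaf (standing in for one leaf of an evolved, interpretable policy structure) with an Independent Q-learning loop over a PettingZoo AEC environment. It is a minimal sketch under stated assumptions, not the authors' implementation: the `QLearningLeaf` class, the use of a single leaf per agent, and the `magent2` import path (the MAgent environments were later moved out of PettingZoo into the separate `magent2` package) are all assumptions; adjust the import to your installed version.

```python
import numpy as np

# Assumption: MAgent environments as repackaged in the "magent2" library;
# earlier PettingZoo releases shipped them as pettingzoo.magent.battlefield_vX.
from magent2.environments import battlefield_v5


class QLearningLeaf:
    """Illustrative leaf of an evolved policy: stores one Q-value per
    action and updates them online with tabular Q-learning."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.05):
        self.q = np.zeros(n_actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.prev_action = None

    def act(self):
        # Epsilon-greedy action selection over this leaf's Q-values.
        if np.random.rand() < self.epsilon:
            self.prev_action = np.random.randint(len(self.q))
        else:
            self.prev_action = int(np.argmax(self.q))
        return self.prev_action

    def update(self, reward):
        # One-step Q-learning update; for brevity the bootstrap term
        # max_a' Q(s', a') is taken from the same (single) leaf.
        if self.prev_action is None:
            return
        target = reward + self.gamma * np.max(self.q)
        self.q[self.prev_action] += self.alpha * (target - self.q[self.prev_action])


env = battlefield_v5.env()
env.reset(seed=0)

# Independent Q-learning: every agent owns its own learner and is trained
# from its local rewards only. In the actual method, each agent's policy
# is an interpretable structure evolved by grammatical evolution; here a
# single leaf (a degenerate tree) stands in for it.
learners = {a: QLearningLeaf(env.action_space(a).n) for a in env.agents}

for agent in env.agent_iter():
    obs, reward, termination, truncation, _ = env.last()
    learners[agent].update(reward)
    if termination or truncation:
        env.step(None)  # finished agents must step with a None action
    else:
        env.step(learners[agent].act())
env.close()
```

In the full approach, grammatical evolution would plausibly supply the structure that routes each observation to a leaf, with the episode return serving as the fitness of each evolved individual; since every agent learns independently from its own local rewards, no centralized critic is needed, which is what makes the scheme decentralized and scalable.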
| File | Type | License | Size | Format | Access |
|---|---|---|---|---|---|
| Towards Interpretable Policies in Multi-agent Reinforcement Learning Tasks.pdf | Publisher's layout (editorial version) | All rights reserved | 372.22 kB | Adobe PDF | Restricted to archive managers |
| Interpretable_MARL_Pettingzoo.pdf | Refereed author's manuscript (post-print) | All rights reserved | 328.08 kB | Adobe PDF | Open Access since 13/11/2023 |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.