Robust Image Captioning with Post-Generation Ensemble Method


Remote sensing image captioning is a research domain that aims to automatically generate natural language descriptions of the contents of remotely sensed images. Providing accurate depictions of image contents holds great significance for downstream applications such as image retrieval and image understanding. While there is a pressing need for reliable results, current research predominantly focuses on single captioning algorithms, striving to enhance their performance on specific target-oriented datasets. Undoubtedly, this research trajectory is highly important. However, we believe that relying solely on the output of a single captioner may introduce a vulnerability from a robustness standpoint. This concern is particularly relevant in remote sensing, where the scarcity of large-scale datasets can limit the robustness and reliability of the resulting algorithms. In this paper, we propose an approach that harnesses the advantages of ensembles to enhance accuracy and reliability in the context of image captioning. Our method introduces a novel technique for utilizing an ensemble of diverse captioning algorithms and automatically selecting the most suitable caption from the set of predictions. By decoupling the description generation and selection phases, this approach allows architecturally different captioning algorithms to be integrated flexibly into the pipeline.
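The decoupling of generation and selection described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Captioner` interface, the function names, and in particular the selection rule. The abstract does not state how the most suitable caption is chosen, so the sketch substitutes a simple consensus rule (pick the candidate whose word set overlaps most, on average, with the other candidates). It should not be read as the authors' selection method.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: any captioner is a callable mapping an image to a caption.
Captioner = Callable[[object], str]


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two captions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def generate_candidates(image: object, captioners: Sequence[Captioner]) -> List[str]:
    """Generation phase: each captioner runs independently of the others."""
    return [captioner(image) for captioner in captioners]


def select_caption(candidates: Sequence[str]) -> str:
    """Selection phase (assumed rule): return the candidate with the highest
    mean similarity to all other candidates."""
    if not candidates:
        raise ValueError("at least one candidate caption is required")
    if len(candidates) == 1:
        return candidates[0]

    def consensus(i: int) -> float:
        others = [jaccard(candidates[i], candidates[j])
                  for j in range(len(candidates)) if j != i]
        return sum(others) / len(others)

    best_index = max(range(len(candidates)), key=consensus)
    return candidates[best_index]


def caption_image(image: object, captioners: Sequence[Captioner]) -> str:
    """Full pipeline: generate with every captioner, then select one caption."""
    return select_caption(generate_candidates(image, captioners))


if __name__ == "__main__":
    # Stand-in captioners returning fixed strings; real models would go here.
    dummy_captioners = [
        lambda img: "a river runs through a forest",
        lambda img: "a river runs through a dense forest",
        lambda img: "a river near a forest",
        lambda img: "many cars in a parking lot",
    ]
    print(caption_image(None, dummy_captioners))
```

Because the selection step only sees strings, captioners with entirely different architectures can be added or removed without touching the selection code, which is the flexibility the abstract points to.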

Robust Image Captioning with Post-Generation Ensemble Method / Ricci, R; Melgani, F; Marcato Junior, J; Goncalves, W. N. - ELECTRONIC. - 2023-:(2023), pp. 5234-5237. (Paper presented at the 2023 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2023, held in Pasadena, USA, 16-21 July 2023) [10.1109/IGARSS52108.2023.10281769].

Ricci, R; Melgani, F; Marcato Junior, J; Goncalves, W. N

2023-01-01

Date of publication: 2023

Proceedings title: IEEE-International Geoscience and Remote Sensing Symposium IGARSS-2023

Place of publication: New York, USA

Publisher: Institute of Electrical and Electronics Engineers Inc.

ISBN: 979-8-3503-2010-7; 979-8-3503-3174-5

Scopus identifier: 2-s2.0-85178354465

WOS identifier: WOS:001098971605104

All authors: Ricci, R; Melgani, F; Marcato Junior, J; Goncalves, W. N

Appears in categories: 04.1 Paper in Proceedings

Files in this item:

File	Access	Type	License	Size	Format
Pubblicazione 1.pdf	Archive administrators only	Publisher’s layout	All rights reserved	1.13 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/400704
