
Robust Image Captioning with Post-Generation Ensemble Method / Ricci, R.; Melgani, F.; Marcato Junior, J.; Goncalves, W. N. - ELECTRONIC. - (2023), pp. 5234-5237. (Paper presented at the IEEE International Geoscience and Remote Sensing Symposium IGARSS 2023, held in Pasadena, USA, 16-21 July 2023) [10.1109/IGARSS52108.2023.10281769].

Robust Image Captioning with Post-Generation Ensemble Method

Ricci, R.; Melgani, F.; Marcato Junior, J.; Goncalves, W. N.
2023-01-01

Abstract

Remote sensing image captioning is a research domain that aims to automatically generate natural language descriptions of the contents of remotely sensed images. Providing accurate depictions of image contents holds great significance for downstream applications such as image retrieval and image understanding. While there is a pressing need for reliable results, current research predominantly focuses on single captioning algorithms, striving to enhance their performance on specific target-oriented datasets. Undoubtedly, this research trajectory is highly important. However, we believe that relying solely on the output of a single captioner may introduce a vulnerability from a robustness standpoint. This concern is particularly relevant in remote sensing, where the scarcity of large-scale datasets can limit the robustness and reliability of the resulting algorithms. In this paper, we propose an approach that harnesses the advantages of ensembles to enhance accuracy and reliability in the context of image captioning. Our method introduces a novel technique for utilizing an ensemble of diverse captioning algorithms and automatically selecting the most suitable caption from the set of predictions. By decoupling the description generation and selection phases, this approach enables highly flexible integration of architecturally different captioning algorithms into the pipeline.
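As an illustration of the decoupled generate-then-select pipeline described in the abstract, the sketch below shows one possible way to run an image through several interchangeable captioners and automatically pick a single caption from the candidates. The consensus-by-similarity selection rule and the caption_similarity helper are assumptions for demonstration only; the abstract does not specify the authors' actual selection criterion.

```python
# Minimal sketch of a post-generation captioning ensemble.
# Assumption: candidates are selected by mutual agreement (consensus);
# the real method may use a different, learned or metric-based criterion.

from difflib import SequenceMatcher
from typing import Callable, List


def caption_similarity(a: str, b: str) -> float:
    """Crude textual similarity; a real system might use CIDEr, BERTScore, etc."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def select_caption(candidates: List[str]) -> str:
    """Return the candidate that agrees most with the rest of the ensemble."""
    def consensus(i: int) -> float:
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(caption_similarity(candidates[i], o) for o in others) / max(len(others), 1)

    best = max(range(len(candidates)), key=consensus)
    return candidates[best]


def ensemble_caption(image, captioners: List[Callable]) -> str:
    """Generation and selection are decoupled: any captioner mapping an image
    to a string can be plugged in, regardless of its architecture."""
    candidates = [captioner(image) for captioner in captioners]
    return select_caption(candidates)
```

Because selection happens after generation, captioners of entirely different architectures (CNN-RNN, transformer-based, retrieval-based) can be swapped in or out without retraining the rest of the pipeline.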
Year: 2023
Conference: IEEE International Geoscience and Remote Sensing Symposium IGARSS 2023
Place of publication: New York, USA
Publisher: IEEE
ISBN: 979-8-3503-2010-7; 979-8-3503-3174-5
Authors: Ricci, R.; Melgani, F.; Marcato Junior, J.; Goncalves, W. N.
Files in this product:
Pubblicazione 1.pdf (archive administrators only)
Type: Publisher's version (Publisher's layout)
License: All rights reserved
Size: 1.13 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/400704
Citations
  • PMC: not available
  • Scopus: 0
  • Web of Science: not available