Remote sensing image captioning is a research domain that aims to automatically generate natural language descriptions of the contents of remotely sensed images. Accurate descriptions of image contents are important for downstream applications such as image retrieval and image understanding. Although reliable results are needed, current research focuses predominantly on single captioning algorithms and on improving their performance on specific target datasets. This line of research is undoubtedly important. However, we believe that relying solely on the output of a single captioner may introduce a vulnerability from a robustness standpoint. This concern is particularly relevant in remote sensing, where the scarcity of large-scale datasets can limit the robustness and reliability of the resulting algorithms. In this paper, we propose an approach that exploits the advantages of ensembles to improve accuracy and reliability in image captioning. Our method introduces a novel technique that uses an ensemble of diverse captioning algorithms and automatically selects the most suitable caption from the set of predictions. Because description generation and caption selection are decoupled, architecturally different captioning algorithms can be integrated into the pipeline with high flexibility.
Robust Image Captioning with Post-Generation Ensemble Method / Ricci, R; Melgani, F; Marcato Junior, J; Goncalves, W. N. - Electronic. - 2023-:(2023), pp. 5234-5237. (Paper presented at the 2023 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2023, held in Pasadena, USA, 16-21 July 2023) [10.1109/IGARSS52108.2023.10281769].
Robust Image Captioning with Post-Generation Ensemble Method
Ricci, R; Melgani, F
2023-01-01