A Novel SVM-Based Decoder for Remote Sensing Image Captioning / Hoxha, G.; Melgani, F. - In: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING. - ISSN 0196-2892. - Electronic. - 60:(2022), pp. 540451401-540451414. [10.1109/TGRS.2021.3105004]

A Novel SVM-Based Decoder for Remote Sensing Image Captioning

Hoxha G.; Melgani F.
2022-01-01

Abstract

Most remote sensing image captioning (IC) models are based on encoder–decoder frameworks in which a convolutional neural network (CNN) encodes the image information and a recurrent neural network (RNN) decodes it into a sentence description. To achieve good accuracy, encoder–decoder frameworks relying on RNNs typically require a large number of annotated samples. Furthermore, they demand substantial and expensive computational power to keep training and testing times reasonable. In this article, we aim to address these issues by introducing a novel decoder based on support vector machines (SVMs). In particular, instead of RNNs, we propose a novel network of SVMs to decode the image information into a sentence description. The proposed IC system is particularly interesting when only a limited number of training samples is available. Experiments conducted on four different IC datasets confirm the promising capability of the proposed IC system to generate descriptions that are highly correlated with the image content. The proposed IC system is also characterized by short training and inference times compared to other state-of-the-art models.
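The abstract describes the SVM decoder only at a high level, so the following is a hypothetical Python sketch of the general idea of replacing an RNN with SVMs, not the authors' network. Assumptions: two-dimensional toy vectors stand in for CNN image features; the decoder conditions only on the image vector and the previous word; there is one linear SVM per vocabulary word, trained one-vs-rest with hinge loss by dual coordinate descent; and captions are decoded greedily. All names (`train_decoder`, `features`, `decode`, `BOS`, `EOS`) are illustrative.

```python
"""Hypothetical sketch: greedy caption decoding with one linear SVM per word.

Not the method of Hoxha and Melgani; an illustration under stated assumptions.
"""
from typing import Dict, List, Sequence, Tuple

BOS, EOS = "<s>", "</s>"  # assumed sentence start/end tokens


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_svm(X: List[List[float]], y: List[int],
                     C: float = 100.0, passes: int = 500) -> List[float]:
    """Binary linear SVM (hinge loss, labels +1/-1) fitted by dual coordinate descent.

    The bias is the weight of a constant feature in X, so it is regularized too.
    """
    w = [0.0] * len(X[0])
    alpha = [0.0] * len(X)
    for _ in range(passes):
        for i, (xi, yi) in enumerate(zip(X, y)):
            grad = yi * dot(w, xi) - 1.0  # dual gradient w.r.t. alpha_i
            # Exact minimization along alpha_i, clipped to the box [0, C].
            new = min(max(alpha[i] - grad / dot(xi, xi), 0.0), C)
            w = [wj + (new - alpha[i]) * yi * xj for wj, xj in zip(w, xi)]
            alpha[i] = new
    return w


def features(img: Sequence[float], prev: str, vocab: List[str]) -> List[float]:
    """Assumed decoder input: image features + one-hot previous word + bias feature."""
    one_hot = [1.0 if word == prev else 0.0 for word in vocab]
    return list(img) + one_hot + [1.0]


def train_decoder(data: List[Tuple[List[float], List[str]]]
                  ) -> Tuple[List[str], Dict[str, List[float]]]:
    vocab = sorted({w for _, caption in data for w in caption} | {BOS, EOS})
    X: List[List[float]] = []
    targets: List[str] = []
    for img, caption in data:
        seq = [BOS] + caption + [EOS]
        for prev, nxt in zip(seq, seq[1:]):  # (previous word, next word) pairs
            X.append(features(img, prev, vocab))
            targets.append(nxt)
    # One-vs-rest: one binary SVM for every word that appears as a next word.
    svms = {word: train_linear_svm(X, [1 if t == word else -1 for t in targets])
            for word in sorted(set(targets))}
    return vocab, svms


def decode(img: Sequence[float], vocab: List[str],
           svms: Dict[str, List[float]], max_len: int = 10) -> List[str]:
    """Greedy decoding: pick the word whose SVM scores highest, until EOS."""
    words: List[str] = []
    prev = BOS
    for _ in range(max_len):
        x = features(img, prev, vocab)
        prev = max(svms, key=lambda word: dot(svms[word], x))
        if prev == EOS:
            break
        words.append(prev)
    return words


# Toy data: 2-D vectors stand in for CNN image features (assumption).
data = [([1.0, 0.0], ["green", "forest"]),
        ([0.0, 1.0], ["blue", "sea"])]
vocab, svms = train_decoder(data)
print(decode([1.0, 0.0], vocab, svms))
print(decode([0.0, 1.0], vocab, svms))
```

In this sketch each SVM is an independent, small convex problem, so training on a handful of captions is cheap; the published network of SVMs may differ in its inputs, structure, and training procedure.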
Files in this item:
2022_TGRS-SVM Captioning.pdf (publisher's layout; access restricted to archive managers; license: all rights reserved; 4.56 MB, Adobe PDF)

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/373009
Citations
  • Scopus 41
  • ISI 32