A Novel SVM-Based Decoder for Remote Sensing Image Captioning

Hoxha, G.; Melgani, F.

doi:10.1109/TGRS.2021.3105004

Most remote sensing image captioning (IC) models are based on encoder–decoder frameworks in which a convolutional neural network (CNN) encodes the image information and a recurrent neural network (RNN) decodes it into a sentence description. To achieve good accuracy, encoder–decoder frameworks relying on RNNs typically require a large number of annotated samples. Furthermore, they demand substantial and costly computational power to keep training and testing times reasonable. In this article, we aim to address these issues by introducing a novel decoder based on support vector machines (SVMs). In particular, instead of RNNs, we propose a novel network of SVMs to decode the image information into a sentence description. The proposed IC system is particularly attractive when only a limited number of training samples is available. Experiments conducted on four different IC datasets confirm the promising capability of the proposed IC system to generate descriptions that are highly correlated with the image content. The proposed IC system is also characterized by short training and inference times compared with other state-of-the-art models.
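The abstract does not say how the network of SVMs is wired, so the following is only a toy sketch of the general idea of decoding a caption with SVM classifiers instead of an RNN. It is not the authors' architecture. Everything in it is a hypothetical stand-in: the class `SVMWordDecoder`, the two-dimensional "image features" (which a CNN encoder would normally supply), the two captions, and the hyperparameters. The hand-written linear SVM, trained by subgradient descent on the hinge loss, is an assumption chosen to keep the example free of dependencies.

```python
# Toy illustration only: NOT the decoder from the paper, whose design the
# abstract does not specify. All names, data and settings are hypothetical.

def train_linear_svm(X, y, epochs=500, lr=0.1, lam=0.001):
    """Binary linear SVM (labels in {-1, +1}) fitted by subgradient
    descent on the L2-regularised hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (gradient of the L2 regulariser).
            w = [wi * (1.0 - lr * lam) for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move towards the correct side.
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def svm_score(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b


class SVMWordDecoder:
    """One-vs-rest SVMs that pick the next word from the image features
    and a one-hot encoding of the previous word."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.models = {}

    def _features(self, image_vec, prev_word):
        one_hot = [1.0 if w == prev_word else 0.0 for w in self.vocab]
        return list(image_vec) + one_hot

    def fit(self, image_vecs, captions):
        X, targets = [], []
        for vec, caption in zip(image_vecs, captions):
            words = ["<s>"] + caption.split() + ["</s>"]
            for prev, nxt in zip(words, words[1:]):
                X.append(self._features(vec, prev))
                targets.append(nxt)
        for word in self.vocab:
            y = [1 if t == word else -1 for t in targets]
            self.models[word] = train_linear_svm(X, y)

    def decode(self, image_vec, max_len=10):
        prev, out = "<s>", []
        for _ in range(max_len):
            x = self._features(image_vec, prev)
            # Greedy choice: the word whose SVM gives the highest score.
            prev = max(self.vocab, key=lambda w: svm_score(self.models[w], x))
            if prev == "</s>":
                break
            out.append(prev)
        return " ".join(out)


vocab = ["<s>", "a", "river", "forest", "</s>"]
decoder = SVMWordDecoder(vocab)
decoder.fit([[1.0, 0.0], [0.0, 1.0]], ["a river", "a forest"])
print(decoder.decode([1.0, 0.0]))
print(decoder.decode([0.0, 1.0]))
```

In a realistic setting the feature vectors would come from a CNN encoder and the classifiers from an established SVM library; the sketch only shows how word-by-word decoding can be driven by SVM scores.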

A Novel SVM-Based Decoder for Remote Sensing Image Captioning / Hoxha, G.; Melgani, F. - In: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING. - ISSN 0196-2892. - Electronic. - 60:(2022), pp. 540451401-540451414. [10.1109/TGRS.2021.3105004]

2022-01-01

Date of publication: 2022
Journal title: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
DOI: https://dx.doi.org/10.1109/TGRS.2021.3105004
Scopus identifier: 2-s2.0-85113839685
WOS identifier: WOS:000732760800001
All authors: Hoxha, G.; Melgani, F.
Appears in types: 03.1 Journal article

Files in this item:

File	Size	Format
2022_TGRS-SVM Captioning.pdf (Type: Publisher’s layout; License: All rights reserved; Access: archive managers only)	4.56 MB	Adobe PDF