
Toward Remote Sensing Image Retrieval under a Deep Image Captioning Perspective

G. Hoxha; F. Melgani; B. Demir
2020-01-01

Abstract

The performance of remote sensing image retrieval (RSIR) systems depends on the capability of the extracted features to characterize the semantic content of images. Existing RSIR systems describe images by visual descriptors that model the primitives (such as different land-cover classes) present in the images. However, visual descriptors may not be sufficient to describe the high-level complex content of RS images (e.g., attributes of and relationships among different land-cover classes). To address this issue, in this article we present an RSIR system that aims at generating and exploiting textual descriptions (i.e., captions) that accurately describe the objects present in RS images, their attributes, and the relationships among them. To this end, the proposed retrieval system consists of three main steps. The first step encodes the visual features of an image and then translates the encoded features into a caption that summarizes the image content. This is achieved by combining a convolutional neural network with a recurrent neural network. The second step converts the generated textual descriptions into semantically meaningful feature vectors by using recent word embedding techniques. Finally, the last step estimates the similarity between the caption vectors of the query image and those of the archive images, and then retrieves the images most similar to the query. Experimental results obtained on two different datasets show that describing the image content with captions in the framework of RSIR leads to accurate retrieval performance.
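
The three steps summarized in the abstract (caption generation, caption embedding, and similarity-based ranking) can be illustrated with a minimal Python sketch. Everything below is a hypothetical placeholder rather than the authors' implementation: the caption_image function stands in for the CNN-RNN captioning model, and the random word_vectors stand in for pretrained word embeddings; only the overall pipeline structure follows the abstract.

    # Minimal sketch of the three-step caption-based RSIR pipeline.
    # All names and toy captions are illustrative placeholders.
    import numpy as np

    rng = np.random.default_rng(0)

    # Step 1 (placeholder): a CNN encoder + RNN decoder would translate each
    # image into a caption; here we return a fixed toy caption per image id.
    def caption_image(image_id: str) -> str:
        toy_captions = {
            "query":   "many small buildings and a road near dense trees",
            "img_001": "a road passes between rows of small buildings",
            "img_002": "an airplane is parked on the airport runway",
            "img_003": "dense trees surround a few small buildings",
        }
        return toy_captions[image_id]

    # Step 2: map a caption to a semantic vector. Here: mean of (random) word
    # vectors shared across captions; a real system would use learned embeddings.
    dim = 16
    image_ids = ["query", "img_001", "img_002", "img_003"]
    vocab = sorted({w for i in image_ids for w in caption_image(i).split()})
    word_vectors = {w: rng.normal(size=dim) for w in vocab}

    def embed(caption: str) -> np.ndarray:
        vecs = np.stack([word_vectors[w] for w in caption.split()])
        return vecs.mean(axis=0)

    # Step 3: cosine similarity between the query vector and each archive
    # vector, then rank the archive images by decreasing similarity.
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    archive = ["img_001", "img_002", "img_003"]
    q = embed(caption_image("query"))
    ranked = sorted(archive, key=lambda i: cosine(q, embed(caption_image(i))),
                    reverse=True)
    print("Retrieval order:", ranked)

With the toy captions above, the images whose captions share words such as "buildings", "road", and "trees" with the query rank ahead of the airport scene, which is the behavior the caption-based retrieval step is meant to capture.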
2020
Hoxha, G.; Melgani, F.; Demir, B.
Toward Remote Sensing Image Retrieval under a Deep Image Captioning Perspective / Hoxha, G.; Melgani, F.; Demir, B.. - In: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING. - ISSN 2151-1535. - 13:(2020), pp. 4462-4475. [10.1109/JSTARS.2020.3013818]
Files in this item:

File: JSTARS-2020-Captioning-Retrieval.pdf
Access: Open access
Type: Publisher's version (Publisher's layout)
License: Creative Commons
Size: 6.02 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/287547
Citations
  • PMC: ND
  • Scopus: 54
  • Web of Science (ISI): 48