Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval


Cross-modal text-image retrieval in remote sensing (RS) offers a flexible way to mine useful information from RS repositories. However, existing methods accept only queries formulated in English, which may restrict access to this information for non-English speakers. Allowing multilanguage queries can improve communication with the retrieval system and broaden access to RS information. To address this limitation, this article proposes a multilanguage framework based on transformers. The framework consists of two transformer encoders that learn modality-specific representations: a language encoder that generates language features from the textual description, and a vision encoder that extracts visual features from the corresponding image. The two encoders are trained jointly on image-text pairs by minimizing a bidirectional contrastive loss. To enable the model to understand queries in multiple languages, we trained it on descriptions in four languages: English, Arabic, French, and Italian. Experimental results on three benchmark datasets (RSITMD, RSICD, and UCM) show that the proposed model significantly improves retrieval performance, measured by recall, over existing state-of-the-art RS retrieval methods.
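The abstract says the two encoders are trained with a bidirectional contrastive loss and evaluated by recall. The sketch below shows one common form of such an objective: a symmetric, CLIP-style InfoNCE loss over cosine similarities, plus a Recall@K helper for text-to-image retrieval. It is an illustrative assumption, not the authors' implementation. The transformer encoders are replaced by precomputed embedding vectors. The temperature of 0.07 is an illustrative value. All function names are hypothetical. The recall helper assumes exactly one caption per image, paired by index.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale v to unit length so that dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / norm for x in v]


def similarity_matrix(rows: Sequence[Vector], cols: Sequence[Vector],
                      temperature: float = 1.0) -> List[List[float]]:
    """Entry [i][j] is cos(rows[i], cols[j]) divided by the temperature."""
    a = [l2_normalize(v) for v in rows]
    b = [l2_normalize(v) for v in cols]
    return [[sum(x * y for x, y in zip(u, w)) / temperature for w in b] for u in a]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean softmax cross-entropy where the correct column for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the row maximum for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def bidirectional_contrastive_loss(image_emb: Sequence[Vector],
                                   text_emb: Sequence[Vector],
                                   temperature: float = 0.07) -> float:
    """Symmetric InfoNCE: image i and text i are the only positive pair in the batch."""
    logits = similarity_matrix(image_emb, text_emb, temperature)
    image_to_text = diagonal_cross_entropy(logits)
    # Transposing makes each row a text query scored against every image.
    text_to_image = diagonal_cross_entropy([list(col) for col in zip(*logits)])
    return 0.5 * (image_to_text + text_to_image)


def recall_at_k(text_emb: Sequence[Vector], image_emb: Sequence[Vector], k: int) -> float:
    """Fraction of text queries whose paired image (same index) ranks in the top k."""
    scores = similarity_matrix(text_emb, image_emb)
    hits = 0
    for q, row in enumerate(scores):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if q in ranked[:k]:
            hits += 1
    return hits / len(scores)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-ins for vision-encoder outputs
    texts = [[0.9, 0.1], [0.1, 0.9]]   # toy stand-ins for language-encoder outputs
    print(bidirectional_contrastive_loss(images, texts))  # small, since pairs align
    print(recall_at_k(texts, images, k=1))  # 1.0
```

Averaging the image-to-text and text-to-image terms is what makes the loss bidirectional. Each image is pulled toward its own description and pushed away from the other descriptions in the batch, and each description is treated the same way against the images. Under this sketch's assumptions, multilanguage training would amount to drawing the paired descriptions from several languages. The loss itself is unchanged.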

Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval / Rahhal, M. M. A.; Bazi, Y.; Alsharif, N. A.; Bashmal, L.; Alajlan, N.; Melgani, F. - In: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING. - ISSN 1939-1404. - ELECTRONIC. - 15:(2022), pp. 9115-9126. [10.1109/JSTARS.2022.3215803]


Rahhal M. M. A.; Bazi Y.; Alsharif N. A.; Bashmal L.; Alajlan N.; Melgani F.

2022-01-01



Year of publication: 2022
Journal title: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
DOI: https://dx.doi.org/10.1109/JSTARS.2022.3215803
Scopus identifier: 2-s2.0-85140782163
WOS identifier: WOS:000878197300003
All authors: Rahhal, M. M. A.; Bazi, Y.; Alsharif, N. A.; Bashmal, L.; Alajlan, N.; Melgani, F.
Appears in item types: 03.1 Journal article

Files in this item:

File	Size	Format
2022-JSTARS-Multilanguage_Transformer.pdf (open access; type: publisher's version (publisher's layout); license: Creative Commons)	3.26 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/372928
