Visual Question Generation from Remote Sensing Images

Bazi, Y; Melgani, F; Ricci, R
2023-01-01

Abstract

Visual question generation (VQG) is a fundamental task in vision-language understanding that aims to generate relevant questions about a given input image. In this article, we propose a paragraph-based VQG approach for generating intelligent natural-language questions about remote sensing (RS) images. Specifically, our proposed framework consists of two transformer-based vision and language models. First, we employ a Swin Transformer encoder to extract a multiscale, representative visual feature from the image. Then, this feature is used as a prefix to guide a generative pretrained transformer-2 (GPT-2) decoder in generating multiple questions in the form of a paragraph, so as to cover the abundant visual information contained in the RS scene. To train the model, the language decoder is fine-tuned on an RS dataset to generate a set of relevant questions from the RS image. We evaluate our model on two visual question-answering (VQA) datasets in RS. In addition, we construct a new dataset, termed TextRS-VQA, for a better evaluation of our VQG model. This dataset consists of questions fully annotated by humans, which addresses the high redundancy of the questions in prior VQA datasets. Extensive experiments using several accuracy and diversity metrics demonstrate the effectiveness of our proposed VQG model in generating meaningful, valid, and diverse questions from RS images.
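
The abstract describes a prefix-conditioned architecture: a Swin Transformer encodes the RS image, and the resulting visual feature is prepended as a prefix to a GPT-2 decoder that generates a paragraph of questions. The following is a minimal sketch of that idea in Python using the Hugging Face transformers library; the checkpoint names, prefix length, and linear projection layer are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn
from transformers import SwinModel, GPT2LMHeadModel

class VQGPrefixModel(nn.Module):
    # Hypothetical Swin-prefix + GPT-2 question generator (sketch only).
    def __init__(self, prefix_len=10):
        super().__init__()
        # Vision encoder: the pooled Swin feature summarizes the RS image.
        self.encoder = SwinModel.from_pretrained("microsoft/swin-base-patch4-window7-224")
        # Language decoder: GPT-2 generates the paragraph of questions.
        self.decoder = GPT2LMHeadModel.from_pretrained("gpt2")
        self.prefix_len = prefix_len
        # Assumed projection: map the image feature to `prefix_len`
        # pseudo-token embeddings in GPT-2's embedding space.
        self.project = nn.Linear(self.encoder.config.hidden_size,
                                 prefix_len * self.decoder.config.n_embd)

    def forward(self, pixel_values, input_ids):
        vis = self.encoder(pixel_values=pixel_values).pooler_output   # (B, H_enc)
        prefix = self.project(vis).view(-1, self.prefix_len,
                                        self.decoder.config.n_embd)   # (B, P, H_dec)
        tokens = self.decoder.transformer.wte(input_ids)              # (B, T, H_dec)
        # Prepend the visual prefix so it conditions every generated token.
        return self.decoder(inputs_embeds=torch.cat([prefix, tokens], dim=1))

At inference time, one would feed only the image-derived prefix and decode autoregressively (e.g., via decoder.generate with inputs_embeds in recent transformers versions) to obtain the paragraph of questions; fine-tuning would minimize the usual next-token cross-entropy over the question text.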
2023
Bashmal, L; Bazi, Y; Melgani, F; Ricci, R; Al Rahhal, M. M; Zuair, M
Visual Question Generation from Remote Sensing Images / Bashmal, L; Bazi, Y; Melgani, F; Ricci, R; Al Rahhal, M. M; Zuair, M. - In: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING. - ISSN 1939-1404. - ELETTRONICO. - 16:(2023), pp. 3279-3293. [10.1109/JSTARS.2023.3261361]
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/400702
Citations
  • PMC: ND
  • Scopus: 10
  • Web of Science (ISI): 7