The data available in the world come in various modalities, such as audio, text, image, and video. Each data modality has different statistical properties. Understanding each modality, individually, and the relationship between the modalities is vital for a better understanding of the environment surrounding us. Multimodal learning models allow us to process and extract useful information from multimodal sources. For instance, image captioning and text-to-image synthesis are examples of multimodal learning, which require mapping between texts and images. In this paper, we introduce a research area that has never been explored by the remote sensing community, namely the synthesis of remote sensing images from text descriptions. More specifically, in this paper, we focus on exploiting ancient text descriptions of geographical areas, inherited from previous civilizations, to generate equivalent remote sensing images. From a methodological perspective, we propose to rely on generative adversarial networks (GANs) to convert the text descriptions into equivalent pixel values. GANs are a recently proposed class of generative models that formulate learning the distribution of a given dataset as an adversarial competition between two networks. The learned distribution is represented using the weights of a deep neural network and can be used to generate more samples. To fulfill the purpose of this paper, we collected satellite images and ancient texts to train the network. We present the interesting results obtained and propose various future research paths that we believe are important to further develop this new research area.

Retro-Remote Sensing: Generating Images from Ancient Texts / Bejiga, M. B.; Melgani, F.; Vascotto, A.. - In: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING. - ISSN 1939-1404. - 12:3(2019), pp. 950-960. [10.1109/JSTARS.2019.2895693]

Retro-Remote Sensing: Generating Images from Ancient Texts

Bejiga M. B.;Melgani F.;
2019

Abstract

The data available in the world come in various modalities, such as audio, text, image, and video. Each data modality has different statistical properties. Understanding each modality, individually, and the relationship between the modalities is vital for a better understanding of the environment surrounding us. Multimodal learning models allow us to process and extract useful information from multimodal sources. For instance, image captioning and text-to-image synthesis are examples of multimodal learning, which require mapping between texts and images. In this paper, we introduce a research area that has never been explored by the remote sensing community, namely the synthesis of remote sensing images from text descriptions. More specifically, in this paper, we focus on exploiting ancient text descriptions of geographical areas, inherited from previous civilizations, to generate equivalent remote sensing images. From a methodological perspective, we propose to rely on generative adversarial networks (GANs) to convert the text descriptions into equivalent pixel values. GANs are a recently proposed class of generative models that formulate learning the distribution of a given dataset as an adversarial competition between two networks. The learned distribution is represented using the weights of a deep neural network and can be used to generate more samples. To fulfill the purpose of this paper, we collected satellite images and ancient texts to train the network. We present the interesting results obtained and propose various future research paths that we believe are important to further develop this new research area.
3
Bejiga, M. B.; Melgani, F.; Vascotto, A.
Retro-Remote Sensing: Generating Images from Ancient Texts / Bejiga, M. B.; Melgani, F.; Vascotto, A.. - In: IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING. - ISSN 1939-1404. - 12:3(2019), pp. 950-960. [10.1109/JSTARS.2019.2895693]
File in questo prodotto:
File Dimensione Formato  
JSTARS-2019-Retro-Remote Sensing.pdf

Solo gestori archivio

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 3.98 MB
Formato Adobe PDF
3.98 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11572/250857
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 9
  • ???jsp.display-item.citation.isi??? 8
social impact