Compared with remote sensing image (RSI) captioning methods based on the traditional encoder-decoder model, two-stage RSI captioning methods include an auxiliary remote sensing task to provide prior information, which enables them to generate more accurate descriptions. In previous two-stage RSI captioning methods, however, the image captioning and the auxiliary remote sensing tasks are handled separately, which is time-consuming and ignores mutual interference between tasks. To solve this problem, we propose a novel joint-training two-stage (JTTS) RSI captioning method. We use multilabel classification to provide prior information, and we design a differentiable sampling operator to replace the traditional nondifferentiable sampling operation to index the multilabel classification result. In contrast to previous two-stage RSI captioning methods, our method can implement joint training, and the joint loss allows the error of the generated description to flow into the optimization of th...
A Joint-Training Two-Stage Method For Remote Sensing Image Captioning / Ye, Xiutiao; Wang, Shuang; Gu, Yu; Wang, Jihui; Wang, Ruixuan; Hou, Biao; Giunchiglia, Fausto; Jiao, Licheng. - In: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING. - ISSN 1558-0644. - 60:(2022). [10.1109/TGRS.2022.3224244]
A Joint-Training Two-Stage Method For Remote Sensing Image Captioning
Yu Gu;Fausto Giunchiglia;
2022-01-01
Abstract
Compared with remote sensing image (RSI) captioning methods based on the traditional encoder-decoder model, two-stage RSI captioning methods include an auxiliary remote sensing task to provide prior information, which enables them to generate more accurate descriptions. In previous two-stage RSI captioning methods, however, the image captioning and the auxiliary remote sensing tasks are handled separately, which is time-consuming and ignores mutual interference between tasks. To solve this problem, we propose a novel joint-training two-stage (JTTS) RSI captioning method. We use multilabel classification to provide prior information, and we design a differentiable sampling operator to replace the traditional nondifferentiable sampling operation to index the multilabel classification result. In contrast to previous two-stage RSI captioning methods, our method can implement joint training, and the joint loss allows the error of the generated description to flow into the optimization of th...| File | Dimensione | Formato | |
|---|---|---|---|
|
2022 12 IEEE Trans. RGS.pdf
accesso aperto
Tipologia:
Post-print referato (Refereed author’s manuscript)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
1.25 MB
Formato
Adobe PDF
|
1.25 MB | Adobe PDF | Visualizza/Apri |
|
A_Joint-Training_Two-Stage_Method_For_Remote_Sensing_Image_Captioning.pdf
accesso aperto
Tipologia:
Versione editoriale (Publisher’s layout)
Licenza:
Creative commons
Dimensione
5.79 MB
Formato
Adobe PDF
|
5.79 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione



