With the development of high-resolution satellites, more and more attention has been paid to remote sensing (RS) scene classification. Convolutional neural networks (CNNs), which replace the traditional handcrafted features with a learning-based feature extraction mechanism, are widely used in scene classification. But, CNNs are less effective in deriving long-range contextual relations, which limits further improvement. Visual transformer (VT), an emerging image processing method, provides a new perspective for RS scene classification by directly acquiring long-range features. Although there have been limited works combining CNN and VT through simple concatenation, the collaborations between them are insufficient. To address these issues, we propose a local and long-range collaborative framework (L2RCF). First, we design a dual-stream structure to extract the local and long-range features. Second, a cross-feature calibration (CFC) module is designed for them to improve the representation of the fusion features. Then, combining deep supervision (DS) and deep mutual learning (DML), a novel joint loss is proposed to enhance the dual-stream feature extractor and further improve the fused features. Finally, a two-stage semi-supervised training strategy is designed to improve performance with unlabeled samples. To demonstrate the effectiveness of L2RCF, we conducted experiments on three widely used RS scene classification datasets: RSSCN7, AID, and NWPU. The results show that the L2RCF performs significantly better compared with some state-of-the-art scene classification methods.
Local and Long-Range Collaborative Learning for Remote Sensing Scene Classification / Zhao, Maofan; Meng, Qingyan; Zhang, Linlin; Hu, Xinli; Bruzzone, Lorenzo. - In: IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING. - ISSN 0196-2892. - 61:(2023), pp. 1-15. [10.1109/TGRS.2023.3265346]
Local and Long-Range Collaborative Learning for Remote Sensing Scene Classification
Bruzzone, Lorenzo
2023-01-01
Abstract
With the development of high-resolution satellites, more and more attention has been paid to remote sensing (RS) scene classification. Convolutional neural networks (CNNs), which replace the traditional handcrafted features with a learning-based feature extraction mechanism, are widely used in scene classification. But, CNNs are less effective in deriving long-range contextual relations, which limits further improvement. Visual transformer (VT), an emerging image processing method, provides a new perspective for RS scene classification by directly acquiring long-range features. Although there have been limited works combining CNN and VT through simple concatenation, the collaborations between them are insufficient. To address these issues, we propose a local and long-range collaborative framework (L2RCF). First, we design a dual-stream structure to extract the local and long-range features. Second, a cross-feature calibration (CFC) module is designed for them to improve the representation of the fusion features. Then, combining deep supervision (DS) and deep mutual learning (DML), a novel joint loss is proposed to enhance the dual-stream feature extractor and further improve the fused features. Finally, a two-stage semi-supervised training strategy is designed to improve performance with unlabeled samples. To demonstrate the effectiveness of L2RCF, we conducted experiments on three widely used RS scene classification datasets: RSSCN7, AID, and NWPU. The results show that the L2RCF performs significantly better compared with some state-of-the-art scene classification methods.File | Dimensione | Formato | |
---|---|---|---|
TGRS3265346.pdf
accesso aperto
Tipologia:
Post-print referato (Refereed author’s manuscript)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
9.69 MB
Formato
Adobe PDF
|
9.69 MB | Adobe PDF | Visualizza/Apri |
Local_and_Long-Range_Collaborative_Learning_for_Remote_Sensing_Scene_Classification.pdf
Solo gestori archivio
Tipologia:
Versione editoriale (Publisher’s layout)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
3.76 MB
Formato
Adobe PDF
|
3.76 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione