Semantic Change Detection (SCD), traditionally referred to as "detection of land cover transition" or "multi-class change detection," has shown remarkable potential with the application of Deep Learning (DL) methods. Unlike standard change detection (CD) tasks, SCD requires richer information: it must not only detect that a region has changed but also categorize the change. This detailed information is crucial for many land-cover and land-use (LCLU) applications. However, existing SCD methods still suffer from numerous false alarms and omissions, a shortage of annotated datasets, and low overall accuracy. Recent studies have made progress on these challenges by leveraging advanced DL architectures and techniques. For example, the hierarchical semantic graph interaction network (HGINet) uses graph learning to model interactions among different feature layers, improving detection in complex SCD scenarios. Another approach, Semantic-CD, incorporates open-vocabulary semantics from Vision Foundation Models (VFMs) such as CLIP to enhance generalization across semantic categories. These innovations demonstrate the potential of DL to significantly improve SCD performance.
In this thesis, we address these challenges by proposing three novel methods based on joint spatio-temporal modeling, recurrent semantic change detection, and state space modeling of multi-temporal semantics. First, we propose SCanFormer, which explicitly models the "from-to" semantic transitions between bi-temporal remote sensing images (RSIs). It enhances the context representations of extracted features by leveraging spatio-temporal constraints, leading to more accurate semantic change detection. Second, we introduce VFM-ReSCD, an architecture that integrates a side adapter (SA) into the Fast Segment Anything Model (FastSAM) network, enabling zero-shot transfer to novel image distributions and tasks. It additionally incorporates a Recurrent Neural Network (RNN) to model semantic correlations and capture feature changes in Very High-Resolution (VHR) RSIs. Third, we propose the Radio-Mamba architecture for change detection in VHR RSIs. It uses the Radio encoder-decoder for segmentation and feature extraction, enhanced by Low-Rank Adaptation (LoRA) to improve feature extraction for small targets, while the Mamba block captures long-distance semantic dependencies and contextual information, significantly improving change detection accuracy. The effectiveness of these methods is validated through ablation studies and comparative experiments on benchmark datasets. The proposed approaches achieve state-of-the-art (SOTA) performance, with significant improvements in both overall accuracy and semantic classification.
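To make the notion of a "from-to" semantic transition concrete, the following minimal sketch compares two co-registered label maps pixel by pixel and tallies the class transitions. It is only an illustration of what an SCD output contains (a binary change mask plus per-pixel transitions), in the style of simple post-classification comparison. It is not SCanFormer or any other method from the thesis, and the function name, class names, and toy maps are hypothetical.

```python
from collections import Counter


def from_to_transitions(labels_t1, labels_t2):
    """Compare two co-registered label maps (lists of rows of class names).

    Returns a binary change mask (1 = changed) and a Counter that maps
    each (from_class, to_class) pair to its number of changed pixels.
    """
    if len(labels_t1) != len(labels_t2):
        raise ValueError("label maps must have the same number of rows")

    change_mask = []
    transitions = Counter()
    for row1, row2 in zip(labels_t1, labels_t2):
        if len(row1) != len(row2):
            raise ValueError("label maps must have the same number of columns")
        mask_row = []
        for before, after in zip(row1, row2):
            changed = before != after
            mask_row.append(int(changed))
            if changed:
                # Record the semantic transition only where a change occurred.
                transitions[(before, after)] += 1
        change_mask.append(mask_row)
    return change_mask, transitions


# Hypothetical 2x3 land-cover maps at two acquisition dates.
t1 = [["water", "forest", "forest"],
      ["farmland", "farmland", "forest"]]
t2 = [["water", "building", "forest"],
      ["building", "farmland", "farmland"]]

mask, trans = from_to_transitions(t1, t2)
print(mask)
for (src, dst), count in sorted(trans.items()):
    print(f"{src} -> {dst}: {count}")
```

A binary CD method would stop at the mask; SCD additionally reports which class each changed pixel moved from and to.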
Semantic Change Detection in multitemporal Remote Sensing Images Using Deep Neural Networks / Zhang, Jing. - (2025 Jul 18), pp. 1-81. [10.15168/11572_459230]
Semantic Change Detection in multitemporal Remote Sensing Images Using Deep Neural Networks
Zhang, Jing
2025-07-18
| File | Size | Format |
|---|---|---|
| PhD_thesis_Jing_Zhang.pdf (open access). Type: Doctoral Thesis. License: All rights reserved | 48.22 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



