Semantic Change Detection in multitemporal Remote Sensing Images Using Deep Neural Networks / Zhang, Jing. - (2025 Jul 18), pp. 1-81. [10.15168/11572_459230]

Semantic Change Detection in multitemporal Remote Sensing Images Using Deep Neural Networks

Zhang, Jing
2025-07-18

Abstract

Semantic Change Detection (SCD), traditionally referred to as "detection of land cover transition" or "multi-class change detection," has shown remarkable potential with the application of Deep Learning (DL) methods. Unlike standard change detection (CD), SCD must extract richer information: it not only detects where changes occur in a region but also categorizes them. This detailed information is crucial for many land-cover and land-use (LCLU) applications. However, existing SCD methods still suffer from a significant number of false alarms and omissions, a scarcity of annotated datasets, and low overall accuracy. Recent studies have made progress on these challenges by leveraging advanced DL architectures and techniques. For example, the hierarchical semantic graph interaction network (HGINet) uses graph learning to model interactions among different feature layers, improving detection in complex SCD scenarios. Another approach, Semantic-CD, incorporates open-vocabulary semantics from Vision Foundation Models (VFMs) such as CLIP to enhance generalization across semantic categories. These innovations demonstrate the potential of DL to significantly improve SCD performance. In this thesis, we address the above-mentioned challenges by proposing three novel methods based on joint spatio-temporal modeling, recurrent semantic change detection, and state space modeling of multi-temporal semantics. First, we propose SCanFormer to explicitly model the "from-to" semantic transitions between bi-temporal remote sensing images (RSIs). This method enhances the context representations of the extracted features by leveraging spatio-temporal constraints, leading to more accurate semantic change detection. Second, we introduce a novel architecture called VFM-ReSCD, which integrates a side adapter (SA) into the Fast Segment Anything Model (FastSAM) network, enabling zero-shot transfer to novel image distributions and tasks. It also incorporates a Recurrent Neural Network (RNN) to model semantic correlations and capture feature changes in Very High-Resolution (VHR) RSIs. Third, we propose the Radio-Mamba architecture for change detection in VHR RSIs. This method uses the Radio encoder-decoder for segmentation and feature extraction, enhanced by Low-Rank Adaptation (LoRA) to improve feature extraction for small targets. The Mamba block captures long-distance semantic dependencies and contextual information, significantly improving the accuracy of change detection. The effectiveness of these methods is validated through ablation studies and comparative experiments on benchmark datasets. The proposed approaches achieve state-of-the-art (SOTA) performance, with significant improvements in both overall accuracy and semantic classification.
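
To make the "from-to" notion in the abstract concrete, the following is a minimal sketch of what an SCD output encodes: a binary change mask plus the class transition at each changed pixel. It is not the thesis's SCanFormer or any of its networks; it only illustrates the task's output, assuming two already-classified label maps are available. The function name `from_to_transitions` and the class names are hypothetical.

```python
from collections import Counter


def from_to_transitions(labels_t1, labels_t2):
    """Compare two equally sized label maps (lists of rows).

    Returns a binary change mask (1 = label changed) and a Counter
    mapping (from_class, to_class) to the number of changed pixels.
    Illustrative only: real SCD models predict these maps from images.
    """
    if len(labels_t1) != len(labels_t2):
        raise ValueError("label maps must have the same number of rows")

    change_mask = []
    transitions = Counter()
    for row_t1, row_t2 in zip(labels_t1, labels_t2):
        if len(row_t1) != len(row_t2):
            raise ValueError("label maps must have the same row lengths")
        mask_row = []
        for before, after in zip(row_t1, row_t2):
            changed = before != after
            mask_row.append(1 if changed else 0)
            if changed:
                # Record the semantic transition, e.g. forest -> building.
                transitions[(before, after)] += 1
        change_mask.append(mask_row)
    return change_mask, transitions


# Hypothetical 2x2 land-cover maps at two acquisition dates.
map_t1 = [["water", "forest"], ["forest", "bare"]]
map_t2 = [["water", "building"], ["forest", "building"]]

mask, counts = from_to_transitions(map_t1, map_t2)
print(mask)
print(dict(counts))
```

A binary CD method would stop at the mask; the per-pixel transition counts are the additional semantic information that SCD is expected to provide.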
18 Jul 2025
XXXVII
2024-2025
Università degli Studi di Trento
Information and Communication Technology
no
English
Files in this item:
PhD_thesis_Jing_Zhang.pdf (open access)
Type: Doctoral Thesis
License: All rights reserved
Size: 48.22 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/459230