Novel methods for the Semantic Segmentation of Remote Sensing Images / Ding, Lei. - (2021 Nov 26), pp. 1-128. [10.15168/11572_322099]
Novel methods for the Semantic Segmentation of Remote Sensing Images
Ding, Lei
2021-11-26
Abstract
With the development of Earth observation technologies, the volume of available remote sensing images (RSIs) has increased tremendously, and with it the need for automatic analysis of the collected data. Pixel-wise classification, i.e., the semantic segmentation of RSIs, is important for a variety of land-cover and land-use mapping applications. Recent studies on the semantic segmentation of RSIs have achieved great progress with the use of Convolutional Neural Networks (CNNs). However, they suffer from some common problems such as fragmentation errors, boundary ambiguity, and the need for optimization of the results. In this thesis, we address these problems and propose methods to improve the segmentation accuracy in the context of i) the semantic segmentation of Very High Resolution (VHR) RSIs; ii) the semantic segmentation of High Resolution (HR) Synthetic Aperture Radar (SAR) images; iii) the segmentation of roads in VHR RSIs; iv) the segmentation of buildings in VHR RSIs. Through research activities conducted under these sub-topics, this dissertation presents four novel contributions.

First, we propose a Local Attention Network (LANet) for the semantic segmentation of VHR RSIs. Conventional CNN models extract features within a limited Receptive Field (RF) due to their local information aggregation mechanism. In the proposed LANet, we design a patch attention module to enhance the embedding of context information, as well as an attention embedding module to enrich the semantic information in low-level features. Experimental results show that these designs reduce fragmentation errors and improve segmentation accuracy.

Second, we present a novel CNN architecture for the semantic segmentation of HR SAR images. SAR images contain intense speckle noise, which degrades the performance of segmentation algorithms. To alleviate its impact, we design a Multi-Path Residual Network (MPResNet) that contains three parallel feature embedding branches. Compared to other CNN architectures, it has a wider RF and can therefore better exploit the local discriminative features.

Third, we propose a Direction-aware Residual Network (DiResNet) for the segmentation of roads in VHR RSIs. State-of-the-art methods for road segmentation suffer from discontinuity problems caused by occlusions and redundant spatial information. In DiResNet, we introduce the supervision of road directions to improve the detection of linear features, as well as several auxiliary designs to improve the road structure and completeness. These lead to significant improvements in the precision and connectivity of the results.

Last, we introduce an adversarial training strategy to model shape information for building segmentation in VHR RSIs. Common CNNs cannot model the shape of objects of interest. We propose an Adversarial Shape Learning Network (ASLNet) to explicitly learn the shape constraints that the data exhibit, which is beneficial for inpainting missing building parts and regularizing building contours. This approach improves the results in terms of both pixel-based accuracy and object-based metrics.

The effectiveness of the proposed approaches has been tested with both ablation studies and comparative experiments on the corresponding benchmark datasets. The quantitative and qualitative results are presented together with a comprehensive performance analysis.

File | Size | Format | |
---|---|---|---|
LeiDing_PhD_Thesis.pdf (open access; Description: Main article; Type: Doctoral Thesis; License: All rights reserved) | 30.77 MB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.