Novel methods for the Semantic Segmentation of Remote Sensing Images

Ding, Lei

doi:10.15168/11572_322099

Abstract

With the development of Earth observation technologies, the volume of available remote sensing images (RSIs) has increased tremendously, and with it the need for automatic analysis of the collected data. Pixel-wise classification, i.e., the semantic segmentation of RSIs, is important for a variety of land-cover and land-use mapping applications. Recent studies on the semantic segmentation of RSIs have made great progress using Convolutional Neural Networks (CNNs). However, these methods suffer from common problems such as fragmentation errors, boundary ambiguity, and results that still require further optimization. In this thesis, we address these problems and propose methods to improve segmentation accuracy in four contexts: i) the semantic segmentation of very high resolution (VHR) RSIs; ii) the semantic segmentation of high-resolution (HR) Synthetic Aperture Radar (SAR) images; iii) the segmentation of roads in VHR RSIs; and iv) the segmentation of buildings in VHR RSIs. The research conducted on these sub-topics results in four novel contributions.

First, we propose a Local Attention Network (LANet) for the semantic segmentation of VHR RSIs. Conventional CNN models extract features within a limited Receptive Field (RF) because they aggregate information locally. In LANet, we design a patch attention module to enhance the embedding of context information, and an attention embedding module to enrich the semantic information in low-level features. Experimental results show that these designs reduce fragmentation errors and improve segmentation accuracy.

Second, we present a novel CNN architecture for the semantic segmentation of HR SAR images. SAR images contain intense speckle noise, which degrades the performance of segmentation algorithms. To alleviate its impact, we design a Multi-Path Residual Network (MPResNet) that contains three parallel feature embedding branches.
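
The receptive-field argument behind MPResNet can be made concrete with the standard recurrence for the theoretical RF of stacked convolutions. Each layer with kernel size k, stride s and dilation d enlarges the RF by d(k-1)·j, where j is the product of the strides of all preceding layers. The sketch below is only an illustration. The branch layout used here (`branch`: some stride-2 3×3 convolutions followed by two stride-1 3×3 convolutions) is a hypothetical assumption and is not MPResNet's actual configuration.

```python
from typing import List, Tuple

# A convolution layer described by (kernel_size, stride, dilation).
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> int:
    """Theoretical receptive field, in input pixels, of a stack of convolutions."""
    rf = 1    # a single input pixel
    jump = 1  # spacing, in input pixels, between adjacent positions of the current feature map
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


def branch(num_downsamplings: int, num_convs: int = 2) -> List[Layer]:
    """Hypothetical branch: stride-2 3x3 convolutions, then stride-1 3x3 convolutions."""
    return [(3, 2, 1)] * num_downsamplings + [(3, 1, 1)] * num_convs


for n in range(3):
    print(f"{n} downsampling step(s): RF = {receptive_field(branch(n))} pixels")
# RF = 5, 11 and 23 pixels for n = 0, 1 and 2
```

Consider the same two 3×3 layers in each case. A branch that first downsamples once or twice covers an 11- or 23-pixel window instead of a 5-pixel one. This is one way that parallel paths at different resolutions widen the context available to a network. The effective RF observed in trained networks is usually smaller than this theoretical bound.
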
Compared with other CNN architectures, MPResNet has a wider RF and is therefore better able to exploit local discriminative features.

Third, we propose a Direction-aware Residual Network (DiResNet) for the segmentation of roads in VHR RSIs. State-of-the-art road segmentation methods suffer from discontinuities caused by occlusions and redundant spatial information. In DiResNet, we introduce supervision of road directions to improve the detection of linear features, together with several auxiliary designs that improve the structure and completeness of the extracted roads. These lead to significant improvements in the precision and connectivity of the results.

Last, we introduce an adversarial training strategy that models shape information for building segmentation in VHR RSIs. Common CNNs cannot explicitly model the shape of the objects of interest. We propose an Adversarial Shape Learning Network (ASLNet) that explicitly learns the shape constraints exhibited by the data, which helps to inpaint missing building parts and to regularize building contours. This approach improves the results in terms of both pixel-based accuracy and object-based metrics.

The effectiveness of the proposed approaches has been tested with both ablation studies and comparative experiments on the corresponding benchmark datasets. The quantitative and qualitative results are presented together with a comprehensive performance analysis.
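
The road-direction supervision in DiResNet needs a direction label for each road pixel, and the abstract does not say how these labels are derived. The sketch below is therefore an illustrative assumption. It derives quantized per-pixel direction labels from a binary road mask with a structure tensor, a standard orientation estimator: within a local window, the dominant gradient orientation is perpendicular to the road axis. The helper names (`box_sum`, `road_direction_labels`), the number of bins (`num_bins`) and the window radius (`radius`) are hypothetical choices, not taken from the thesis.

```python
import numpy as np


def box_sum(a: np.ndarray, radius: int) -> np.ndarray:
    """Sum of `a` over a (2*radius+1)^2 window around each pixel, zero-padded."""
    padded = np.pad(a, radius, mode="constant")
    h, w = a.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + h, dx:dx + w]
    return out


def road_direction_labels(mask: np.ndarray, num_bins: int = 4, radius: int = 3) -> np.ndarray:
    """Hypothetical helper: quantized local road direction per road pixel, -1 elsewhere.

    Angles are measured from the x (column) axis with y pointing down the rows,
    in [0, pi); bin b is centred on b * pi / num_bins.
    """
    gy, gx = np.gradient(mask.astype(float))  # derivatives along rows (y) and columns (x)
    # Structure tensor components, accumulated over a local window.
    jxx = box_sum(gx * gx, radius)
    jyy = box_sum(gy * gy, radius)
    jxy = box_sum(gx * gy, radius)
    # Dominant gradient orientation; the road axis is perpendicular to it.
    grad_angle = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    road_angle = np.mod(grad_angle + np.pi / 2, np.pi)
    labels = np.round(road_angle / (np.pi / num_bins)).astype(int) % num_bins
    # Ignore background pixels and pixels without edge evidence in their window.
    labels[(mask == 0) | (jxx + jyy < 1e-6)] = -1
    return labels


size = 32
rows, cols = np.indices((size, size))
horizontal = (np.abs(rows - 16) <= 2).astype(float)  # road along the x axis
vertical = (np.abs(cols - 16) <= 2).astype(float)    # road along the y axis
diagonal = (np.abs(rows - cols) <= 2).astype(float)  # top-left to bottom-right

for name, road in [("horizontal", horizontal), ("vertical", vertical), ("diagonal", diagonal)]:
    print(name, road_direction_labels(road)[16, 14:19])
# horizontal -> bin 0, vertical -> bin 2, diagonal -> bin 1
```

With `num_bins=4`, bin 0 is horizontal, bin 2 is vertical, and bins 1 and 3 are the two diagonals. Labels of -1 mark pixels that a direction loss would ignore. Centring the bins on multiples of π/num_bins lets directions near 0 and near π fall into the same bin.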

Novel methods for the Semantic Segmentation of Remote Sensing Images / Ding, Lei. - (2021 Nov 26), pp. 1-128. [10.15168/11572_322099]

Defended on: 26-Nov-2021
Cycle: XXXIII
Academic year: 2019-2020
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Informatica e telecomunicazioni (until academic year 2020-21, 36th cycle)
Unitn internal supervisor: Bruzzone, Lorenzo
Bi-nationally supervised doctoral thesis: no
DOI: https://dx.doi.org/10.15168/11572_322099
Language: English
Appears in types: 08.1 Doctoral Thesis

Files in this record:

File	Size	Format
LeiDing_PhD_Thesis.pdf (open access; description: main article; type: Doctoral Thesis; license: all rights reserved)	30.77 MB	Adobe PDF