TY - JOUR
T1 - CGSANet: A Contour-Guided and Local Structure-Aware Encoder-Decoder Network for Accurate Building Extraction from Very High-Resolution Remote Sensing Imagery
AU - Chen, Shanxiong
AU - Shi, Wenzhong
AU - Zhou, Mingting
AU - Zhang, Min
AU - Xuan, Zhaoxin
N1 - Funding Information:
This work was supported in part by The Hong Kong Polytechnic University under Grants 1-ZVN6, ZVU1, and 4-BCF7, in part by the Hong Kong Innovation and Technology Commission under Grant SST/051/20GP, and in part by the Beijing Key Laboratory of Urban Spatial Information Engineering under Grant 2020101.
Publisher Copyright:
© 2008-2012 IEEE.
PY - 2021/12
Y1 - 2021/12
N2 - Extracting buildings accurately from very high-resolution (VHR) remote sensing imagery is challenging due to diverse building appearances, spectral variability, and complex backgrounds. Recent studies mainly adopt variants of the fully convolutional network (FCN) with an encoder-decoder architecture to extract buildings, which have shown promising improvement over conventional methods. However, FCN-based encoder-decoder models still fail to fully utilize the implicit characteristics of building shapes. This adversely affects the accurate localization of building boundaries, which is particularly relevant in building mapping. A contour-guided and local structure-aware encoder-decoder network (CGSANet) is proposed to extract buildings with more accurate boundaries. CGSANet is a multitask network composed of a contour-guided (CG) module and a multiregion-guided (MRG) module. The CG module is supervised by building contours and effectively learns contour-related spatial features to retain the shape patterns of buildings. The MRG module is deeply supervised by four building regions, further capturing multiscale and contextual features of buildings. In addition, a hybrid loss function was designed to improve the structure learning ability of CGSANet. These three improvements benefit each other synergistically to produce high-quality building extraction results. Experimental results on the WHU and NZ32km2 building datasets demonstrate that, compared with the tested algorithms, CGSANet produces more accurate building extraction results, achieving the best intersection-over-union values of 91.55% and 90.02%, respectively. Experiments on the INRIA building dataset further demonstrate the generalization ability of the proposed framework, indicating its great practical potential.
KW - Building extraction
KW - fully convolutional network (FCN)
KW - hybrid loss function
KW - multitask learning
KW - very high resolution (VHR) remote sensing imagery
UR - http://www.scopus.com/inward/record.url?scp=85122291478&partnerID=8YFLogxK
U2 - 10.1109/JSTARS.2021.3139017
DO - 10.1109/JSTARS.2021.3139017
M3 - Journal article
AN - SCOPUS:85122291478
SN - 1939-1404
VL - 15
SP - 1526
EP - 1542
JO - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
JF - IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
ER -