TY - JOUR
T1 - Mapping essential urban land use categories with open big data
T2 - Results for five metropolitan areas in the United States of America
AU - Chen, Bin
AU - Tu, Ying
AU - Song, Yimeng
AU - Theobald, David M.
AU - Zhang, Tao
AU - Ren, Zhehao
AU - Li, Xuecao
AU - Yang, Jun
AU - Wang, Jie
AU - Wang, Xi
AU - Gong, Peng
AU - Bai, Yuqi
AU - Xu, Bing
N1 - Funding Information:
This work was supported by the Major Program of the National Natural Science Foundation of China (20201321441 and 20201320003), The University of Hong Kong HKU-100 Scholars Fund, and donations made by the Cyrus Tang Foundation to Tsinghua University.
Publisher Copyright:
© 2021 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS)
PY - 2021/8
Y1 - 2021/8
N2 - Urban land-use maps outlining the distribution, pattern, and composition of various land use types are critically important for urban planning, environmental management, disaster control, health protection, and biodiversity conservation. Recent advances in remote sensing and social sensing data and methods have shown great potentials in mapping urban land use categories, but they are still constrained by mixed land uses, limited predictors, non-localized models, and often relatively low accuracies. To inform these issues, we proposed a robust and cost-effective framework for mapping urban land use categories using openly available multi-source geospatial “big data”. With street blocks generated from OpenStreetMap (OSM) data as the minimum classification unit, we integrated an expansive set of multi-scale spatially explicit information on land surface, vertical height, socio-economic attributes, social media, demography, and topography. We further proposed to apply the automatic ensemble learning that leverages a bunch of machine learning algorithms in deriving optimal urban land use classification maps. Results of block-level urban land use classification in five metropolitan areas of the United States found the overall accuracies of major-class (Level-I) and minor-class (Level-II) classification could be high as 91% and 86%, respectively. A multi-model comparison revealed that for urban land use classification with high-dimensional features, the multi-layer stacking ensemble models achieved better performance than base models such as random forest, extremely randomized trees, LightGBM, CatBoost, and neural networks. We found without very-high-resolution National Agriculture Imagery Program imagery, the classification results derived from Sentinel-1, Sentinel-2, and other open big data based features could achieve plausible overall accuracies of Level-I and Level-II classification at 88% and 81%, respectively. We also found that model transferability depended highly on the heterogeneity in characteristics of different regions. The methods and findings in this study systematically elucidate the role of data sources, classification methods, and feature transferability in block-level land use classifications, which have important implications for mapping multi-scale essential urban land use categories.
AB - Urban land-use maps outlining the distribution, pattern, and composition of various land use types are critically important for urban planning, environmental management, disaster control, health protection, and biodiversity conservation. Recent advances in remote sensing and social sensing data and methods have shown great potentials in mapping urban land use categories, but they are still constrained by mixed land uses, limited predictors, non-localized models, and often relatively low accuracies. To inform these issues, we proposed a robust and cost-effective framework for mapping urban land use categories using openly available multi-source geospatial “big data”. With street blocks generated from OpenStreetMap (OSM) data as the minimum classification unit, we integrated an expansive set of multi-scale spatially explicit information on land surface, vertical height, socio-economic attributes, social media, demography, and topography. We further proposed to apply the automatic ensemble learning that leverages a bunch of machine learning algorithms in deriving optimal urban land use classification maps. Results of block-level urban land use classification in five metropolitan areas of the United States found the overall accuracies of major-class (Level-I) and minor-class (Level-II) classification could be high as 91% and 86%, respectively. A multi-model comparison revealed that for urban land use classification with high-dimensional features, the multi-layer stacking ensemble models achieved better performance than base models such as random forest, extremely randomized trees, LightGBM, CatBoost, and neural networks. We found without very-high-resolution National Agriculture Imagery Program imagery, the classification results derived from Sentinel-1, Sentinel-2, and other open big data based features could achieve plausible overall accuracies of Level-I and Level-II classification at 88% and 81%, respectively. We also found that model transferability depended highly on the heterogeneity in characteristics of different regions. The methods and findings in this study systematically elucidate the role of data sources, classification methods, and feature transferability in block-level land use classifications, which have important implications for mapping multi-scale essential urban land use categories.
KW - Block-level mapping
KW - Ensemble learning
KW - Geospatial big data
KW - Land use classification
KW - NAIP
KW - Sentinel-1/2
UR - http://www.scopus.com/inward/record.url?scp=85108643595&partnerID=8YFLogxK
U2 - 10.1016/j.isprsjprs.2021.06.010
DO - 10.1016/j.isprsjprs.2021.06.010
M3 - Journal article
AN - SCOPUS:85108643595
SN - 0924-2716
VL - 178
SP - 203
EP - 218
JO - ISPRS Journal of Photogrammetry and Remote Sensing
JF - ISPRS Journal of Photogrammetry and Remote Sensing
ER -