TY - JOUR
T1 - Multi-LoRA Fine-Tuned Segment Anything Model for Urban Man-Made Object Extraction
AU - Weng, Qihao
AU - Lu, Xiaoyan
PY - 2024/8/15
Y1 - 2024/8/15
N2 - Mapping urban man-made objects, such as roads and buildings, from high-resolution remote sensing imagery is essential for monitoring global urbanization. However, the generalization ability of most existing models is limited by the inconsistent data distribution of images across different regions. The emergence of the segment anything model (SAM) has significantly advanced image segmentation, primarily owing to its strong zero-shot segmentation ability. However, SAM tends to underperform in various remote sensing tasks, such as road and building extraction, primarily because of the complexity of remote sensing imagery. This article introduces the multi-LoRA fine-tuned SAM (SAM-MLoRAF) framework, a simple yet effective network designed to extract urban man-made objects. It injects multiple parallel low-rank LoRA structures into the SAM encoder to approximate a high-rank LoRA, which mitigates the overfitting problem. In addition, it adopts a pyramid decoder to integrate multilevel information. The model is optimized in two steps that combine supervised and unsupervised fine-tuning. In the first step, the SAM-MLoRA is trained on publicly available datasets in a supervised manner to adapt it to the task of urban man-made object extraction. In the second step, unsupervised fine-tuning based on consistency regularization adapts the model to the target region by leveraging unlabeled images from that region. Extensive experiments across five continents demonstrate that the proposed SAM-MLoRAF framework efficiently leverages the robust segmentation capabilities of the SAM foundation model with only a few trainable parameters, improving most intersection over union (IoU) scores by more than 10% compared with previous segmentation models. The code and datasets will be released at: https://github.com/xiaoyan07/SAM-MLoRA.
AB - Mapping urban man-made objects, such as roads and buildings, from high-resolution remote sensing imagery is essential for monitoring global urbanization. However, the generalization ability of most existing models is limited by the inconsistent data distribution of images across different regions. The emergence of the segment anything model (SAM) has significantly advanced image segmentation, primarily owing to its strong zero-shot segmentation ability. However, SAM tends to underperform in various remote sensing tasks, such as road and building extraction, primarily because of the complexity of remote sensing imagery. This article introduces the multi-LoRA fine-tuned SAM (SAM-MLoRAF) framework, a simple yet effective network designed to extract urban man-made objects. It injects multiple parallel low-rank LoRA structures into the SAM encoder to approximate a high-rank LoRA, which mitigates the overfitting problem. In addition, it adopts a pyramid decoder to integrate multilevel information. The model is optimized in two steps that combine supervised and unsupervised fine-tuning. In the first step, the SAM-MLoRA is trained on publicly available datasets in a supervised manner to adapt it to the task of urban man-made object extraction. In the second step, unsupervised fine-tuning based on consistency regularization adapts the model to the target region by leveraging unlabeled images from that region. Extensive experiments across five continents demonstrate that the proposed SAM-MLoRAF framework efficiently leverages the robust segmentation capabilities of the SAM foundation model with only a few trainable parameters, improving most intersection over union (IoU) scores by more than 10% compared with previous segmentation models. The code and datasets will be released at: https://github.com/xiaoyan07/SAM-MLoRA.
KW - high-resolution satellite imagery
KW - man-made objects
KW - segment anything model (SAM)
KW - unsupervised domain adaptation (UDA)
KW - urban areas
UR - https://www.scopus.com/pages/publications/85201632518
U2 - 10.1109/TGRS.2024.3435745
DO - 10.1109/TGRS.2024.3435745
M3 - Journal article
AN - SCOPUS:85201632518
SN - 0196-2892
VL - 62
JO - IEEE Transactions on Geoscience and Remote Sensing
JF - IEEE Transactions on Geoscience and Remote Sensing
ER -