TY - GEN
T1 - Pyramid Masked Image Modeling for Transformer-Based Aerial Object Detection
AU - Zhang, Cong
AU - Liu, Tianshan
AU - Ju, Yakun
AU - Lam, Kin Man
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023/9/11
Y1 - 2023/9/11
N2 - Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.
AB - Two obstacles, the scarcity of annotated samples and the difficulty in preserving multi-scale hierarchical representations, hinder the advancement of vision Transformer-based aerial object detection. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most solutions focus on single-scale features, conflicting with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM enables establishing pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of our method.
KW - Aerial Object Detection
KW - Masked Image Modeling
KW - Pyramid Architecture
KW - Self-Supervised Learning
KW - Vision Transformer
UR - http://www.scopus.com/inward/record.url?scp=85180739517&partnerID=8YFLogxK
U2 - 10.1109/ICIP49359.2023.10223093
DO - 10.1109/ICIP49359.2023.10223093
M3 - Conference article published in proceeding or book
AN - SCOPUS:85180739517
T3 - Proceedings - International Conference on Image Processing, ICIP
SP - 1675
EP - 1679
BT - 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
PB - IEEE Computer Society
T2 - 30th IEEE International Conference on Image Processing, ICIP 2023
Y2 - 8 October 2023 through 11 October 2023
ER -