TY - GEN
T1 - Angle Tokenization Guided Multi-Scale Vision Transformer for Oriented Object Detection in Remote Sensing Imagery
AU - Zhang, Cong
AU - Liu, Tianshan
AU - Lam, Kin Man
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022/7
Y1 - 2022/7
N2 - In this paper, an angle tokenization guided multi-scale Transformer framework is proposed for oriented object detection in remote sensing images. Different from existing detectors that are based on convolutional neural networks (CNNs), our proposed method is based on a pyramid Transformer architecture with a compact and flexible angle tokenization module (ATM) to efficiently learn the orientation knowledge for rotated geospatial objects. The Transformer structure can progressively render long-range dependencies and multi-scale spatial details required for accurate localization, while the ATM provides robust guidance on feature refinement for angle prediction, jointly achieving end-to-end orientation detection. To the best of our knowledge, this is the first work to adapt Vision Transformers to remote sensing oriented object detection. Experimental results demonstrate the effectiveness and superiority of our method.
AB - In this paper, an angle tokenization guided multi-scale Transformer framework is proposed for oriented object detection in remote sensing images. Different from existing detectors that are based on convolutional neural networks (CNNs), our proposed method is based on a pyramid Transformer architecture with a compact and flexible angle tokenization module (ATM) to efficiently learn the orientation knowledge for rotated geospatial objects. The Transformer structure can progressively render long-range dependencies and multi-scale spatial details required for accurate localization, while the ATM provides robust guidance on feature refinement for angle prediction, jointly achieving end-to-end orientation detection. To the best of our knowledge, this is the first work to adapt Vision Transformers to remote sensing oriented object detection. Experimental results demonstrate the effectiveness and superiority of our method.
KW - angle tokenization
KW - multi-head attention
KW - multi-scale vision Transformers
KW - oriented object detection
KW - remote sensing images
UR - http://www.scopus.com/inward/record.url?scp=85140380371&partnerID=8YFLogxK
U2 - 10.1109/IGARSS46834.2022.9883662
DO - 10.1109/IGARSS46834.2022.9883662
M3 - Conference article published in proceeding or book
AN - SCOPUS:85140380371
T3 - International Geoscience and Remote Sensing Symposium (IGARSS)
SP - 3063
EP - 3066
BT - IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2022 IEEE International Geoscience and Remote Sensing Symposium, IGARSS 2022
Y2 - 17 July 2022 through 22 July 2022
ER -