Pyramid Masked Image Modeling for Transformer-Based Aerial Object Detection

Cong Zhang, Tianshan Liu, Yakun Ju, Kin Man Lam

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

2 Citations (Scopus)

Abstract

Two obstacles hinder the advancement of vision Transformer-based aerial object detection: the scarcity of annotated samples and the difficulty of preserving multi-scale hierarchical representations. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most of these solutions focus on single-scale features, which is at odds with solving the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM establishes pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection for performance improvement. Experimental results demonstrate the effectiveness and superiority of the proposed method.

Original language: English
Title of host publication: 2023 IEEE International Conference on Image Processing, ICIP 2023 - Proceedings
Publisher: IEEE Computer Society
Pages: 1675-1679
Number of pages: 5
ISBN (Electronic): 9781728198354
Publication status: Published - 11 Sept 2023
Event: 30th IEEE International Conference on Image Processing, ICIP 2023 - Kuala Lumpur, Malaysia
Duration: 8 Oct 2023 to 11 Oct 2023

Publication series

Name: Proceedings - International Conference on Image Processing, ICIP
ISSN (Print): 1522-4880

Conference

Conference: 30th IEEE International Conference on Image Processing, ICIP 2023
Country/Territory: Malaysia
City: Kuala Lumpur
Period: 8/10/23 to 11/10/23

Keywords

  • Aerial Object Detection
  • Masked Image Modeling
  • Pyramid Architecture
  • Self-Supervised Learning
  • Vision Transformer

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Signal Processing
