TY - GEN
T1 - METFormer: A Motion Enhanced Transformer for Multiple Object Tracking
AU - Gao, Jianjun
AU - Yap, Kim Hui
AU - Wang, Yi
AU - Garg, Kratika
AU - Han, Boon Siew
N1 - Funding Information:
This research is supported by the Agency for Science, Technology and Research (A*STAR) under its IAF-ICP Programme I2001E0067 and the Schaeffler Hub for Advanced Research at NTU.
Publisher Copyright:
© 2023 IEEE.
PY - 2023/7
Y1 - 2023/7
N2 - Multiple object tracking (MOT) is an important task in computer vision, especially video analytics. Transformer-based methods are emerging approaches using both tracking and detection queries. However, motion modeling in existing transformer-based methods lacks effective association capability. Thus, this paper introduces a new METFormer model, a Motion Enhanced TransFormer-based tracker with a novel global-local motion context learning technique to mitigate the lack of motion information in existing transformer-based methods. The global-local motion context learning technique first centers on difference-guided global motion learning to obtain temporal information from adjacent frames. Based on global motion, we leverage context-aware local object motion modeling to study motion patterns and enhance the feature representation for individual objects. Experimental results on the benchmark MOT17 dataset show that our proposed method can surpass the state-of-the-art Trackformer [21] by 1.8% on IDF1 and 21.7% on ID Switches under public detection settings.
AB - Multiple object tracking (MOT) is an important task in computer vision, especially video analytics. Transformer-based methods are emerging approaches using both tracking and detection queries. However, motion modeling in existing transformer-based methods lacks effective association capability. Thus, this paper introduces a new METFormer model, a Motion Enhanced TransFormer-based tracker with a novel global-local motion context learning technique to mitigate the lack of motion information in existing transformer-based methods. The global-local motion context learning technique first centers on difference-guided global motion learning to obtain temporal information from adjacent frames. Based on global motion, we leverage context-aware local object motion modeling to study motion patterns and enhance the feature representation for individual objects. Experimental results on the benchmark MOT17 dataset show that our proposed method can surpass the state-of-the-art Trackformer [21] by 1.8% on IDF1 and 21.7% on ID Switches under public detection settings.
KW - Motion Modeling
KW - Multiple Object Tracking
KW - Tracking by Attention
KW - Transformer
UR - http://www.scopus.com/inward/record.url?scp=85167728762&partnerID=8YFLogxK
U2 - 10.1109/ISCAS46773.2023.10182032
DO - 10.1109/ISCAS46773.2023.10182032
M3 - Conference article published in proceeding or book
AN - SCOPUS:85167728762
T3 - Proceedings - IEEE International Symposium on Circuits and Systems
SP - 1
EP - 5
BT - ISCAS 2023 - 56th IEEE International Symposium on Circuits and Systems, Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 56th IEEE International Symposium on Circuits and Systems, ISCAS 2023
Y2 - 21 May 2023 through 25 May 2023
ER -