TY - GEN
T1 - Deep location-specific tracking
AU - Yang, Lingxiao
AU - Liu, Risheng
AU - Zhang, David
AU - Zhang, Lei
PY - 2017/10/23
Y1 - 2017/10/23
N2 - Convolutional Neural Network (CNN)-based methods have shown significant performance gains in visual tracking in recent years. Due to the many unpredictable changes that objects undergo online, such as abrupt motion, background clutter and large deformation, visual tracking remains a challenging task. We propose a novel algorithm, namely Deep Location-Specific Tracking, which decomposes the tracking problem into a localization task and a classification task, and trains an individual network for each task. The localization network exploits the information in the current frame and provides a specific location to improve the probability of successful tracking, while the classification network finds the target among many examples generated around the target location in the previous frame, as well as the one estimated by the localization network in the current frame. CNN-based trackers often have a massive number of trainable parameters and are prone to over-fitting to particular object states, leading to reduced precision or tracking drift. We address this problem by learning a classification network based on 1 × 1 convolution and global average pooling. Extensive experimental results on popular benchmark datasets show that the proposed tracker achieves competitive results without using additional tracking videos for fine-tuning. The code is available at https://github.com/ZjjConan/DLST.
AB - Convolutional Neural Network (CNN)-based methods have shown significant performance gains in visual tracking in recent years. Due to the many unpredictable changes that objects undergo online, such as abrupt motion, background clutter and large deformation, visual tracking remains a challenging task. We propose a novel algorithm, namely Deep Location-Specific Tracking, which decomposes the tracking problem into a localization task and a classification task, and trains an individual network for each task. The localization network exploits the information in the current frame and provides a specific location to improve the probability of successful tracking, while the classification network finds the target among many examples generated around the target location in the previous frame, as well as the one estimated by the localization network in the current frame. CNN-based trackers often have a massive number of trainable parameters and are prone to over-fitting to particular object states, leading to reduced precision or tracking drift. We address this problem by learning a classification network based on 1 × 1 convolution and global average pooling. Extensive experimental results on popular benchmark datasets show that the proposed tracker achieves competitive results without using additional tracking videos for fine-tuning. The code is available at https://github.com/ZjjConan/DLST.
KW - Convolutional neural networks
KW - Location specific tracking
KW - Single object tracking
KW - Visual tracking
UR - http://www.scopus.com/inward/record.url?scp=85035190764&partnerID=8YFLogxK
U2 - 10.1145/3123266.3123381
DO - 10.1145/3123266.3123381
M3 - Conference article published in proceeding or book
AN - SCOPUS:85035190764
T3 - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
SP - 1309
EP - 1317
BT - MM 2017 - Proceedings of the 2017 ACM Multimedia Conference
PB - Association for Computing Machinery, Inc
T2 - 25th ACM International Conference on Multimedia, MM 2017
Y2 - 23 October 2017 through 27 October 2017
ER -