TY - GEN
T1 - Learning a real-time generic tracker using convolutional neural networks
AU - Zhu, Linnan
AU - Yang, Lingxiao
AU - Zhang, David
AU - Zhang, Lei
PY - 2017/8/28
Y1 - 2017/8/28
N2 - This paper presents a novel frame-pair based method for visual object tracking. Instead of adopting two-stream Convolutional Neural Networks (CNNs) to represent each frame, we stack frame pairs as the input, resulting in a single-stream CNN tracker with far fewer parameters. The proposed tracker can learn generic motion patterns of objects from far fewer annotated videos than previous methods. Moreover, we find that trackers trained on two successive frames tend to predict the centers of search windows as the locations of tracked targets. To alleviate this problem, we propose a novel sampling strategy for off-line training. Specifically, we construct a pair by sampling two frames with a random offset, which controls the motion smoothness of objects. Experiments on the challenging VOT14 and OTB datasets show that the proposed tracker performs on par with recently developed generic trackers, but with a much smaller memory footprint. In addition, our tracker can run at over 100 fps on a GPU (30 fps on a CPU), far faster than most deep-neural-network-based trackers.
KW - Convolutional neural networks
KW - Generic object tracker
KW - Real-time tracking
KW - Single-target tracking
UR - http://www.scopus.com/inward/record.url?scp=85030230497&partnerID=8YFLogxK
U2 - 10.1109/ICME.2017.8019381
DO - 10.1109/ICME.2017.8019381
M3 - Conference article published in proceeding or book
AN - SCOPUS:85030230497
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 1219
EP - 1224
BT - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
PB - IEEE Computer Society
T2 - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017
Y2 - 10 July 2017 through 14 July 2017
ER -