TY - JOUR
T1 - SINet
T2 - A Scale-Insensitive Convolutional Neural Network for Fast Vehicle Detection
AU - Hu, Xiaowei
AU - Xu, Xuemiao
AU - Xiao, Yongjie
AU - Chen, Hao
AU - He, Shengfeng
AU - Qin, Jing
AU - Heng, Pheng Ann
N1 - Funding Information:
Manuscript received July 14, 2017; revised January 1, 2018 and April 2, 2018; accepted May 9, 2018. Date of publication October 1, 2018; date of current version February 28, 2019. This work was supported in part by NSFC under Grant 61772206, Grant U1611461, Grant 61472145, and Grant 61702194, in part by the Special Fund of Science and Technology Research and Development on Application from Guangdong Province under Grant 2016B010124011 and Grant 2016B010127003, in part by the Guangdong High-level Personnel of Special Support Program under Grant 2016TQ03X319, in part by the Guangdong Natural Science Foundation under Grant 2017A030311027 and Grant 2017A030312008, in part by the Major Project in Industrial Technology in Guangzhou under Grant 2018-0601-ZB-0271, and in part by The Hong Kong Polytechnic University under Project 1-ZE8J. The work of X. Hu was supported by the Hong Kong Ph.D. Fellowship. The Associate Editor for this paper was Z. Duric. (Corresponding author: Xuemiao Xu.) X. Hu was with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China. He is now with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.
Publisher Copyright:
© 2000-2011 IEEE.
PY - 2019/3
Y1 - 2019/3
N2 - Vision-based vehicle detection approaches achieve incredible success in recent years with the development of deep convolutional neural network (CNN). However, existing CNN-based algorithms suffer from the problem that the convolutional features are scale-sensitive in object detection task but it is common that traffic images and videos contain vehicles with a large variance of scales. In this paper, we delve into the source of scale sensitivity, and reveal two key issues: 1) existing RoI pooling destroys the structure of small scale objects and 2) the large intra-class distance for a large variance of scales exceeds the representation capability of a single network. Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detecting vehicles with a large variance of scales. First, we present a context-aware RoI pooling to maintain the contextual information and original structure of small scale objects. Second, we present a multi-branch decision network to minimize the intra-class distance of features. These lightweight techniques bring zero extra time complexity but prominent detection accuracy improvement. The proposed techniques can be equipped with any deep network architectures and keep them trained end-to-end. Our SINet achieves state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on the KITTI benchmark and a new highway dataset, which contains a large variance of scales and extremely small objects.
AB - Vision-based vehicle detection approaches achieve incredible success in recent years with the development of deep convolutional neural network (CNN). However, existing CNN-based algorithms suffer from the problem that the convolutional features are scale-sensitive in object detection task but it is common that traffic images and videos contain vehicles with a large variance of scales. In this paper, we delve into the source of scale sensitivity, and reveal two key issues: 1) existing RoI pooling destroys the structure of small scale objects and 2) the large intra-class distance for a large variance of scales exceeds the representation capability of a single network. Based on these findings, we present a scale-insensitive convolutional neural network (SINet) for fast detecting vehicles with a large variance of scales. First, we present a context-aware RoI pooling to maintain the contextual information and original structure of small scale objects. Second, we present a multi-branch decision network to minimize the intra-class distance of features. These lightweight techniques bring zero extra time complexity but prominent detection accuracy improvement. The proposed techniques can be equipped with any deep network architectures and keep them trained end-to-end. Our SINet achieves state-of-the-art performance in terms of accuracy and speed (up to 37 FPS) on the KITTI benchmark and a new highway dataset, which contains a large variance of scales and extremely small objects.
KW - fast object detection
KW - intelligent transportation system
KW - scale sensitivity
KW - Vehicle detection
UR - http://www.scopus.com/inward/record.url?scp=85054358459&partnerID=8YFLogxK
U2 - 10.1109/TITS.2018.2838132
DO - 10.1109/TITS.2018.2838132
M3 - Journal article
AN - SCOPUS:85054358459
SN - 1524-9050
VL - 20
SP - 1010
EP - 1019
JO - IEEE Transactions on Intelligent Transportation Systems
JF - IEEE Transactions on Intelligent Transportation Systems
IS - 3
M1 - 8478157
ER -