TY - GEN
T1 - Object Counting in Video Surveillance Using Multi-scale Density Map Regression
AU - Wang, Yi
AU - Hou, Junhui
AU - Chau, Lap Pui
N1 - Funding Information:
This work was supported in part by the Hong Kong RGC Early Career Scheme under Grant 9048123 (CityU 21211518), and in part by the Natural Science Foundation of China under Grant 61873142.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - In this paper, we present an effective convolutional neural network (CNN) for object counting in video surveillance, namely the multi-scale density map regressor (MSDMR). In contrast to existing CNN-based methods that achieve high accuracy by empirically increasing model capacity with more complex structures and layers, we focus on a compact CNN. Specifically, the MSDMR is mainly designed with the supervision of multi-scale outputs, in which two CNN stacks estimate coarse- and fine-scale density maps, respectively. The integral of the fine density map provides the object count. The two stacks are connected in a cascaded manner and jointly trained so that the overall model learns discriminative and complementary features and achieves strong performance. Experimental results show that the proposed MSDMR achieves higher accuracy than state-of-the-art methods on surveillance datasets.
KW - CNN
KW - density map
KW - multi-scale
KW - object counting
KW - video surveillance
UR - http://www.scopus.com/inward/record.url?scp=85068991538&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8683289
DO - 10.1109/ICASSP.2019.8683289
M3 - Conference article published in proceeding or book
AN - SCOPUS:85068991538
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 2422
EP - 2426
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -