TY - JOUR
T1 - Crowd Counting Via Cross-Stage Refinement Networks
AU - Liu, Yongtuo
AU - Wen, Qiang
AU - Chen, Haoxin
AU - Liu, Wenxi
AU - Qin, Jing
AU - Han, Guoqiang
AU - He, Shengfeng
N1 - Funding Information:
Manuscript received August 28, 2019; revised March 2, 2020; accepted May 10, 2020. Date of publication May 19, 2020; date of current version July 6, 2020. This work was supported in part by the National Natural Science Foundation of China under Grant 61702104, Grant 61472145, Grant 61972162, and Grant 61702194, in part by Hong Kong Polytechnic University under Project YBZE, in part by the Special Fund of Science and Technology Research and Development on Application from Guangdong Province (SF-STRDA-GD) under Grant 2016B010127003, in part by the Guangzhou Key Industrial Technology Research Fund under Grant 201802010036, in part by the Guangdong Natural Science Foundation under Grant 2017A030312008, and in part by the CCF-Tencent Open Research Fund under Grant CCF-Tencent RAGR20190112. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Ajmal S. Mian. (Corresponding author: Shengfeng He.) Yongtuo Liu, Qiang Wen, Haoxin Chen, Guoqiang Han, and Shengfeng He are with the School of Computer Science and Engineering, South China University of Technology, Guangzhou 510006, China (e-mail: csmanlyt@mail.scut.edu.cn; [email protected]; [email protected]; [email protected]; [email protected]).
Publisher Copyright:
© 1992-2012 IEEE.
PY - 2020
Y1 - 2020
N2 - Crowd counting is challenging due to unconstrained imaging factors, e.g., background clutter, non-uniform distribution of people, and large scale and perspective variations. Dealing with these problems using deep neural networks requires rich prior knowledge and multi-scale contextual representations. In this paper, we propose a Cross-stage Refinement Network (CRNet) that can refine predicted density maps progressively based on hierarchical multi-level density priors. In particular, CRNet is composed of several fully convolutional networks. They are stacked together recursively with the previous output as the next input, and each of them serves to utilize previous density output to gradually correct prediction errors of crowd areas and refine the predicted density maps at different stages. Cross-stage multi-level density priors are further exploited in our recurrent framework by the cross-stage skip layers based on ConvLSTM. To cope with different challenges of unconstrained crowd scenes, we explore different crowd-specific data augmentation methods to mimic real-world scenarios and enrich crowd feature representations from different aspects. Extensive experiments show the proposed method achieves superior performance against state-of-the-art methods on four widely used challenging benchmarks in terms of counting accuracy and density map quality. Code and models are available at https://github.com/lytgftyf/Crowd-Counting-via-Cross-stage-Refinement-Networks.
AB - Crowd counting is challenging due to unconstrained imaging factors, e.g., background clutter, non-uniform distribution of people, and large scale and perspective variations. Dealing with these problems using deep neural networks requires rich prior knowledge and multi-scale contextual representations. In this paper, we propose a Cross-stage Refinement Network (CRNet) that can refine predicted density maps progressively based on hierarchical multi-level density priors. In particular, CRNet is composed of several fully convolutional networks. They are stacked together recursively with the previous output as the next input, and each of them serves to utilize previous density output to gradually correct prediction errors of crowd areas and refine the predicted density maps at different stages. Cross-stage multi-level density priors are further exploited in our recurrent framework by the cross-stage skip layers based on ConvLSTM. To cope with different challenges of unconstrained crowd scenes, we explore different crowd-specific data augmentation methods to mimic real-world scenarios and enrich crowd feature representations from different aspects. Extensive experiments show the proposed method achieves superior performance against state-of-the-art methods on four widely used challenging benchmarks in terms of counting accuracy and density map quality. Code and models are available at https://github.com/lytgftyf/Crowd-Counting-via-Cross-stage-Refinement-Networks.
KW - Crowd counting
KW - image refinement
KW - recurrent network
UR - http://www.scopus.com/inward/record.url?scp=85087927487&partnerID=8YFLogxK
U2 - 10.1109/TIP.2020.2994410
DO - 10.1109/TIP.2020.2994410
M3 - Journal article
AN - SCOPUS:85087927487
SN - 1057-7149
VL - 29
SP - 6800
EP - 6812
JO - IEEE Transactions on Image Processing
JF - IEEE Transactions on Image Processing
M1 - 9096602
ER -