Crowd Counting Via Cross-Stage Refinement Networks

Yongtuo Liu, Qiang Wen, Haoxin Chen, Wenxi Liu, Jing Qin, Guoqiang Han, Shengfeng He

Research output: Journal article publication › Journal article › Academic research › peer-review

39 Citations (Scopus)


Crowd counting is challenging due to unconstrained imaging factors, e.g., background clutter, non-uniform distributions of people, and large scale and perspective variations. Dealing with these problems using deep neural networks requires rich prior knowledge and multi-scale contextual representations. In this paper, we propose a Cross-stage Refinement Network (CRNet) that progressively refines predicted density maps based on hierarchical multi-level density priors. In particular, CRNet is composed of several fully convolutional networks. They are stacked recursively, with the output of each network serving as the input of the next, so that each stage uses the previous density output to gradually correct prediction errors in crowd areas and refine the predicted density map. Cross-stage multi-level density priors are further exploited in our recurrent framework through cross-stage skip layers based on ConvLSTM. To cope with the different challenges of unconstrained crowd scenes, we explore crowd-specific data augmentation methods that mimic real-world scenarios and enrich crowd feature representations from different aspects. Extensive experiments show that the proposed method achieves superior performance against state-of-the-art methods on four widely used challenging benchmarks in terms of counting accuracy and density map quality. Code and models are available at this
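The stage-wise refinement described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: each stage in CRNet is a learned fully convolutional network, and the ConvLSTM skip layers are not modelled here. The names `crowd_count`, `make_toy_stage`, `refine`, and `image_cue` are hypothetical, and the "stage" below is a hand-written update that only shows how a previous density output is fed to the next stage and how the count is read off as the sum of the density map.

```python
from typing import Callable, List

DensityMap = List[List[float]]
Stage = Callable[[DensityMap, DensityMap], DensityMap]


def crowd_count(density: DensityMap) -> float:
    """The predicted count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


def make_toy_stage(step: float) -> Stage:
    """Hypothetical stand-in for one refinement stage.

    A real stage would be a trained network. This toy version moves the
    previous density a fraction `step` toward a per-pixel cue, which is
    assumed here to summarise the image evidence.
    """
    def stage(image_cue: DensityMap, previous: DensityMap) -> DensityMap:
        return [
            [p + step * (c - p) for c, p in zip(cue_row, prev_row)]
            for cue_row, prev_row in zip(image_cue, previous)
        ]
    return stage


def refine(image_cue: DensityMap, stages: List[Stage]) -> List[DensityMap]:
    """Run the stages in sequence, feeding each output into the next stage."""
    density = [[0.0] * len(row) for row in image_cue]  # empty initial estimate
    outputs = []
    for stage in stages:
        density = stage(image_cue, density)
        outputs.append(density)
    return outputs


# Toy 2x2 scene whose cue sums to 4 people (illustrative values only).
image_cue = [[0.0, 1.0], [2.0, 1.0]]
outputs = refine(image_cue, [make_toy_stage(0.5)] * 3)
counts = [crowd_count(d) for d in outputs]
print(counts)  # [2.0, 3.0, 3.5]
```

In this toy setting each stage halves the remaining error, so the count moves from 2.0 to 3.0 to 3.5 against a cue total of 4. The sketch only mirrors the control flow named in the abstract (previous output as next input); it says nothing about the accuracy of the actual model.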

Original language: English
Article number: 9096602
Pages (from-to): 6800-6812
Number of pages: 13
Journal: IEEE Transactions on Image Processing
Publication status: Published - 2020


Keywords

  • Crowd counting
  • image refinement
  • recurrent network

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design

