CFRLA-Net: A Context-Aware Feature Representation Learning Anchor-Free Network for Pedestrian Detection

Jun Li, Yuquan Bi, Sumei Wang, Qiming Li

Research output: Journal article publicationJournal articleAcademic researchpeer-review

1 Citation (Scopus)

Abstract

High resolution and strong semantic representation are both vital for feature extraction networks of pedestrian detection. The existing high-resolution network (HRNet) has presented a promising performance for pedestrian detection. However, we observed that it still has some significant shortcomings for heavily occluded and small-scale pedestrians. In this paper, we propose to address the shortcomings by extracting semantic and spatial context from HRNet. Specifically, we propose a Context-aware Feature Representation Learning Module (CFRL-Module), which combines a Multi-scale Feature Context Extraction Parallel Block for Convolution and Self-attention (CEPCA-Block) with two parallel paths and an Equivalent FFN (EFFN) Block. The core CEPCA-Block adopts a parallel design to integrate convolution and multi-head self-attention (MHSA) with low parameter computational cost, which can obtain the deep semantic context by convolution path and precise context by MHSA path. Furthermore, to overcome the inefficiency of global MHSA in high-resolution pedestrian detection, we propose a novel local window MHSA, which can significantly reduce memory consumption but barely affect the detection performance. Cascading the proposed CFRL-Module with the anchor-free detection head constitutes our Context-aware Feature Representation Learning Anchor-Free Network (CFRLA-Net). The proposed CFRLA-Net can catch a high-level understanding of the heavily occluded and small-scale pedestrian instances based on HRNet, which can effectively solve the limitation of the insufficient feature extraction ability of HRNet for the hard samples. Experimental results show that CFRLA-Net achieves state-of-the-art performance on CityPersons, Caltech, and CrowdHuman benchmarks.

Original languageEnglish
Pages (from-to)4948-4961
Number of pages14
JournalIEEE Transactions on Circuits and Systems for Video Technology
Volume33
Issue number9
DOIs
Publication statusPublished - 1 Sept 2023

Keywords

  • anchor-free
  • context
  • HRNet
  • occluded and small-scale pedestrians
  • Pedestrian detection
  • self-attention

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'CFRLA-Net: A Context-Aware Feature Representation Learning Anchor-Free Network for Pedestrian Detection'. Together they form a unique fingerprint.

Cite this