Abstract
Integrating multispectral data has been demonstrated to be an effective solution for illumination-invariant pedestrian detection; in particular, RGB and thermal images can provide complementary information to handle light variations. However, most current multispectral detectors fuse the multimodal features by simple concatenation, without discovering their latent relationships. In this paper, we propose a cross-modal feature learning (CFL) module, based on a split-and-aggregation strategy, to explicitly explore both the shared and modality-specific representations between paired RGB and thermal images. We insert the proposed CFL module into multiple layers of a two-branch-based pedestrian detection network to learn the cross-modal representations at diverse semantic levels. By introducing a segmentation-based auxiliary task, the multimodal network is trained end-to-end by jointly optimizing a multi-task loss. In addition, to alleviate the reliance of existing multispectral pedestrian detectors on thermal images, we propose a knowledge distillation framework to train a student detector, which receives only RGB images as input and distills the cross-modal representations under the guidance of a well-trained multimodal teacher detector. To facilitate the cross-modal knowledge distillation, we design different distillation loss functions for the feature, detection and segmentation levels. Experimental results on the public KAIST multispectral pedestrian benchmark validate that the proposed cross-modal representation learning and distillation method achieves robust performance.
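The abstract names three distillation levels (feature, detection and segmentation) but does not give the loss formulas. The following is a minimal sketch of how such a multi-level distillation objective could be combined, not the paper's actual losses. Every specific choice here is an assumption made for illustration: mean squared error for features, a temperature-softened KL divergence for detection scores, a soft-target binary cross-entropy for segmentation masks, and the hypothetical names `distillation_loss`, `weights` and `temperature`.

```python
import math


def mse(student, teacher):
    """Mean squared error between two equal-length feature vectors."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature (assumed, not from the paper)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def soft_bce(student_prob, teacher_prob, eps=1e-12):
    """Binary cross-entropy of student mask probabilities against soft teacher targets."""
    total = 0.0
    for s, t in zip(student_prob, teacher_prob):
        total -= t * math.log(s + eps) + (1 - t) * math.log(1 - s + eps)
    return total / len(student_prob)


def distillation_loss(student, teacher, weights=(1.0, 1.0, 1.0), temperature=2.0):
    """Weighted sum of feature-, detection- and segmentation-level terms.

    `student` and `teacher` are dicts with hypothetical keys
    "features", "logits" and "mask"; the weights are illustrative.
    """
    w_feat, w_det, w_seg = weights
    feat = mse(student["features"], teacher["features"])
    det = kl_divergence(
        softmax(teacher["logits"], temperature),
        softmax(student["logits"], temperature),
    )
    seg = soft_bce(student["mask"], teacher["mask"])
    return w_feat * feat + w_det * det + w_seg * seg
```

In this sketch the teacher outputs would come from the multimodal (RGB plus thermal) detector and the student outputs from the RGB-only detector; a student that matches the teacher more closely receives a lower loss.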
Original language | English |
---|---|
Article number | 9357413 |
Pages (from-to) | 315 - 329 |
Number of pages | 15 |
Journal | IEEE Transactions on Circuits and Systems for Video Technology |
Volume | 32 |
Issue number | 1 |
Publication status | Published - 1 Jan 2022 |
Keywords
- cross-modal representation
- Detectors
- Feature extraction
- Illumination-invariant pedestrian detection
- Image segmentation
- knowledge distillation
- Lighting
- multispectral fusion
- Semantics
- Task analysis
- Training
ASJC Scopus subject areas
- Media Technology
- Electrical and Electronic Engineering