Abstract
Video-based egocentric activity recognition involves fine-grained spatio-temporal human-object interactions. State-of-the-art methods, built on two-branch architectures, rely on pre-computed optical flow to provide motion information. However, this two-stage strategy is computationally intensive, storage-demanding, and not task-oriented, which prevents it from being deployed in real-world applications. Although there have been numerous attempts to explore other motion representations that replace optical flow, most of these methods were designed for third-person activities and do not capture fine-grained cues. To tackle these issues, in this letter we propose a progressive motion representation distillation (PMRD) method, based on two-branch networks, for egocentric activity recognition. We exploit a generalized knowledge distillation framework to train a hallucination network, which receives RGB frames as input and produces motion cues under the guidance of the optical-flow network. Specifically, we propose a progressive metric loss that distills local fine-grained motion patterns at each temporal progress level. To further encourage the distillation framework to concentrate on informative frames, we integrate a temporal attention mechanism into the metric loss. Moreover, a multi-stage training procedure is employed to learn the hallucination network efficiently. Experimental results on three egocentric activity benchmarks demonstrate the state-of-the-art performance of the proposed method.
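To make the distillation idea concrete, below is a minimal PyTorch sketch of a progressive, attention-weighted metric loss in the spirit described by the abstract. It is an illustration under stated assumptions, not the letter's exact formulation: the (B, T, D) per-frame feature shape, the softmax temporal-attention head, the cumulative pooling at each progress level, and the use of an MSE metric with a frozen optical-flow teacher are all assumptions made here for clarity.

```python
import torch
import torch.nn.functional as F

def progressive_metric_loss(hall_feats, flow_feats, attn_logits):
    """Illustrative sketch of a progressive metric loss with temporal attention.

    hall_feats: (B, T, D) per-frame features from the RGB hallucination network
                (the student).
    flow_feats: (B, T, D) per-frame features from the optical-flow network
                (the teacher, treated as fixed via detach()).
    attn_logits: (B, T) temporal-attention scores; here assumed to come from a
                 small scoring head, normalized with a softmax.
    """
    B, T, D = hall_feats.shape
    attn = torch.softmax(attn_logits, dim=1)  # frame-level attention weights
    loss = hall_feats.new_zeros(())
    for t in range(1, T + 1):
        # Progress level t: aggregate the first t frames so the student matches
        # the teacher's motion evidence accumulated up to this point.
        w = attn[:, :t] / attn[:, :t].sum(dim=1, keepdim=True)  # renormalize
        s = (w.unsqueeze(-1) * hall_feats[:, :t]).sum(dim=1)    # student pool
        g = (w.unsqueeze(-1) * flow_feats[:, :t]).sum(dim=1)    # teacher pool
        loss = loss + F.mse_loss(s, g.detach())  # gradient flows to student only
    return loss / T

# Usage with random stand-in features (batch of 4 clips, 8 frames, 256-d):
hall = torch.randn(4, 8, 256, requires_grad=True)
flow = torch.randn(4, 8, 256)
scores = torch.randn(4, 8)
print(progressive_metric_loss(hall, flow, scores))
```

Detaching the teacher features mirrors the generalized knowledge distillation setup: the optical-flow branch supplies supervision at training time only, so inference needs RGB frames alone.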
| Original language | English |
|---|---|
| Article number | 9146209 |
| Pages (from-to) | 1320-1324 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 27 |
| DOIs | |
| Publication status | Published - Jul 2020 |
Keywords
- Egocentric activity recognition
- knowledge distillation
- motion representation
- two-branch networks
ASJC Scopus subject areas
- Signal Processing
- Electrical and Electronic Engineering
- Applied Mathematics