Abstract
The emergence of wearable devices has opened up new possibilities for egocentric activity recognition. Although some methods integrate attention mechanisms into deep neural networks to capture fine-grained human-object interactions in a weakly supervised manner, they either ignore temporal consistency or generate attention from appearance cues alone. To address these limitations, in this paper we propose an enhanced attention-tracking method combined with a multi-branch network (EAT-MBNet) for egocentric activity recognition. Specifically, we propose class-aware attention maps (CAAMs), which employ a self-attention-based module to refine class activation maps (CAMs) and thereby strengthen the semantic dependency between activity categories and feature maps. To highlight discriminative features in the regions of interest across frames, we propose a flow-guided attention-tracking (F-AT) module that simultaneously leverages historical attention and motion patterns. Furthermore, we propose a cross-modality modeling branch built on an interactive GRU module, which captures time-synchronized long-term relationships between the appearance and motion branches. Experimental results on four egocentric activity benchmarks demonstrate that the proposed method achieves state-of-the-art performance.
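As a rough illustration of the CAAM idea described above, the following PyTorch sketch refines backbone features with a self-attention block before a 1x1 classifier produces per-class activation maps. This is a minimal sketch under our own assumptions: the module name `CAAMRefiner`, the query/key/value projection layout, and all dimensions are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CAAMRefiner(nn.Module):
    """Hypothetical sketch: refine CAM-style features with self-attention
    so the per-class maps are computed from spatially re-weighted features.
    All names and shapes are assumptions for illustration only."""

    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.query = nn.Conv2d(feat_dim, feat_dim // 8, kernel_size=1)
        self.key = nn.Conv2d(feat_dim, feat_dim // 8, kernel_size=1)
        self.value = nn.Conv2d(feat_dim, feat_dim, kernel_size=1)
        # 1x1 classifier; its per-class responses play the role of CAMs.
        self.classifier = nn.Conv2d(feat_dim, num_classes, kernel_size=1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) appearance features from a backbone.
        b, c, h, w = feats.shape
        q = self.query(feats).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(feats).flatten(2)                     # (B, C', HW)
        v = self.value(feats).flatten(2).transpose(1, 2)   # (B, HW, C)
        # Spatial self-attention over all positions.
        attn = torch.softmax(q @ k / (q.shape[-1] ** 0.5), dim=-1)
        refined = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # Class-aware attention maps from the residual-refined features.
        return self.classifier(refined + feats)            # (B, num_classes, H, W)

# Usage sketch with random features standing in for a backbone output.
if __name__ == "__main__":
    refiner = CAAMRefiner(feat_dim=256, num_classes=106)
    maps = refiner(torch.randn(2, 256, 7, 7))
    print(maps.shape)  # torch.Size([2, 106, 7, 7])
```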
Original language | English
---|---
Article number | 9513243
Pages (from-to) | 3587-3602
Number of pages | 16
Journal | IEEE Transactions on Circuits and Systems for Video Technology
Volume | 32
Issue number | 6
DOIs |
Publication status | Published - Jun 2022
Keywords
- attention tracking
- egocentric activity recognition
- fine-grained hand-object interactions
- multi-branch network
ASJC Scopus subject areas
- Media Technology
- Electrical and Electronic Engineering