Multimodal-Semantic Context-Aware Graph Neural Network for Group Activity Recognition

Tianshan Liu, Rui Zhao, Kin Man Lam

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

Abstract

Group activities in videos involve visual interaction contexts in multiple modalities between actors, and co-occurrence between individual action labels. However, most current group activity recognition methods either model actor-actor relations from the RGB modality alone or fail to exploit label relationships. To capture these rich visual and semantic contexts, we propose a multimodal-semantic context-aware graph neural network (MSCA-GNN). Specifically, we first build two visual sub-graphs based on the appearance cues and motion patterns extracted from the RGB and optical-flow modalities, respectively. Then, two attention-based aggregators are proposed to refine each node by gathering representations from other nodes and from heterogeneous modalities. In addition, a semantic graph is constructed from linguistic embeddings to model label relationships. We employ a bi-directional mapping learning strategy to further integrate the information from the multimodal visual graphs and the semantic graph. Experimental results on two group activity benchmarks show the effectiveness of the proposed method.
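The attention-based node refinement mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the form of the aggregators, so this example assumes generic scaled dot-product attention with a residual update; it is not the authors' formulation. The function names (`softmax`, `attention_aggregate`), the two-dimensional features, and the toy values are all hypothetical, and only the Python standard library is used.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_aggregate(queries, contexts):
    """Refine each query node with an attention-weighted sum of context nodes.

    queries, contexts: lists of equal-length feature vectors (one per actor).
    Returns (refined, weights); each row of weights sums to 1.
    Assumption: scaled dot-product attention, not the paper's exact aggregator.
    """
    dim = len(queries[0])
    refined, weights = [], []
    for q in queries:
        # Scaled dot-product similarity between this node and every context node.
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim)
                  for c in contexts]
        w = softmax(scores)
        # Weighted sum of context features, added back as a residual update.
        message = [sum(w[j] * contexts[j][k] for j in range(len(contexts)))
                   for k in range(dim)]
        refined.append([q[k] + message[k] for k in range(dim)])
        weights.append(w)
    return refined, weights

# Toy features for three actors (hypothetical values, not from the paper).
rgb_nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
flow_nodes = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]

# Intra-modal: each RGB node gathers from all RGB nodes (itself included).
rgb_intra, intra_w = attention_aggregate(rgb_nodes, rgb_nodes)
# Cross-modal: each RGB node gathers from the optical-flow nodes.
rgb_cross, cross_w = attention_aggregate(rgb_nodes, flow_nodes)
```

In this sketch the same function covers both cases: passing one modality as both arguments gives aggregation across actors within that modality, while passing the other modality as `contexts` gives aggregation across modalities. A trained model would also apply learned projections to the queries and contexts, which are omitted here.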

Original language: English
Title of host publication: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Publisher: IEEE Computer Society
Pages: 1-6
Number of pages: 6
ISBN (Electronic): 9781665438643
DOIs
Publication status: Published - Jul 2021
Event: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021 - Shenzhen, China
Duration: 5 Jul 2021 - 9 Jul 2021

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2021 IEEE International Conference on Multimedia and Expo, ICME 2021
Country/Territory: China
City: Shenzhen
Period: 5/07/21 - 9/07/21

Keywords

  • graph neural network
  • Group activity recognition
  • multimodal-semantic context

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
