Spatial-Temporal Graphs Plus Transformers for Geometry-Guided Facial Expression Recognition

Rui Zhao, Tianshan Liu, Zixun Huang, Daniel P.K. Lun, Kin Man Lam

Research output: Journal article publication › Journal article › Academic research › peer-review

4 Citations (Scopus)


Facial expression recognition (FER) is of great interest in current studies of human-computer interaction. In this paper, we propose a novel geometry-guided FER framework, based on graph convolutional networks and transformers, to perform effective emotion recognition from videos. Specifically, we detect facial landmarks and use them to construct a spatial-temporal graph, based on both the landmark coordinates and local appearance, that represents a facial expression sequence. Graph convolutional blocks and transformer modules are employed to produce high-level semantic, emotion-related representations from the structured facial graphs, enabling the framework to establish both local and non-local dependencies between the vertices. Moreover, spatial and temporal attention mechanisms are introduced into graph-based learning to promote FER reasoning by emphasizing the most informative facial components and frames. Extensive experiments demonstrate that the proposed framework achieves promising performance for geometry-based FER and shows strong generalization and robustness in real-world applications.
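
The graph construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how edges are chosen, so the k-nearest-neighbour spatial edges, the frame-to-frame temporal edges, the function names `build_st_adjacency` and `gcn_layer`, and the 68-landmark layout are all assumptions made for illustration. Only landmark coordinates are used as vertex features here; the local appearance features, attention mechanisms, and transformer modules mentioned in the abstract are omitted.

```python
import numpy as np


def build_st_adjacency(landmarks, k=3):
    """Build a symmetric 0/1 adjacency for a spatial-temporal landmark graph.

    landmarks: array of shape (T, N, 2), T frames with N (x, y) landmarks each.
    Vertex (t, i) is stored at flat index t * N + i.
    Assumption: spatial edges link each landmark to its k nearest neighbours
    in the same frame; temporal edges link a landmark to itself in the next frame.
    """
    T, N, _ = landmarks.shape
    A = np.zeros((T * N, T * N), dtype=np.float32)
    idx = np.arange(N)
    for t in range(T):
        pts = landmarks[t]
        # Pairwise Euclidean distances between landmarks of frame t.
        dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(dist, np.inf)  # exclude self-matches
        nbrs = np.argsort(dist, axis=1)[:, :k]
        for i in range(N):
            for j in nbrs[i]:
                A[t * N + i, t * N + j] = 1.0
                A[t * N + j, t * N + i] = 1.0  # keep the graph undirected
        if t + 1 < T:
            # Temporal edges: same landmark in consecutive frames.
            A[t * N + idx, (t + 1) * N + idx] = 1.0
            A[(t + 1) * N + idx, t * N + idx] = 1.0
    return A


def gcn_layer(A, X, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    A_hat = A + np.eye(A.shape[0], dtype=A.dtype)  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)


# Usage with random stand-in data (4 frames, 68 landmarks; both hypothetical).
rng = np.random.default_rng(0)
landmarks = rng.random((4, 68, 2))
A = build_st_adjacency(landmarks, k=3)
X = landmarks.reshape(-1, 2)          # coordinates as vertex features
W = rng.standard_normal((2, 8))       # untrained weights, for shape checking only
H = gcn_layer(A, X, W)
print(A.shape, H.shape)               # (272, 272) (272, 8)
```

A single layer of this kind mixes information only between directly connected vertices, which is why a graph-convolutional model on its own captures mainly local dependencies; the non-local dependencies the abstract refers to would come from the transformer modules, which this sketch does not model.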

Original language: English
Article number: 9794419
Pages (from-to): 1-17
Number of pages: 17
Journal: IEEE Transactions on Affective Computing
Publication status: Published - Jun 2022

Keywords


  • attention mechanism
  • Facial expression recognition
  • spatial-temporal graph convolutional network
  • spatial-temporal transformer

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction

