Text-guided Visual Prompt Tuning with Masked Images for Facial Expression Recognition

Rongkang Dong, Cuixin Yang, Kin Man Lam

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

Abstract

Facial expression recognition (FER) has significantly advanced through the application of deep learning techniques for visual content classification. Recent research has explored the use of pre-trained language-image models, such as CLIP, which leverage natural language supervision to enhance image backbone training and facilitate the learning of general visual representations. Concurrently, visual prompt tuning has emerged as a method to minimize tuning overhead for downstream tasks by freezing the pre-trained backbone models and incorporating additional learnable parameters, known as visual prompts, into the model input. This strategy circumvents the need to update the entire neural network, focusing instead on optimizing visual prompts for specific tasks. In this study, we propose a novel tuning scheme, namely Text-guided Visual Prompt Tuning with Masked facial images (T-VPT-M), for both basic and compound FER. Our method utilizes natural language supervision for visual prompt learning and employs a random masking mechanism to adapt visual prompts to diverse informative facial regions. Experimental results on three real-world datasets, encompassing both basic and compound facial expressions, demonstrate the efficacy of the T-VPT-M scheme.
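The two input-side ideas named in the abstract, learnable visual prompts added to the model input and random masking of the facial image, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the patch count, prompt count, mask ratio, or whether masked patches are zeroed or dropped, so all of these (and the helper names `mask_patches` and `build_input`) are assumptions chosen for illustration. The frozen backbone and the text-guided training objective are omitted.

```python
import random

def mask_patches(patches, mask_ratio, rng):
    """Zero out a random subset of patch embeddings.

    Assumption: masking is done by zeroing; the paper may instead
    drop or otherwise replace the masked patches.
    """
    num_masked = int(len(patches) * mask_ratio)
    masked = set(rng.sample(range(len(patches)), num_masked))
    out = [[0.0] * len(p) if i in masked else list(p)
           for i, p in enumerate(patches)]
    return out, masked

def build_input(prompts, patches):
    """Prepend the visual prompt tokens to the patch tokens.

    In visual prompt tuning only the prompts would be trained;
    the backbone that consumes this sequence stays frozen.
    """
    return [list(p) for p in prompts] + [list(p) for p in patches]

rng = random.Random(0)
dim = 8            # hypothetical embedding size
num_patches = 16   # hypothetical number of image patches
num_prompts = 4    # hypothetical number of visual prompts

# Stand-ins for patch embeddings of one face image and for the prompts.
patches = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_patches)]
prompts = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_prompts)]

masked_patches, masked_idx = mask_patches(patches, mask_ratio=0.5, rng=rng)
tokens = build_input(prompts, masked_patches)

print(len(masked_idx), "patches masked;", len(tokens), "tokens fed to the backbone")
```

Because a different random subset of patches is hidden on each call, the prompts would be optimized against varying visible facial regions, which is the adaptation effect the abstract describes.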

Original language: English
Title of host publication: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350367331
Publication status: Published - Dec 2024
Event: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China
Duration: 3 Dec 2024 to 6 Dec 2024

Publication series

Name: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/Territory: China
City: Macau
Period: 3/12/24 to 6/12/24

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Hardware and Architecture
  • Signal Processing
