Image2Audio: Facilitating Semi-supervised Audio Emotion Recognition with Facial Expression Image

Gewen He, Xiaofeng Liu, Fangfang Fan, Jia You

Research output: Chapter in book / Conference proceeding > Conference article published in proceeding or book > Academic research > peer-review

41 Citations (Scopus)

Abstract

A large number of labeled image-based facial expression recognition datasets are publicly available. How these images can help audio emotion recognition with limited labeled data, by exploiting the inherent correlations between the two modalities, is a meaningful and challenging question. In this paper, we propose a semi-supervised adversarial network that allows knowledge transfer from the labeled videos to the heterogeneous labeled audio domain, thereby enhancing audio emotion recognition performance. Specifically, face image samples are translated to spectrograms in a class-wise manner. To harness the translated samples in sparsely distributed areas and construct a tighter decision boundary, we propose to precisely estimate the density in feature space and incorporate the reliable low-density samples with an annealing scheme. Moreover, the unlabeled audio samples are collected along the high-density path in a graph representation. As a possible "recognition via generation" framework, we empirically demonstrate its effectiveness on several audio emotion recognition benchmarks.
Original language: English
Title of host publication: Proceedings - 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2020
Publisher: IEEE Computer Society
Pages: 3978-3983
Number of pages: 6
ISBN (Electronic): 9781728193601
Publication status: Published - Jun 2020
Event: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) - Seattle, WA, United States
Duration: 14 Jun 2020 to 19 Jun 2020

Publication series

Name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
Volume: 2020-June
ISSN (Print): 2160-7508
ISSN (Electronic): 2160-7516

Competition

Competition: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Country/Territory: United States
City: Seattle, WA
Period: 14/06/20 to 19/06/20
