Abstract
Video-based facial expression recognition is a long-standing problem that has attracted growing attention recently. The key to a successful facial expression recognition system is to exploit the potential of audiovisual modalities and to design robust features that effectively characterize the changes in facial appearance and configuration caused by facial motion. In this paper, we propose a framework that addresses this issue using both the visual modality (face images) and the audio modality (speech). A new feature descriptor, Histogram of Oriented Gradients from Three Orthogonal Planes (HOG-TOP), is proposed to extract dynamic textures from video sequences and characterize facial appearance changes. A new geometric feature derived from the warp transformation of facial landmarks is proposed to capture facial configuration changes. The role of the audio modality in recognition is also explored. We apply multiple feature fusion to video-based facial expression recognition both in lab-controlled environments and in the wild. Experiments on the extended Cohn-Kanade (CK+) database and the Acted Facial Expression in Wild (AFEW) 4.0 database show that our approach is robust in both settings compared with other state-of-the-art methods.
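As a rough illustration of the idea behind HOG-TOP, the sketch below builds gradient-orientation histograms on the three orthogonal planes (XY, XT, YT) of a small video volume and concatenates them. It is one plausible reading of the descriptor's name, not the paper's exact formulation: the function name `hog_top`, the `n_bins` parameter, the central-difference gradients, the unsigned orientation binning, and the L1 normalization are all assumptions made for this example, and the block-wise pooling a full descriptor would likely use is omitted.

```python
import math

def hog_top(volume, n_bins=8):
    """Illustrative HOG-TOP-style feature (assumed formulation).

    volume[t][y][x] holds grayscale intensities. Returns the concatenation
    of three L1-normalized orientation histograms: XY, XT, YT.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    hists = {"xy": [0.0] * n_bins, "xt": [0.0] * n_bins, "yt": [0.0] * n_bins}

    # Interior voxels only, so central differences are always defined.
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                gx = (volume[t][y][x + 1] - volume[t][y][x - 1]) / 2.0
                gy = (volume[t][y + 1][x] - volume[t][y - 1][x]) / 2.0
                gt = (volume[t + 1][y][x] - volume[t - 1][y][x]) / 2.0
                planes = (("xy", gx, gy), ("xt", gx, gt), ("yt", gy, gt))
                for name, a, b in planes:
                    mag = math.hypot(a, b)
                    if mag == 0.0:
                        continue  # flat region: no orientation to vote for
                    # Unsigned orientation in [0, pi), weighted by magnitude.
                    angle = math.atan2(b, a) % math.pi
                    idx = min(int(angle / math.pi * n_bins), n_bins - 1)
                    hists[name][idx] += mag

    feature = []
    for name in ("xy", "xt", "yt"):
        total = sum(hists[name])
        feature.extend(v / total if total > 0 else 0.0 for v in hists[name])
    return feature


# A static horizontal intensity ramp: gradients exist only along x.
ramp = [[[float(x) for x in range(4)] for _ in range(4)] for _ in range(4)]
feature = hog_top(ramp, n_bins=8)
```

For this static ramp, the XY and XT histograms place all their weight in the first bin, while the YT histogram stays empty because nothing changes along y or t. In a real sequence, the XT and YT planes are what capture motion.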
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 38-50 |
Number of pages | 13 |
Journal | IEEE Transactions on Affective Computing |
Volume | 9 |
Issue number | 1 |
Publication status | Published - 1 Jan 2018 |
Keywords
- acoustic feature
- Facial expression recognition
- geometric warp feature
- HOG-TOP
- multiple feature fusion
ASJC Scopus subject areas
- Software
- Human-Computer Interaction