TY - GEN
T1 - Identity-aware facial expression recognition in compressed video
AU - Liu, Xiaofeng
AU - Han, X
AU - You, Jia
AU - Kong, L
AU - Lu, Jun
N1 - Funding Information:
This work was supported by the Jiangsu Youth Programme [SBK2020041180], the National Natural Science Foundation of China Youth Programme [grant number 61705221], NIH [NS061841, NS095986], Fanhan Technology, and the Hong Kong Government General Research Fund (GRF) (Ref. No. 152202/14E).
Publisher Copyright:
© 2020 IEEE
PY - 2020
Y1 - 2020
N2 - This paper explores facial expression representations in the compressed video domain from which inter-subject variations have been eliminated. Most previous methods process the RGB images of a sequence, although valuable expression-related muscle movement is already embedded, off the shelf, in the compression format. In the compressed domain, where video is compressed by up to two orders of magnitude, we can explicitly infer the expression from the residual frames and extract identity factors from the I frame with a pre-trained face recognition network. By enforcing marginal independence between the two, the expression feature is expected to be purer for the expression and robust to identity shifts. We do not need identity labels or multiple expression samples from the same person for identity elimination. Moreover, when the apex frame is annotated in the dataset, a complementary constraint can be added to regularize the feature-level game. In testing, only the compressed residual frames are required for expression prediction. Our solution achieves comparable or better performance than recent decoded-image-based methods on typical FER benchmarks, with about 3× faster inference on compressed data.
AB - This paper explores facial expression representations in the compressed video domain from which inter-subject variations have been eliminated. Most previous methods process the RGB images of a sequence, although valuable expression-related muscle movement is already embedded, off the shelf, in the compression format. In the compressed domain, where video is compressed by up to two orders of magnitude, we can explicitly infer the expression from the residual frames and extract identity factors from the I frame with a pre-trained face recognition network. By enforcing marginal independence between the two, the expression feature is expected to be purer for the expression and robust to identity shifts. We do not need identity labels or multiple expression samples from the same person for identity elimination. Moreover, when the apex frame is annotated in the dataset, a complementary constraint can be added to regularize the feature-level game. In testing, only the compressed residual frames are required for expression prediction. Our solution achieves comparable or better performance than recent decoded-image-based methods on typical FER benchmarks, with about 3× faster inference on compressed data.
UR - http://www.scopus.com/inward/record.url?scp=85100973169&partnerID=8YFLogxK
U2 - 10.1109/ICPR48806.2021.9412820
DO - 10.1109/ICPR48806.2021.9412820
M3 - Conference article published in proceeding or book
T3 - Proceedings - International Conference on Pattern Recognition
SP - 7508
EP - 7514
BT - Proceedings of ICPR 2020 - 25th International Conference on Pattern Recognition
ER -