TY - JOUR
T1 - Hybrid refinement-correction heatmaps for human pose estimation
AU - Kamel, Aouaidjia
AU - Sheng, Bin
AU - Li, Ping
AU - Kim, Jinman
AU - Feng, David Dagan
N1 - Funding Information:
This work was supported in part by the National Key Research and Development Program of China (2018YFF0300903), and in part by the Science and Technology Commission of Shanghai Municipality under Grants 18410750700, 17411952600, and 16DZ0501100, in part by the National Natural Science Foundation of China under Grants 61872241 and 61572316, and in part by The Hong Kong Polytechnic University under Grants P0030419 and P0030929.
Manuscript received July 12, 2018; revised May 10, 2020; accepted May 14, 2020. Date of publication June 3, 2020; date of current version April 23, 2021. The associate editor coordinating the review of this manuscript and approving it for publication was Jingdong Wang. (Corresponding author: Bin Sheng.) Aouaidjia Kamel is with the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China, and also with the Space Techniques Center, Algerian Space Agency, Arzew 31200, Algeria (e-mail: kameldz40@gmail.com).
Publisher Copyright:
© 1999-2012 IEEE.
PY - 2021/4
Y1 - 2021/4
N2 - In this paper, we present a method (Hybrid-Pose) to improve human pose estimation in images. We adopt Stacked Hourglass Networks to design two convolutional neural network models: RNet (Refinement Network) for pose refinement and CNet (Correction Network) for pose correction. CNet guides RNet to correct the joint locations before the final pose is generated. Each of the two models is composed of four hourglasses, and each hourglass generates a group of detection heatmaps for the joints. The RNet hourglasses all have the same structure, whereas the CNet hourglasses have different structures for pose guidance. Since pose estimation in RGB images is very sensitive to the image scene, our proposed approach generates multiple outputs of detection heatmaps to broaden the search scope for the correct joint locations. We use the RNet model to refine the joint locations in each hourglass stage horizontally; the heatmaps of each stage are then fused with the heatmaps of all the CNet hourglasses vertically in a hybrid manner. Our method shows results competitive with existing state-of-the-art approaches on the MPII and FLIC benchmark datasets. Although our proposed method focuses on improving single-person pose estimation, we also show the influence of this improvement on multi-person pose estimation by detecting multiple people using an SSD detector and then estimating the pose of each person individually.
KW - heatmaps fusion
KW - Human pose estimation
KW - pose correction
KW - pose refinement
UR - http://www.scopus.com/inward/record.url?scp=85105018901&partnerID=8YFLogxK
U2 - 10.1109/TMM.2020.2999181
DO - 10.1109/TMM.2020.2999181
M3 - Journal article
AN - SCOPUS:85105018901
SN - 1520-9210
VL - 23
SP - 1330
EP - 1342
JO - IEEE Transactions on Multimedia
JF - IEEE Transactions on Multimedia
ER -