TY - JOUR
T1 - Multiple Binocular Cameras-Based Indoor Localization Technique Using Deep Learning and Multimodal Fusion
AU - Yan, Jun
AU - Zhang, Yimei
AU - Kang, Bin
AU - Zhu, Wei Ping
AU - Lun, Daniel Pak Kong
N1 - This work was supported by the National Natural Science Foundation of China under Grant 61771256, Grant 61471205, Grant 61801242, Grant 61801245, Grant 61871235, and Grant 61801377.
Publisher Copyright:
© 2001-2012 IEEE.
PY - 2022/1/15
Y1 - 2022/1/15
N2 - In this paper, an image-based indoor localization technique using multiple binocular cameras is proposed based on deep learning and multimodal fusion. First, to exploit the cross-modal correlations between the various multimodal images for localization, the obtained images are concatenated to form two new modalities: a three-channel gray image and a three-channel depth image. Then, a two-stream convolutional neural network (CNN) is used for multimodal feature extraction, which ensures the independence of each image modality. Moreover, a decision-level fusion rule is proposed to fuse the extracted features with a linear weighted sum method. Next, in order to make use of the feature correlation between the image modalities, the fused feature is extracted once again by two convolutional max-pooling blocks. Finally, a shrinkage-loss-based loss function is designed to obtain the position regression function. Field tests show that the proposed algorithm obtains more accurate position estimates than other existing image-based localization approaches.
AB - In this paper, an image-based indoor localization technique using multiple binocular cameras is proposed based on deep learning and multimodal fusion. First, to exploit the cross-modal correlations between the various multimodal images for localization, the obtained images are concatenated to form two new modalities: a three-channel gray image and a three-channel depth image. Then, a two-stream convolutional neural network (CNN) is used for multimodal feature extraction, which ensures the independence of each image modality. Moreover, a decision-level fusion rule is proposed to fuse the extracted features with a linear weighted sum method. Next, in order to make use of the feature correlation between the image modalities, the fused feature is extracted once again by two convolutional max-pooling blocks. Finally, a shrinkage-loss-based loss function is designed to obtain the position regression function. Field tests show that the proposed algorithm obtains more accurate position estimates than other existing image-based localization approaches.
KW - convolutional neural network
KW - feature extraction
KW - image-based indoor localization
KW - multimodal fusion
UR - https://www.scopus.com/pages/publications/85121352264
U2 - 10.1109/JSEN.2021.3133488
DO - 10.1109/JSEN.2021.3133488
M3 - Journal article
AN - SCOPUS:85121352264
SN - 1530-437X
VL - 22
SP - 1597
EP - 1608
JO - IEEE Sensors Journal
JF - IEEE Sensors Journal
IS - 2
ER -