In this paper, an image-based indoor localization technique using multiple binocular cameras is proposed, built on deep learning and multimodal fusion. First, to exploit the cross-modal correlations among the multimodal images for localization, the captured images are concatenated into two new modalities: a three-channel gray image and a three-channel depth image. Then, a two-stream convolutional neural network (CNN) extracts features from each modality independently, preserving the independence of the two image modalities. Next, a decision-level fusion rule combines the extracted features through a linear weighted sum. To further exploit the feature correlation between the two modalities, the fused feature is processed by two additional convolutional max-pooling blocks. Finally, a shrinkage-loss-based loss function is designed to learn the position regression function. Field tests show that the proposed algorithm achieves more accurate position estimates than other existing image-based localization approaches.
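The two core numerical ingredients described above, the linear-weighted-sum fusion of the two feature streams and the shrinkage-based regression loss, can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the weight `w` and the shrinkage form `l^2 / (1 + exp(a*(c - l)))` with hypothetical defaults `a=10, c=0.2` are assumptions drawn from the commonly cited shrinkage loss formulation, and the actual CNN feature extractors are omitted.

```python
import numpy as np

def fuse_features(feat_gray, feat_depth, w=0.5):
    """Decision-level fusion: linear weighted sum of the gray-stream
    and depth-stream feature vectors. `w` is an assumed fusion weight."""
    return w * feat_gray + (1.0 - w) * feat_depth

def shrinkage_loss(pred, target, a=10.0, c=0.2):
    """Shrinkage-style regression loss: keeps the squared penalty for
    large position errors but shrinks the contribution of small (easy)
    errors via a sigmoid-shaped modulating factor.
    l = |pred - target|; loss = mean( l^2 / (1 + exp(a*(c - l))) )."""
    l = np.abs(pred - target)
    return float(np.mean(l**2 / (1.0 + np.exp(a * (c - l)))))

# Toy usage: fuse two 4-D feature vectors, then score a 2-D position guess.
fused = fuse_features(np.ones(4), np.zeros(4), w=0.5)      # -> [0.5, 0.5, 0.5, 0.5]
loss = shrinkage_loss(np.array([1.05, 2.0]), np.array([1.0, 2.0]))
```

Compared with a plain squared error, the modulating factor is close to 1 when the error exceeds the threshold `c` and close to 0 when the error is small, so training focuses on samples with large position error.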
- convolutional neural network
- feature extraction
- image-based indoor localization
- multimodal fusion