Multiple Binocular Cameras-Based Indoor Localization Technique Using Deep Learning and Multimodal Fusion

Jun Yan, Yimei Zhang, Bin Kang, Wei Ping Zhu, Daniel Pak Kong Lun

Research output: Journal article publication › Journal article › Academic research › peer-review


In this paper, an image-based indoor localization technique using multiple binocular cameras is proposed based on deep learning and multimodal fusion. First, to exploit the cross-modal correlations among the captured multimodal images for localization, the images are concatenated to form two new modalities: a three-channel gray image and a three-channel depth image. Then, a two-stream convolutional neural network (CNN) is used for multimodal feature extraction, which preserves the independence of each image modality. Moreover, a decision-level fusion rule is proposed to fuse the extracted features via a linear weighted sum. Finally, to exploit the feature correlation between the image modalities, the fused feature is processed once more by two convolutional max-pooling blocks, and a shrinkage-loss-based loss function is designed to learn the position regression. Field tests show that the proposed algorithm achieves more accurate position estimation than existing image-based localization approaches.
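The abstract names two concrete ingredients: a decision-level fusion that combines the two modality features with a linear weighted sum, and a shrinkage-loss-based regression objective. The paper's exact formulas are not given here, so the sketch below is a minimal, hypothetical NumPy illustration: the fusion weight `w` and the shrinkage hyperparameters `a` (shrinkage speed) and `c` (localization threshold) are assumed, and the shrinkage form follows the commonly used `l**2 / (1 + exp(a*(c - l)))` modulated squared error.

```python
import numpy as np

def linear_weighted_fusion(f_gray, f_depth, w=0.5):
    """Decision-level fusion sketch: linear weighted sum of the two
    modality feature vectors. The weight w is an assumed parameter,
    not taken from the paper."""
    return w * f_gray + (1.0 - w) * f_depth

def shrinkage_loss(pred, target, a=10.0, c=0.2):
    """Hypothetical shrinkage loss for position regression.

    l = |pred - target| is the absolute error. The modulating factor
    1 / (1 + exp(a * (c - l))) shrinks the contribution of easy
    samples (small l) while leaving hard samples (large l) with a
    loss close to the plain squared error l**2."""
    l = np.abs(pred - target)
    return float(np.mean(l**2 / (1.0 + np.exp(a * (c - l)))))
```

With the assumed defaults, an easy sample (error 0.05) contributes much less than its squared error, while a hard sample (error 1.0) keeps a loss close to 1.0, which is the intended behavior of shrinkage-style objectives.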

Original language: English
Pages (from-to): 1597-1608
Number of pages: 12
Journal: IEEE Sensors Journal
Issue number: 2
Publication status: Published - 15 Jan 2022


Keywords

  • convolutional neural network
  • feature extraction
  • image-based indoor localization
  • multimodal fusion

ASJC Scopus subject areas

  • Instrumentation
  • Electrical and Electronic Engineering

