Abstract
Camera-based indoor localization is fundamental to indoor navigation, virtual reality, and location-based services. Deep learning methods have exhibited remarkable performance with low storage requirements and high efficiency. However, existing methods mainly derive features implicitly for pose regression, without exploiting the explicit structure information contained in images. This paper proposes that incorporating such information can improve the localization performance of learning-based approaches. We extract structure information from RGB images in the form of depth maps and edge maps, and design two modules for depth-weighted and edge-weighted feature fusion. These modules are integrated into the pose regression network to enhance pose prediction. Furthermore, we employ a self-attention module for high-level feature extraction to increase network capacity. Extensive experiments on the publicly available 7Scenes and 12Scenes datasets demonstrate that the proposed method achieves high localization performance, with average positional errors of 0.19 m and 0.16 m, respectively. The code for this work is available at https://github.com/lqing900205/structureLoc.
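To make the weighted-fusion idea concrete, here is a minimal sketch, in PyTorch, of how a single-channel structure map (an estimated depth map or an edge map) could gate backbone features before pose regression. The `WeightedFusion` module, its sigmoid-gated residual design, and all shapes are illustrative assumptions rather than the authors' implementation; the repository linked above is the authoritative code.

```python
# Hypothetical sketch of depth-/edge-weighted feature fusion,
# NOT the authors' released implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fuse image features with a single-channel structure map (depth or edge)."""

    def __init__(self, channels: int):
        super().__init__()
        # Project the 1-channel structure map to a per-channel,
        # per-pixel gate in [0, 1].
        self.gate = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feats: torch.Tensor, struct_map: torch.Tensor) -> torch.Tensor:
        # Resize the structure map to the spatial size of the feature map.
        struct_map = F.interpolate(
            struct_map, size=feats.shape[-2:], mode="bilinear", align_corners=False
        )
        # Re-weight features; the residual path keeps unweighted
        # information from being suppressed entirely.
        return feats + feats * self.gate(struct_map)


# Usage: fuse backbone features with an estimated depth map.
feats = torch.randn(1, 256, 30, 40)        # backbone feature map
depth = torch.rand(1, 1, 240, 320)         # estimated depth map
fused = WeightedFusion(256)(feats, depth)  # -> (1, 256, 30, 40)
```

The same module could be applied a second time with an edge map in place of the depth map, which is one plausible reading of the abstract's "two modules" for depth-weighted and edge-weighted fusion.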
| Original language | English |
| --- | --- |
| Pages (from-to) | 219-229 |
| Number of pages | 11 |
| Journal | ISPRS Journal of Photogrammetry and Remote Sensing |
| Volume | 202 |
| Publication status | Published - Aug 2023 |
Keywords
- Camera localization
- Depth estimation
- Edge detection
- Structure information
ASJC Scopus subject areas
- Atomic and Molecular Physics, and Optics
- Engineering (miscellaneous)
- Computer Science Applications
- Computers in Earth Sciences