Contrastive learning-based place descriptor representation for cross-modality place recognition

Research output: Journal article publication › Journal article › Academic research › peer-review

Abstract

Place recognition in LiDAR maps plays a vital role in localization, especially in GPS-denied environments. Although much work has addressed pure LiDAR-based place recognition, these approaches are often hindered by high computational cost and the operational burden they place on the driving agent. To alleviate these limitations, we explore an alternative approach to large-scale cross-modal localization that matches real-time RGB images against pre-existing 3D LiDAR point cloud maps. Specifically, we present a unified place descriptor representation learning method across modalities based on a Siamese architecture, which reformulates place recognition as a similarity-based retrieval task. To address the inherent modality gap between images and point clouds, we first transform unordered point clouds into a range-view representation, which facilitates effective cross-modal metric learning. We then introduce a Transformer-Mamba Mixer module that integrates selective scanning and attention mechanisms to capture both intra-context and inter-context embeddings for generating place descriptors. To further enrich the global location descriptors, we propose a semantic-promoted descriptor enhancer grounded in semantic distribution estimation. Finally, a contrastive learning paradigm is employed to perform cross-modal place recognition by identifying the most similar descriptors across modalities. Extensive experiments demonstrate the superiority of the proposed method over state-of-the-art methods. Details are available at https://github.com/emilyemliyM/Cross-PRNet.
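
Two steps named in the abstract, the range-view transformation and contrastive cross-modal retrieval, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, image size, field-of-view values, temperature, and the choice of an InfoNCE-style loss are all assumptions made for illustration, and the Transformer-Mamba Mixer and semantic enhancer are not modeled. The descriptors are plain lists of floats standing in for network outputs.

```python
import math

def range_view(points, height=4, width=8, fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project unordered (x, y, z) points onto a height x width range image.

    Each pixel keeps the range of the closest point that falls into it;
    empty pixels stay at 0.0. Field-of-view values are illustrative assumptions.
    """
    fov_down = math.radians(fov_down_deg)
    fov = math.radians(fov_up_deg) - fov_down
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue  # a point at the sensor origin has no direction
        yaw = math.atan2(y, x)    # azimuth in [-pi, pi]
        pitch = math.asin(z / r)  # elevation angle
        u = int(0.5 * (1.0 - yaw / math.pi) * width)         # column index
        v = int((1.0 - (pitch - fov_down) / fov) * height)   # row index
        # Points outside the vertical field of view are clamped to the border.
        u = min(max(u, 0), width - 1)
        v = min(max(v, 0), height - 1)
        if image[v][u] == 0.0 or r < image[v][u]:
            image[v][u] = r
    return image

def l2_normalize(vec):
    norm = math.sqrt(sum(a * a for a in vec)) or 1.0
    return [a / norm for a in vec]

def similarity_matrix(image_desc, lidar_desc):
    """Cosine similarity between every image and every LiDAR descriptor."""
    q = [l2_normalize(v) for v in image_desc]
    d = [l2_normalize(v) for v in lidar_desc]
    return [[sum(a * b for a, b in zip(qi, dj)) for dj in d] for qi in q]

def info_nce(image_desc, lidar_desc, temperature=0.1):
    """Mean InfoNCE-style loss; image i and LiDAR descriptor i are the positive pair."""
    sims = similarity_matrix(image_desc, lidar_desc)
    total = 0.0
    for i, row in enumerate(sims):
        logits = [s / temperature for s in row]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(sims)

def retrieve(image_desc, lidar_desc):
    """For each image descriptor, return the index of the most similar LiDAR descriptor."""
    sims = similarity_matrix(image_desc, lidar_desc)
    return [max(range(len(row)), key=row.__getitem__) for row in sims]
```

In this sketch, training would minimize `info_nce` over batches of paired image and range-view descriptors, and at query time `retrieve` would select the map location whose descriptor is closest to the query image's descriptor.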

Original language: English
Article number: 103351
Journal: Information Fusion
Volume: 124
Publication status: Published - Dec 2025

Keywords

  • Contrastive learning
  • Cross-modality
  • Descriptor representation
  • Place recognition

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Information Systems
  • Hardware and Architecture
