Connecting language and vision for natural language-based vehicle retrieval

Shuai Bai, Zhedong Zheng, Xiaohan Wang, Junyang Lin, Zhu Zhang, Chang Zhou, Hongxia Yang, Yi Yang

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

31 Citations (Scopus)

Abstract

Vehicle search is one basic task for the efficient traffic management in terms of the AI City. Most existing prac-tices focus on the image-based vehicle matching, including vehicle re-identification and vehicle tracking. In this paper, we apply one new modality, i.e., the language description, to search the vehicle of interest and explore the potential of this task in the real-world scenario. The natural language-based vehicle search poses one new challenge of fine-grained understanding of both vision and language modalities. To connect language and vision, we propose to jointly train the state-of-the-art vision models with the transformer-based language model in an end-to-end manner. Except for the network structure design and the training strategy, several optimization objectives are also revisited in this work. The qualitative and quantitative experiments verify the effectiveness of the proposed method. Our proposed method has achieved the 1st place on the 5th AI City Challenge, yielding competitive performance 18.69% MRR accuracy on the private test set. We hope this work can pave the way for the future study on using language description effectively and efficiently for real-world vehicle retrieval systems. The code will be available at https://github.com/ShuaiBai623/AIC2021-T5-CLV.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021
PublisherIEEE Computer Society
Pages4029-4038
Number of pages10
ISBN (Electronic)9781665448994
DOIs
Publication statusPublished - Jun 2021
Externally publishedYes
Event2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021 - Virtual, Online, United States
Duration: 19 Jun 202125 Jun 2021

Publication series

NameIEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops
ISSN (Print)2160-7508
ISSN (Electronic)2160-7516

Conference

Conference2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2021
Country/TerritoryUnited States
CityVirtual, Online
Period19/06/2125/06/21

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Connecting language and vision for natural language-based vehicle retrieval'. Together they form a unique fingerprint.

Cite this