A Retriever-Reader Framework with Visual Entity Linking for Knowledge-Based Visual Question Answering

Jiuxiang You, Zhenguo Yang, Qing Li, Wenyin Liu

Research output: Chapter in book / Conference proceedingConference article published in proceeding or bookAcademic researchpeer-review

1 Citation (Scopus)

Abstract

In this paper, we propose a Retriever-Reader framework with Visual Entity Linking (RR-VEL) for knowledge-based visual question answering. Given images and original questions, the visual entity linking (VEL) module extracts key entities in images to replace the question referents for semantic disambiguation, achieving entity-oriented queries with explicit entities. Furthermore, the Retriever encodes the queries and knowledge items by Bert with a feed-forward layer, and obtains a set of knowledge candidates. The Reader encodes the questions with image captions and knowledge candidates in two branches, which avoids their interference during self-attentive encoding. Finally, the decoder of Reader fuses the encoded features to generate answers. Extensive experiments conducted on the two public datasets show that our method significantly outperforms the existing baselines.

Original languageEnglish
Title of host publicationProceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
PublisherIEEE Computer Society
Pages13-18
Number of pages6
ISBN (Electronic)9781665468916
DOIs
Publication statusPublished - Aug 2023
Event2023 IEEE International Conference on Multimedia and Expo, ICME 2023 - Brisbane, Australia
Duration: 10 Jul 202314 Jul 2023

Publication series

NameProceedings - IEEE International Conference on Multimedia and Expo
Volume2023-July
ISSN (Print)1945-7871
ISSN (Electronic)1945-788X

Conference

Conference2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Country/TerritoryAustralia
CityBrisbane
Period10/07/2314/07/23

Keywords

  • Entity linking
  • Knowledge graph
  • VQA

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Fingerprint

Dive into the research topics of 'A Retriever-Reader Framework with Visual Entity Linking for Knowledge-Based Visual Question Answering'. Together they form a unique fingerprint.

Cite this