TY - GEN
T1 - A Retriever-Reader Framework with Visual Entity Linking for Knowledge-Based Visual Question Answering
AU - You, Jiuxiang
AU - Yang, Zhenguo
AU - Li, Qing
AU - Liu, Wenyin
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023/8
Y1 - 2023/8
AB - In this paper, we propose a Retriever-Reader framework with Visual Entity Linking (RR-VEL) for knowledge-based visual question answering. Given images and original questions, the visual entity linking (VEL) module extracts key entities from the images and substitutes them for the referents in the questions, which disambiguates the questions and yields entity-oriented queries with explicit entities. The Retriever then encodes the queries and knowledge items using BERT with a feed-forward layer and obtains a set of knowledge candidates. The Reader encodes the questions with image captions and the knowledge candidates in two separate branches, which avoids interference between them during self-attentive encoding. Finally, the Reader's decoder fuses the encoded features to generate answers. Extensive experiments on two public datasets show that our method significantly outperforms existing baselines.
KW - Entity linking
KW - Knowledge graph
KW - VQA
UR - http://www.scopus.com/inward/record.url?scp=85171169505&partnerID=8YFLogxK
U2 - 10.1109/ICME55011.2023.00011
DO - 10.1109/ICME55011.2023.00011
M3 - Conference article published in proceeding or book
AN - SCOPUS:85171169505
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
SP - 13
EP - 18
BT - Proceedings - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
PB - IEEE Computer Society
T2 - 2023 IEEE International Conference on Multimedia and Expo, ICME 2023
Y2 - 10 July 2023 through 14 July 2023
ER -