Knowledge-Based Visual Question Generation

Jiayuan Xie, Wenhao Fang, Yi Cai, Qingbao Huang, Qing Li

Research output: Journal article › Academic research › peer-review

22 Citations (Scopus)

Abstract

The visual question generation task aims to generate meaningful questions about an image that target a given answer. Existing methods focus on the visual concepts in the image for question generation. However, humans inevitably draw on knowledge related to the visual objects in an image when constructing questions. In this paper, we propose a knowledge-based visual question generation model that integrates visual concepts and non-visual knowledge to generate questions. To obtain visual concepts, we utilize a pre-trained object detection model to extract object-level features for each object in the image. To obtain useful non-visual knowledge, we first retrieve knowledge related to the visual objects in the image from a knowledge base. Considering that not all retrieved knowledge is helpful for this task, we introduce an answer-aware module that captures the candidate knowledge related to the answer from the retrieved knowledge, which ensures that the generated content is targeted at the answer. Finally, object-level representations containing visual concepts and non-visual knowledge are sent to a decoder module to generate questions. Extensive experiments on the FVQA and KBVQA datasets show that the proposed model outperforms state-of-the-art models.
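The abstract outlines a pipeline of detector-based visual features, knowledge retrieval, answer-aware knowledge filtering, and a question decoder. The sketch below is a minimal illustration of that pipeline, not the authors' implementation: the module names, feature dimensions, the use of multi-head attention for the answer-aware step, and the GRU decoder are all assumptions made for brevity, since the abstract does not specify architectural details.

```python
import torch
import torch.nn as nn


class AnswerAwareKnowledgeVQG(nn.Module):
    """Illustrative sketch of the abstract's pipeline (not the paper's exact model).

    Assumes pre-extracted object features (e.g. from a pretrained detector),
    embeddings of retrieved knowledge facts, and an answer embedding.
    """

    def __init__(self, feat_dim=2048, know_dim=300, hid_dim=512, vocab_size=10000):
        super().__init__()
        self.obj_proj = nn.Linear(feat_dim, hid_dim)   # object-level visual features
        self.know_proj = nn.Linear(know_dim, hid_dim)  # retrieved knowledge embeddings
        self.ans_proj = nn.Linear(know_dim, hid_dim)   # answer embedding
        # Answer-aware step: attend over retrieved facts with the answer as query.
        self.att = nn.MultiheadAttention(hid_dim, num_heads=4, batch_first=True)
        # Decoder: generates the question; a GRU is used here purely for brevity.
        self.decoder = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, obj_feats, know_embs, ans_emb, max_len=20):
        v = self.obj_proj(obj_feats)                     # (B, num_objects, hid)
        k = self.know_proj(know_embs)                    # (B, num_facts, hid)
        a = self.ans_proj(ans_emb).unsqueeze(1)          # (B, 1, hid)
        # Keep only knowledge relevant to the answer (answer-aware attention).
        ans_know, _ = self.att(query=a, key=k, value=k)  # (B, 1, hid)
        # Fuse visual concepts with the answer-aware knowledge.
        fused = torch.cat([v, ans_know], dim=1)          # (B, num_objects + 1, hid)
        ctx = fused.mean(dim=1, keepdim=True)            # simple pooled context
        # Unroll the decoder from the pooled context to produce per-step logits.
        dec_out, _ = self.decoder(ctx.repeat(1, max_len, 1))
        return self.out(dec_out)                         # (B, max_len, vocab_size)


if __name__ == "__main__":
    model = AnswerAwareKnowledgeVQG()
    logits = model(torch.randn(2, 36, 2048),  # 36 detected objects per image
                   torch.randn(2, 10, 300),   # 10 retrieved knowledge facts
                   torch.randn(2, 300))       # answer embedding
    print(logits.shape)                       # torch.Size([2, 20, 10000])
```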

Original language: English
Pages (from-to): 7547-7558
Number of pages: 12
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 32
Issue number: 11
DOIs
Publication status: Published - 1 Nov 2022

Keywords

  • Knowledge-based
  • Multimodal
  • Visual question generation

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering
