Exploring Duality in Visual Question-Driven Top-Down Saliency

Shengfeng He, Chu Han, Guoqiang Han, Jing Qin

Research output: Journal article publication › Journal article › Academic research › peer-review

13 Citations (Scopus)


Top-down, goal-driven visual saliency strongly influences how the human visual system performs visual tasks. Text generation tasks, such as visual question answering (VQA) and visual question generation (VQG), are intrinsically connected with top-down saliency, which is usually modeled in an unsupervised manner within both VQA and VQG. However, it has been shown that the regions humans choose to look at when answering questions differ markedly from those selected by unsupervised attention models. In this brief, we explore the intrinsic relationship between top-down saliency and text generation, and investigate whether an accurate saliency response benefits text generation. To this end, we propose a dual supervised network with dynamic parameter prediction. Dual supervision explicitly exploits the probabilistic correlation between the primal task (top-down saliency detection) and the dual task (text generation), while dynamic parameter prediction encodes the given text (i.e., a question or an answer) into the fully convolutional network. Extensive experiments show that the proposed top-down saliency method achieves the best correlation with human attention among various baselines. In addition, the proposed model can be guided by either questions or answers and output the counterpart. Furthermore, we show that incorporating human-like, question-driven visual saliency improves the performance of both answer and question generation.
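The abstract names two mechanisms without spelling them out. What follows is a minimal, hypothetical Python sketch of each under stated assumptions; it is not the authors' implementation. Dual supervision is illustrated with the generic probabilistic-duality constraint, which follows from factorizing a joint distribution two ways, P(x)P(y|x) = P(y)P(x|y), and penalizes its squared violation in log space. Here x and y stand abstractly for the outputs of the primal and dual tasks; how the paper estimates each term is not stated in this abstract. Dynamic parameter prediction is illustrated as a linear map from a fixed-length text embedding to the weights of a single 1x1 convolution filter, followed by a sigmoid that turns the filtered features into a saliency map. The function names (`duality_gap`, `predict_filter`, `dynamic_saliency`), the linear predictor, the 1x1 filter size, and the sigmoid are all illustrative assumptions.

```python
import math


def duality_gap(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y):
    """Squared violation of P(x)P(y|x) = P(y)P(x|y), computed in log space.

    Generic dual-supervision regularizer (an assumption, not the paper's
    exact loss): it is zero when the primal model P(y|x) and the dual model
    P(x|y) agree with the marginals P(x) and P(y).
    """
    gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    return gap * gap


def predict_filter(text_embedding, weight_matrix, bias):
    """Hypothetical dynamic parameter prediction.

    Maps a text embedding of length D to the C weights of a 1x1 conv filter
    via a linear layer: weight_matrix has C rows of length D, bias has C entries.
    """
    return [sum(w * e for w, e in zip(row, text_embedding)) + b
            for row, b in zip(weight_matrix, bias)]


def dynamic_saliency(feature_map, filt):
    """Apply the predicted 1x1 filter to a C x H x W feature map.

    Each spatial location is the filter-weighted sum over channels, squashed
    by a sigmoid, giving an H x W saliency map with values in (0, 1).
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    saliency = []
    for i in range(height):
        row = []
        for j in range(width):
            z = sum(filt[c] * feature_map[c][i][j] for c in range(channels))
            row.append(1.0 / (1.0 + math.exp(-z)))
        saliency.append(row)
    return saliency
```

For example, `predict_filter([2.0, 3.0], [[1.0, 0.0], [0.0, 1.0]], [0.5, -0.5])` yields the filter `[2.5, 2.5]`; applied to a two-channel feature map whose channels are both 1 at some location, it gives a saliency value of sigmoid(5) there and 0.5 wherever both channels are 0. Because the filter is recomputed from each text embedding, the same image features can produce different saliency maps for different questions or answers, which is the intuition behind conditioning a convolutional network on text.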

Original language: English
Article number: 8822633
Pages (from-to): 2672-2679
Number of pages: 8
Journal: IEEE Transactions on Neural Networks and Learning Systems
Issue number: 7
Publication status: Published - Jul 2020


Keywords

  • Dual learning
  • saliency
  • visual question answering (VQA)
  • visual question generation (VQG)

ASJC Scopus subject areas

  • Software
  • Computer Science Applications
  • Computer Networks and Communications
  • Artificial Intelligence


