Hybrid Graph Reasoning with Dynamic Interaction for Visual Dialog

Shanshan Du, Hanli Wang, Tengpeng Li, Chang Wen Chen

Research output: Journal article publicationJournal articleAcademic researchpeer-review

1 Citation (Scopus)

Abstract

As a pivotal branch of intelligent human-computer interaction, visual dialog is a technically challenging task that requires artificial intelligence (AI) agents to answer consecutive questions based on image content and history dialog. Despite considerable progresses, visual dialog still suffers from two major problems: (1) how to design flexible cross-modal interaction patterns instead of over-reliance on expert experience and (2) how to infer underlying semantic dependencies between dialogues effectively. To address these issues, an end-to-end framework employing dynamic interaction and hybrid graph reasoning is proposed in this work. Specifically, three major components are designed and the practical benefits are demonstrated by extensive experiments. First, a dynamic interaction module is developed to automatically determine the optimal modality interaction route for multifarious questions, which consists of three elaborate functional interaction blocks endowed with dynamic routers. Second, a hybrid graph reasoning module is designed to explore adequate semantic associations between dialogues from multiple perspectives, where the hybrid graph is constructed by aggregating a structured coreference graph and a context-aware temporal graph. Third, a unified one-stage visual dialog model with an end-to-end structure is developed to train the dynamic interaction module and the hybrid graph reasoning module in a collaborative manner. Extensive experiments on the benchmark datasets of VisDial v0.9 and VisDial v1.0 demonstrate the effectiveness of the proposed method compared to other state-of-the-art approaches.

Original languageEnglish
Pages (from-to)9095-9108
Number of pages14
JournalIEEE Transactions on Multimedia
Volume26
DOIs
Publication statusPublished - 2024

Keywords

  • Cross-modal interaction
  • dynamic routing
  • graph neural network
  • graph reasoning
  • visual dialog

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering

Fingerprint

Dive into the research topics of 'Hybrid Graph Reasoning with Dynamic Interaction for Visual Dialog'. Together they form a unique fingerprint.

Cite this