Boosting Scene Graph Generation with Visual Relation Saliency

Research output: Journal article publication, Journal article, Academic research, peer-review

Abstract

A scene graph is a symbolic data structure that comprehensively describes the objects and visual relations in a visual scene, but it ignores the inherent perceptual saliency of each visual relation (i.e., relation saliency). Humans, however, often quickly allocate attention to the important or salient visual relations in a scene. To align with this human perception of a scene, we explicitly model the perceptual saliency of visual relations in the scene graph by augmenting each graph edge (i.e., visual relation) with a relation saliency attribute. We present a new design, named Saliency-guided Message Passing (SMP), that boosts the generation of such scene graphs with guidance from visual relation saliency. Technically, an object interaction encoder is first used to strengthen object relation representations by jointly exploiting the appearance, semantic, and spatial relations between objects. An additional branch then estimates the relation saliency of each visual relation by ordinal regression. Next, conditioned on the object and relation features (coupled with the estimated relation saliency), SMP enhances scene graph generation by performing message passing over the objects and the most salient relations. Extensive experiments on the VG-KR and VG150 datasets demonstrate the superiority of SMP for scene graph generation. Moreover, we empirically validate the compelling generalizability of the scene graphs learned via SMP on downstream tasks such as cross-modal retrieval and image captioning.
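
As a rough illustration of the structure the abstract describes (not the authors' implementation), the Python sketch below models a scene graph whose edges carry a relation saliency attribute, and selects the most salient relations as candidates for message passing. The names `Relation`, `SceneGraph`, `most_salient`, and `decode_ordinal` are hypothetical, as is the integer saliency scale. The threshold-counting decoder is one common way to read out an ordinal regression prediction and is assumed here; the abstract does not specify the scheme used in SMP.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Relation:
    """A graph edge: (subject, predicate, object) plus a saliency level."""
    subject: str
    predicate: str
    obj: str
    saliency: int = 0  # assumed ordinal scale: higher means more salient


@dataclass
class SceneGraph:
    objects: list[str] = field(default_factory=list)
    relations: list[Relation] = field(default_factory=list)

    def most_salient(self, k: int) -> list[Relation]:
        """Return the k relations with the highest saliency level."""
        ranked = sorted(self.relations, key=lambda r: r.saliency, reverse=True)
        return ranked[:k]


def decode_ordinal(threshold_probs: list[float], cutoff: float = 0.5) -> int:
    """Decode an ordinal level from binary 'level > j' probabilities.

    Counts how many thresholds are exceeded. This decoding rule is an
    assumption for illustration, not the scheme stated in the paper.
    """
    return sum(p > cutoff for p in threshold_probs)


# Toy example with made-up objects, predicates, and probabilities.
graph = SceneGraph(
    objects=["man", "horse", "hat", "grass"],
    relations=[
        Relation("man", "riding", "horse", decode_ordinal([0.9, 0.8, 0.7])),
        Relation("horse", "on", "grass", decode_ordinal([0.6, 0.2, 0.1])),
        Relation("man", "wearing", "hat", decode_ordinal([0.9, 0.7, 0.2])),
    ],
)

for rel in graph.most_salient(2):
    print(rel.subject, rel.predicate, rel.obj, rel.saliency)
```

In this toy graph, a message passing step restricted to `graph.most_salient(2)` would skip the low-saliency `horse on grass` edge, which mirrors the idea of passing messages over the objects and only the most salient relations.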

Original language: English
Article number: 8
Pages (from-to): 1-17
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Volume: 19
Issue number: 1
Publication status: Published - 5 Jan 2023

Keywords

  • Relation saliency
  • Scene graph generation

ASJC Scopus subject areas

  • Hardware and Architecture
  • Computer Networks and Communications
