DRAKE: Deep Pair-Wise Relation Alignment for Knowledge-Enhanced Multimodal Scene Graph Generation in Social Media Posts

Ze Fu, Changmeng Zheng, Junhao Feng, Yi Cai, Xiao Yong Wei, Yaowei Wang, Qing Li

Research output: Journal article publication · Journal article · Academic research · Peer-reviewed

3 Citations (Scopus)

Abstract

Scene Graph Generation (SGG) is a typical computer vision task that detects objects and the predicates relating them in an image. Existing SGG methods focus on modeling visual contexts to generate scene graphs and are evaluated on well-annotated datasets with high-quality images. However, the quality of images in social media posts is not guaranteed: images may be incomplete or partially occluded and therefore may not provide sufficient visual context for SGG. As a result, previous methods may produce missing or false visual relationship detections due to the lack of visual context. To effectively generate scene graphs for social media images, we study multimodal scene graph generation (MSG) in this paper. MSG aims to build visual scene graphs from images in social media posts with the support of the accompanying text. However, leveraging textual content through simple multimodal alignment, such as object-level alignment, neglects the inherent pair-wise mapping between multimodal object pairs. To address these limitations, we propose a method named Deep pair-wise Relation Alignment for Knowledge-Enhanced (DRAKE) multimodal scene graph generation. The model supplements missing visual context with well-aligned textual knowledge. It first converts the textual information into an object-aware knowledge representation with the help of visual data. Furthermore, DRAKE facilitates information interaction between multimodal pair-wise representations. A multimodal context enhancement layer is devised to help the model generate the scene graph. To evaluate SGG performance on social media images, we construct a social media SGG dataset, also named MSG. We comprehensively analyze the effectiveness of our method on the MSG dataset, and the experimental results show that our model outperforms previous methods. To compare our method fairly with other SGG models, we also conduct experiments on the Visual Genome dataset for further analysis. The MSG dataset is released at https://github.com/FuZe4ever/MSG.
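To make the pair-wise alignment idea concrete, the following is a minimal sketch, not the authors' implementation: visual features of each (subject, object) pair attend over the textual token features of the post, so that text can supplement missing visual context at the relation level. All names and dimensions (PairwiseRelationAlignment, d_model, n_heads) are illustrative assumptions.

# Hypothetical sketch of pair-wise relation alignment; not the DRAKE code.
import torch
import torch.nn as nn


class PairwiseRelationAlignment(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        # Cross-attention: queries come from visual object pairs,
        # keys/values come from the post's textual tokens.
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.fuse = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.ReLU())

    def forward(self, obj_feats: torch.Tensor, text_feats: torch.Tensor) -> torch.Tensor:
        """
        obj_feats:  (B, N, D)  visual features of N detected objects
        text_feats: (B, T, D)  token features of the accompanying sentence
        returns:    (B, N, N, D) text-enhanced features for every object pair
        """
        B, N, D = obj_feats.shape
        # Build pair-wise (subject, object) representations by concatenation.
        subj = obj_feats.unsqueeze(2).expand(B, N, N, D)
        obj = obj_feats.unsqueeze(1).expand(B, N, N, D)
        pair = self.fuse(torch.cat([subj, obj], dim=-1)).reshape(B, N * N, D)
        # Each object pair queries the textual tokens for relation-level context.
        aligned, _ = self.cross_attn(query=pair, key=text_feats, value=text_feats)
        # Residual connection keeps the original visual pair information.
        return (pair + aligned).reshape(B, N, N, D)


if __name__ == "__main__":
    model = PairwiseRelationAlignment()
    out = model(torch.randn(2, 5, 256), torch.randn(2, 12, 256))
    print(out.shape)  # torch.Size([2, 5, 5, 256])

The cross-attention query built from object pairs (rather than single objects) is what distinguishes this from plain object-level alignment, which is the limitation the abstract highlights.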

Original language: English
Pages (from-to): 3199-3213
Number of pages: 15
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 33
Issue number: 7
DOIs
Publication status: Published - 1 Jul 2023

Keywords

  • knowledge enhancement
  • pair-wise alignment
  • Scene graph generation
  • social media posts

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering
