Multi-Granularity Feature Fusion for Image-guided Story Ending Generation

Pijian Li, Qingbao Huang, Zhigang Li, Yi Cai, Feng Shuang, Qing Li

Research output: Journal article publication / Journal article / Academic research / peer-review

Abstract

Image-guided Story Ending Generation aims to generate a reasonable and logical ending given a story context and an ending-related image. Existing models have achieved some success by fusing global image features with the story context through an attention mechanism. However, they ignore the logical relationship between the story context and the image regions, and they do not consider high-level semantic features of the image such as visual sentiment. As a result, the generated ending may be inconsistent with the logic or sentiment of the given information. In this paper, we propose a Multi-Granularity feature Fusion (MGF) model to address this problem. Concretely, we first employ an image sentiment extractor to capture the sentiment features of the image as part of the global image features. We then design a scene subgraph selector that captures the features of the key image region by picking the scene subgraph most relevant to the context. Finally, we fuse the textual and visual features at the object, region, and global levels. Our model is thereby able to effectively capture the key region features and the visual sentiment of the image, and so to generate endings that are more logical and more consistent in sentiment. Experimental results show that our MGF model outperforms the state-of-the-art models on most metrics.
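The abstract describes the pipeline only at a high level. The following is a minimal, hypothetical sketch of two of its ideas (picking the scene subgraph most relevant to the story context, then fusing features at the object, region, and global levels); it is not the MGF authors' implementation. Everything concrete here is an illustrative assumption: the story context, object nodes, global image feature, and sentiment feature are taken to be plain float vectors that are already computed; a subgraph's relevance is scored as the cosine similarity between the context and the mean of its node features; object-level fusion uses dot-product attention; and the three levels are combined by simple concatenation. The names `select_subgraph`, `attend`, and `fuse` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def select_subgraph(context: Vector, subgraphs: Sequence[Sequence[Vector]]) -> int:
    """Hypothetical selector: index of the subgraph whose mean node feature
    is most cosine-similar to the story context (first one wins on ties)."""
    scores = [cosine(context, mean(nodes)) for nodes in subgraphs]
    return max(range(len(scores)), key=scores.__getitem__)


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, items: Sequence[Vector]) -> Vector:
    """Dot-product attention: softmax-weighted sum of the items."""
    weights = softmax([dot(query, item) for item in items])
    return [sum(w * item[d] for w, item in zip(weights, items))
            for d in range(len(items[0]))]


def fuse(context: Vector, subgraphs: Sequence[Sequence[Vector]],
         global_feat: Vector, sentiment_feat: Vector) -> Vector:
    """Hypothetical three-level fusion, combined by list concatenation."""
    chosen = subgraphs[select_subgraph(context, subgraphs)]
    object_level = attend(context, chosen)       # context attends over object nodes
    region_level = mean(chosen)                  # pooled key-region (subgraph) feature
    global_level = global_feat + sentiment_feat  # sentiment appended to global feature
    return context + object_level + region_level + global_level


if __name__ == "__main__":
    ctx = [1.0, 0.0]
    subgraphs = [
        [[0.0, 1.0], [0.0, 2.0]],  # subgraph 0: orthogonal to the context
        [[1.0, 0.0], [2.0, 0.1]],  # subgraph 1: aligned with the context
    ]
    print(select_subgraph(ctx, subgraphs))  # → 1
    print(len(fuse(ctx, subgraphs, [0.5, 0.5], [0.9, 0.1])))  # → 10
```

Concatenation is used only because it is the simplest way to keep all three granularities visible side by side; a learned model would more plausibly project each level and combine them with trainable weights.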

Original language: English
Pages (from-to): 3437-3449
Number of pages: 13
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 32
Publication status: Published - 2024

Keywords

  • Image-guided story ending generation
  • image sentiment
  • multi-granularity feature fusion
  • scene subgraph
  • story ending generation

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering
