Visual Paraphrase Generation with Key Information Retained

Jiayuan Xie, Jiali Chen, Yi Cai, Qingbao Huang, Qing Li

Research output: Journal article publication › Journal article › Academic research › peer-review

2 Citations (Scopus)

Abstract

The visual paraphrase generation task aims to rewrite a given image-related original sentence into a new paraphrase that expresses the same meaning as the original sentence but differs in form of expression. Existing studies mainly extract two semantic vectors to represent the entire image and the entire original sentence, respectively, for paraphrase generation. However, relying on a single semantic vector for an image or a sentence may cause the model to overlook some key objects in the original sentence, so that it changes key object information and generates semantically inconsistent sentences. In this article, we propose an object-level paraphrase generation model, which generates paraphrases by adjusting the permutation of key objects and modifying their associated descriptions. To adjust the permutation of key objects, an object-sorting module obtains new object sequences based on the key object information and the original sentences. Then, a sequence generation module sequentially generates paraphrases based on the permutation of the new object sequences. Each generation step focuses on different image features associated with different key objects to generate varied descriptions. Furthermore, we use a semantic discriminator module to encourage the generated paraphrase to be semantically close to the original sentence. Specifically, the loss function of the discriminator penalizes excessive distance between the paraphrase and the original sentence. Extensive experiments on the MS COCO dataset show that the proposed model outperforms the baselines.
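The abstract does not give the form of the discriminator loss. As a purely illustrative sketch, and not the authors' formulation, the idea of penalizing excessive distance between a paraphrase and the original sentence could be written as a hinge on cosine distance between sentence embeddings. The function names, the use of cosine distance, and the `margin` value are all assumptions introduced here for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / (norm_u * norm_v)


def semantic_penalty(paraphrase_vec, original_vec, margin=0.2):
    """Hypothetical hinge penalty: zero while the paraphrase stays within
    `margin` of the original, growing linearly once it drifts further.
    The margin is an assumed hyperparameter, not a value from the paper."""
    return max(0.0, cosine_distance(paraphrase_vec, original_vec) - margin)
```

Under this assumed form, a paraphrase whose embedding stays within the margin incurs no penalty, which leaves room for differences in wording, while larger semantic drift is penalized in proportion to the excess distance.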

Original language: English
Article number: 184
Pages (from-to): 1-19
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Volume: 19
Issue number: 6
Publication status: Published - 30 May 2023

Keywords

  • Multimodal
  • visual paraphrase generation
  • VisualBERT

ASJC Scopus subject areas

  • Hardware and Architecture
  • Computer Networks and Communications
