Cross-modal event retrieval: A dataset and a baseline using deep semantic learning

Runwei Situ, Zhenguo Yang, Jianming Lv, Qing Li, Wenyin Liu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

5 Citations (Scopus)

Abstract

In this paper, we propose to learn a Deep Semantic Space (DSS) for cross-modal event retrieval by using deep learning models to extract semantic features from images and textual articles jointly. More specifically, a VGG network is used to transfer deep semantic knowledge from a large-scale image dataset to the target image dataset. Simultaneously, a fully-connected network is designed to model the semantic representation of textual features (e.g., TF-IDF, LDA). The resulting deep semantic representations of images and text are then mapped into a high-level semantic space, in which the distance between data samples can be measured directly for cross-modal event retrieval. In addition, we collect a dataset, the Wiki-Flickr event dataset, for cross-modal event retrieval, in which the data are weakly aligned, unlike the image-text pairs in existing cross-modal retrieval datasets. Extensive experiments conducted on both the Pascal Sentence dataset and our Wiki-Flickr event dataset show that our DSS outperforms the state-of-the-art approaches.
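The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the shared semantic space is a vector of event-class probabilities and that cosine distance is the metric; the abstract specifies neither. All weights, embeddings, and function names below are hypothetical toy values chosen for illustration.

```python
import math


def linear(features, weights, bias):
    """One fully connected layer: `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(scores):
    """Turn raw scores into a probability vector (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine_distance(u, v):
    """1 - cosine similarity; assumed here as the semantic-space metric."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def rank_by_distance(query, gallery):
    """Return gallery indices ordered from nearest to farthest."""
    return sorted(
        range(len(gallery)),
        key=lambda i: cosine_distance(query, gallery[i]),
    )


# Hypothetical text branch: project a 3-d text feature (e.g. a tiny
# TF-IDF vector) onto 2 event classes with one fully connected layer.
TEXT_W = [[2.0, 0.0, -1.0], [-1.0, 0.0, 2.0]]
TEXT_B = [0.0, 0.0]

# Hypothetical image embeddings already in the semantic space; in the
# paper's setting these would come from a fine-tuned VGG network.
IMAGE_SEMANTIC = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]

text_features = [1.0, 0.5, 0.0]
text_semantic = softmax(linear(text_features, TEXT_W, TEXT_B))

# Text-to-image retrieval: rank images by distance to the text query.
ranking = rank_by_distance(text_semantic, IMAGE_SEMANTIC)
print(ranking)
```

Because both modalities end up as vectors of the same dimensionality, the same `rank_by_distance` call also covers image-to-text retrieval: pass an image embedding as the query and the text embeddings as the gallery.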

Original language: English
Title of host publication: Advances in Multimedia Information Processing – PCM 2018 - 19th Pacific-Rim Conference on Multimedia, 2018, Proceedings
Editors: Wen-Huang Cheng, Toshihiko Yamasaki, Chong-Wah Ngo, Richang Hong, Meng Wang
Publisher: Springer-Verlag
Pages: 147-157
Number of pages: 11
ISBN (Print): 9783030007669
DOIs
Publication status: Published - 1 Jan 2018
Externally published: Yes
Event: 19th Pacific-Rim Conference on Multimedia, PCM 2018 - Hefei, China
Duration: 21 Sep 2018 to 22 Sep 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11165 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 19th Pacific-Rim Conference on Multimedia, PCM 2018
Country: China
City: Hefei
Period: 21/09/18 to 22/09/18

Keywords

  • Common space
  • Cross-modal event retrieval
  • Deep learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
