Cross-modal event retrieval: A dataset and a baseline using deep semantic learning

Runwei Situ, Zhenguo Yang, Jianming Lv, Qing Li, Wenyin Liu

Research output: Chapter in book / Conference proceeding › Conference article published in proceeding or book › Academic research › peer-review

6 Citations (Scopus)


In this paper, we propose to learn a Deep Semantic Space (DSS) for cross-modal event retrieval, which is achieved by exploiting deep learning models to jointly extract semantic features from images and textual articles. More specifically, a VGG network is used to transfer deep semantic knowledge from a large-scale image dataset to the target image dataset. Simultaneously, a fully-connected network is designed to model semantic representations from textual features (e.g., TF-IDF, LDA). Furthermore, the obtained deep semantic representations for image and text are mapped into a high-level semantic space, in which the distance between data samples can be measured directly for cross-modal event retrieval. In particular, we collect a dataset called the Wiki-Flickr event dataset for cross-modal event retrieval, in which the data are only weakly aligned, unlike the image-text pairs in existing cross-modal retrieval datasets. Extensive experiments conducted on both the Pascal Sentence dataset and our Wiki-Flickr event dataset show that our DSS outperforms the state-of-the-art approaches.
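The retrieval step described in the abstract, measuring distances between samples once both modalities live in a shared semantic space, can be illustrated with a minimal sketch. It assumes the image and text embeddings have already been produced by some upstream networks, and it assumes cosine similarity as the measure; the abstract does not state which distance the authors use. The function names, the three-dimensional vectors, and the image ids are hypothetical toy values, not data from the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return candidate ids ordered from most to least similar to the query."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in scored]


# Hypothetical embeddings, assumed to already lie in the shared semantic space.
text_query = [0.9, 0.1, 0.0]          # embedding of a textual article (toy values)
image_embeddings = {                  # embeddings of candidate images (toy values)
    "img_a": [0.8, 0.2, 0.1],
    "img_b": [0.1, 0.9, 0.0],
    "img_c": [0.0, 0.1, 0.9],
}

# Text-to-image retrieval: rank images by similarity to the text query.
ranking = rank_by_similarity(text_query, image_embeddings)
print(ranking)
```

The same ranking function would serve image-to-text retrieval by swapping the roles of query and candidates, which is the practical benefit of placing both modalities in one space.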

Original language: English
Title of host publication: Advances in Multimedia Information Processing – PCM 2018 - 19th Pacific-Rim Conference on Multimedia, 2018, Proceedings
Editors: Wen-Huang Cheng, Toshihiko Yamasaki, Chong-Wah Ngo, Richang Hong, Meng Wang
Number of pages: 11
ISBN (Print): 9783030007669
Publication status: Published - 1 Jan 2018
Externally published: Yes
Event: 19th Pacific-Rim Conference on Multimedia, PCM 2018 - Hefei, China
Duration: 21 Sept 2018 → 22 Sept 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11165 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 19th Pacific-Rim Conference on Multimedia, PCM 2018


Keywords

  • Common space
  • Cross-modal event retrieval
  • Deep learning

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)

