MMED: A multi-domain and Multi-modality event dataset

Zhenguo Yang, Zehang Lin, Lingni Guo, Qing Li, Wenyin Liu

Research output: Journal article publication › Journal article › Academic research › peer-review

1 Citation (Scopus)

Abstract

In this work, we release a multi-domain and multi-modality event dataset (MMED), containing 25,052 textual news articles collected from hundreds of news media sites (e.g., Yahoo News, BBC News) and 75,884 image posts shared on Flickr by thousands of social media users. The articles, contributed by professional journalists, and the images, shared by amateur users, are annotated according to 410 real-world events, covering emergencies, natural disasters, sports, ceremonies, elections, protests, military interventions, economic crises, etc. The MMED dataset was collected following the principles of high relevance to the target application needs, a wide range of event types, unambiguous event labels, imbalanced event clusters, and difficulty in discriminating between event labels. The dataset can stimulate innovative research on related challenging problems, such as (weakly aligned) cross-modal retrieval and cross-domain event discovery, and can inspire visual relation mining and reasoning. For comparison, 15 baselines across two scenarios have been quantitatively and qualitatively evaluated on the dataset.

Original language: English
Article number: 102315
Pages (from-to): 1-14
Number of pages: 14
Journal: Information Processing and Management
Volume: 57
Issue number: 6
Publication status: Published - Nov 2020

Keywords

  • Benchmark dataset
  • Cross-modal retrieval
  • Real-world event detection

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
