Abstract
Visual question answering (VQA) is a challenging task that requires reasoning over questions about images, often with the aid of external knowledge. A prerequisite for VQA is the availability of annotated datasets, yet existing datasets have several limitations. 1) The diversity of questions and answers is limited to a few question categories and certain concepts (e.g., objects, relations, actions) with somewhat mechanical answers. 2) Background knowledge and context information are disregarded: only images, questions, and answers are provided. 3) The timeliness of knowledge has not been examined, although some works introduce factual or commonsense knowledge bases such as ConceptNet and DBpedia. In this paper, we present an Event-oriented Visual Question Answering (E-VQA) dataset comprising free-form questions and answers about real-world event concepts, which supplies event context information as domain knowledge in addition to the images. E-VQA consists of 2,690 social media images, 9,088 questions, 5,479 answers, and 1,157 news media articles as references, annotated to 182 real-world events covering a wide range of topics such as armed conflicts and attacks, disasters and accidents, and law and crime. For comparison, we benchmark 10 state-of-the-art VQA methods.
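The record does not publish the dataset schema, but based on the components the abstract names (social media images, free-form questions and answers, news-article references, and event annotations with topics), one E-VQA example might be organized roughly as sketched below. All class and field names are hypothetical illustrations, not the authors' actual format.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch of one E-VQA record, inferred only from the
# components named in the abstract; the real dataset layout may differ.

@dataclass
class EVQAExample:
    image_path: str            # one of the 2,690 social media images
    question: str              # one of the 9,088 free-form questions
    answers: List[str]         # drawn from the 5,479 annotated answers
    event_id: str              # one of the 182 real-world events
    event_topic: str           # e.g., "disasters and accidents"
    # References into the 1,157 news media articles that provide
    # event context as domain knowledge.
    news_articles: List[str] = field(default_factory=list)

# Illustrative usage with made-up paths and values.
example = EVQAExample(
    image_path="images/event_042/img_001.jpg",
    question="What caused the building collapse shown in the image?",
    answers=["gas explosion"],
    event_id="event_042",
    event_topic="disasters and accidents",
    news_articles=["news/event_042/article_17.txt"],
)
```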
| Original language | English |
| --- | --- |
| Pages (from-to) | 10210-10223 |
| Number of pages | 14 |
| Journal | IEEE Transactions on Knowledge and Data Engineering |
| Volume | 35 |
| Issue number | 10 |
| DOIs | |
| Publication status | Published - 1 Oct 2023 |
Keywords
- event mining
- social media data mining
- visual question answering
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics