Video saliency detection via dynamic consistent spatio-temporal attention modelling

Sheng Hua Zhong, Yan Liu, Feifei Ren, Jinghuan Zhang, Tongwei Ren

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

85 Citations (Scopus)

Abstract

The human visual system actively seeks salient regions and movements in video sequences to reduce search effort. Computational modeling of visual saliency maps provides important information for semantic understanding in many real-world applications. In this paper, we propose a novel video saliency detection model for detecting the attended regions that correspond to both interesting objects and dominant motions in video sequences. For the spatial saliency map, we adopt the classical bottom-up spatial saliency map. For the temporal saliency map, we propose a novel optical flow model based on the dynamic consistency of motion. The spatial and temporal saliency maps are constructed and then fused to create a novel attention model. The proposed attention model is evaluated on three video datasets. Empirical validation demonstrates that the salient regions detected by our dynamic consistent saliency map highlight the interesting objects effectively and efficiently. More importantly, the attended video regions automatically detected by the proposed attention model are consistent with the ground-truth saliency maps derived from eye movement data.
Original language: English
Title of host publication: Proceedings of the 27th AAAI Conference on Artificial Intelligence, AAAI 2013
Pages: 1063-1069
Number of pages: 7
Publication status: Published - 1 Dec 2013
Event: 27th AAAI Conference on Artificial Intelligence, AAAI 2013 - Bellevue, WA, United States
Duration: 14 Jul 2013 - 18 Jul 2013

Conference

Conference: 27th AAAI Conference on Artificial Intelligence, AAAI 2013
Country/Territory: United States
City: Bellevue, WA
Period: 14/07/13 - 18/07/13

ASJC Scopus subject areas

  • Artificial Intelligence
