Combining deep features and activity context to improve recognition of activities of workers in groups

Xiaochun Luo, Heng Li, Yantao Yu, Cheng Zhou, Dongping Cao

Research output: Journal article publication › Journal article › Academic research › peer-review

24 Citations (Scopus)


Automatic activity recognition plays an important role in improving the efficiency of site management. Interest in vision-based activity recognition has grown in recent years, but its relatively low recognition accuracy and speed impede practical application. This paper introduces a discriminative model that combines deep activity features with contextual information to improve the recognition of activities of workers on foot in site surveillance videos. Specifically, a conditional random field (CRF) model is built on two inputs: deep activity features, which are extracted with a single-stream deep activity recognition network, and spatial relevance, which is obtained with a tracking-by-detection multiple-object tracking method. We evaluated several deep activity features, including action features, activity features, and joint features. We also parameterized the contextual information of activities in terms of spatial relevance and represented the context with K-nearest-neighbor graphs. The experimental results show that the CRF model based on deep activity features and activity context significantly improves activity recognition performance, reaching 98.77% average accuracy, an improvement of 22.10% over the baseline of 77.67% obtained using the single-stream deep activity recognition network, with a small computational overhead of 0.025 ms per segment.
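The idea of representing activity context as a K-nearest-neighbor graph and letting neighboring workers influence each other's labels can be illustrated with a minimal sketch. This is not the authors' CRF model or code: the function names `knn_graph` and `icm_labels`, the use of 2-D worker positions, the score tables, and the choice of iterated conditional modes (ICM) as a simple stand-in for CRF inference are all assumptions made for illustration.

```python
import math


def knn_graph(centers, k):
    """Link each worker to its k nearest neighbors (hypothetical helper).

    centers: list of (x, y) positions, e.g. tracked bounding-box centers.
    Returns a sorted list of undirected edges (i, j) with i < j.
    """
    n = len(centers)
    k = max(0, min(k, n - 1))
    edges = set()
    for i in range(n):
        # Rank all other workers by Euclidean distance to worker i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(centers[i], centers[j]),
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


def icm_labels(unary, edges, pairwise, n_iters=10):
    """Greedy label refinement on the graph (ICM, an assumed stand-in).

    unary[i][a]: score of label a for worker i (higher is better),
                 e.g. derived from a recognition network's output.
    pairwise[a][b]: compatibility score of neighboring labels a and b.
    """
    n = len(unary)
    # Start from the best label according to the unary scores alone.
    labels = [max(range(len(u)), key=u.__getitem__) for u in unary]
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    for _ in range(n_iters):
        changed = False
        for i in range(n):
            # Unary score plus compatibility with current neighbor labels.
            scores = [
                unary[i][a] + sum(pairwise[a][labels[j]] for j in neighbors[i])
                for a in range(len(unary[i]))
            ]
            best = max(range(len(scores)), key=scores.__getitem__)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels
```

In this sketch, a worker whose own scores are ambiguous can be pulled toward the label of a confidently classified neighbor, which is the intuition behind using activity context; the pairwise scores here are hand-set, whereas a CRF would learn its parameters from data.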

Original language: English
Journal: Computer-Aided Civil and Infrastructure Engineering
Publication status: Accepted/In press - 1 Jan 2020

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design
  • Computational Theory and Mathematics

