Abstract
Capturing the working states of workers on foot allows managers to precisely quantify and benchmark labor productivity, which in turn enables them to evaluate productivity losses and identify their causes. Work sampling is a widely used method for this task, but it suffers from low efficiency because only one worker is selected per observation; attentional selection asymmetry can also bias its assumption of uniform object selection. Existing vision-based methods are primarily oriented towards recognizing single, separated activities involving few workers or pieces of equipment. In this paper, we introduce an activity recognition method that takes surveillance videos as input and produces diverse, continuous activity labels for individual workers in the field of view. Convolutional networks are used to recognize activities, which are encoded in spatial and temporal streams, and a new fusion strategy is developed to combine the recognition results of the two streams. The experimental results show that our activity recognition method achieves an average accuracy of 80.5%, which is comparable with the state of the art of activity recognition in the computer vision community, given the severe camera motion and low resolution of site surveillance videos and the marginal inter-class differences and significant intra-class variations of workers’ activities. We also demonstrate that our method can underpin the implementation of efficient and objective work sampling. The training and test datasets of the study are publicly available.
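The abstract describes combining the predictions of a spatial (RGB) and a temporal (optical-flow) stream. The paper's own fusion strategy is not detailed here, so the sketch below shows a generic late-fusion baseline for two-stream networks: a weighted average of each stream's per-class probabilities. The function names and the 0.4/0.6 weighting are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Convert raw per-class scores to probabilities (numerically stable)."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_two_stream(spatial_logits, temporal_logits, w_spatial=0.4):
    """Late fusion of a two-stream model.

    Weighted average of the spatial (RGB) and temporal (optical-flow)
    class probabilities; the 0.4 weight is an illustrative choice,
    not the fusion strategy proposed in the paper.
    Returns the predicted class index and the fused distribution.
    """
    p_spatial = softmax(np.asarray(spatial_logits, dtype=float))
    p_temporal = softmax(np.asarray(temporal_logits, dtype=float))
    fused = w_spatial * p_spatial + (1.0 - w_spatial) * p_temporal
    return int(fused.argmax(axis=-1)), fused
```

For example, if the temporal stream strongly favors one activity class, the fused score can override a weaker spatial prediction, which is the usual motivation for late fusion when motion cues are more discriminative than appearance.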
| Original language | English |
| --- | --- |
| Pages (from-to) | 360-370 |
| Number of pages | 11 |
| Journal | Automation in Construction |
| Volume | 94 |
| DOIs | |
| Publication status | Published - Oct 2018 |
Keywords
- Activity recognition
- Labor productivity evaluation
- Two-stream convolutional networks
- Work sampling
ASJC Scopus subject areas
- Control and Systems Engineering
- Civil and Structural Engineering
- Building and Construction