Abstract
Traditional approaches to construction site safety monitoring rely heavily on manual on-site inspection, which is prone to overlooking incidents. Existing computer vision methods require time-consuming, case-by-case data labeling and lack high-level reasoning capability. This paper develops a human-like virtual assistant agent by integrating a multi-modal vision-language model into video analytics: (1) to generate image-text data efficiently for model development, a semi-automatic image-text labeling pipeline based on in-context learning is designed; (2) to adapt the virtual agent from a pre-trained model to a domain-tailored one, a two-stage curriculum learning paradigm is designed to make fine-tuning more effective on domain-specific tasks; (3) to inject construction-domain knowledge more effectively into the virtual agent, a hierarchical prompting framework driven by a construction safety ontology is developed to support domain-tailored reasoning. The virtual agent has been deployed on a real construction site for real-time video analytics, achieving over 90% accuracy in identifying violations of work-at-height safety regulations.
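To make the idea of ontology-driven hierarchical prompting more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy ontology stored as a tree and flattens it top-down into an indented text block that could be prepended to a question for a vision-language model. The `Concept` class, the `build_prompt` function, the concept names, and the rule wording are all hypothetical placeholders introduced for illustration; none of them come from the paper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """One node in a toy safety ontology (hypothetical structure)."""
    name: str
    description: str
    children: List["Concept"] = field(default_factory=list)


def build_prompt(root: Concept, question: str) -> str:
    """Walk the ontology top-down and emit one indented line per concept."""
    lines = []

    def visit(node: Concept, depth: int) -> None:
        # Indentation encodes the level of the concept in the hierarchy.
        lines.append(f"{'  ' * depth}- {node.name}: {node.description}")
        for child in node.children:
            visit(child, depth + 1)

    visit(root, 0)
    return "Domain knowledge:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


# Illustrative placeholder ontology; the rules below are assumptions,
# not regulations quoted from the paper.
ontology = Concept(
    "work_at_height",
    "tasks performed on elevated surfaces",
    [
        Concept("harness", "placeholder rule: worker should wear a safety harness"),
        Concept("guardrail", "placeholder rule: platform edge should have a guardrail"),
    ],
)

prompt = build_prompt(ontology, "Is any rule violated in this frame?")
print(prompt)
```

In this sketch the general concept appears first and the specific rules follow beneath it, so the model reads the context from broad to narrow before it reaches the question. How the actual framework orders, selects, or phrases ontology content is not stated in the abstract.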
Original language | English |
---|---|
Article number | 106305 |
Journal | Automation in Construction |
Volume | 177 |
DOIs | |
Publication status | Published - Sept 2025 |
Externally published | Yes |
Keywords
- Construction safety ontology
- Construction site safety monitoring
- Context-aware vision-language model
- Domain-tailored prompt engineering
- Virtual construction safety assistant
ASJC Scopus subject areas
- Control and Systems Engineering
- Civil and Structural Engineering
- Building and Construction