Abstract
Traditional approaches to construction site safety monitoring rely heavily on manual on-site inspection, which is prone to overlooking incidents. Existing computer vision methods require time-consuming, case-by-case data labeling and lack high-level reasoning capability. This paper develops a human-like virtual assistant agent by integrating a multi-modal vision-language model into video analytics: (1) to efficiently generate image-text data for model development, a semi-automatic image-text labeling pipeline based on in-context learning is designed; (2) to adapt the virtual agent from a pre-trained model to a domain-tailored one, a two-stage curriculum learning paradigm is designed to make fine-tuning toward domain-specific tasks more effective; (3) to inject construction-domain knowledge into the virtual agent more effectively, a hierarchical prompting framework driven by a construction safety ontology is developed, giving the agent more domain-tailored reasoning capability. The virtual agent has been deployed on a real construction site for real-time video analytics, achieving over 90% accuracy in identifying violations of work-at-height safety regulations.
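The abstract does not give the ontology schema or the prompt templates used by the hierarchical prompting framework. The sketch below illustrates only the general idea of ontology-driven hierarchical prompting, under stated assumptions: the ontology is modeled as a simple tree, and the prompt walks it from general concepts to specific rules before posing the question. The class `OntologyNode`, the functions `ontology_lines` and `hierarchical_prompt`, and the example concepts and rules are all hypothetical and are not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """One concept in a toy construction safety ontology (hypothetical schema)."""
    name: str
    rules: list[str] = field(default_factory=list)
    children: list[OntologyNode] = field(default_factory=list)


def ontology_lines(node: OntologyNode, depth: int = 0) -> list[str]:
    """Walk the ontology depth-first, emitting each parent concept before its children."""
    indent = "  " * depth
    lines = [f"{indent}Check category: {node.name}"]
    for rule in node.rules:
        lines.append(f"{indent}  - Rule: {rule}")
    for child in node.children:
        lines.extend(ontology_lines(child, depth + 1))
    return lines


def hierarchical_prompt(root: OntologyNode, question: str) -> str:
    """Assemble a prompt: role instruction, ontology-ordered checklist, then the query."""
    checklist = "\n".join(ontology_lines(root))
    return (
        "You are a construction safety assistant.\n"
        "Reason through the following categories from general to specific:\n"
        f"{checklist}\n"
        f"Question: {question}"
    )


# Illustrative ontology fragment; concept names and rules are made up for this sketch.
ontology = OntologyNode(
    name="Work-at-height safety",
    children=[
        OntologyNode("Fall protection", rules=["A worker at height wears a safety harness"]),
        OntologyNode("Edge protection", rules=["Open edges are guarded by railings"]),
    ],
)

if __name__ == "__main__":
    print(hierarchical_prompt(ontology, "Does the image show a work-at-height violation?"))
```

Ordering the checklist from parent concepts to leaf rules is one plausible way to make a vision-language model reason from broad hazard categories down to specific regulations; the actual framework may structure its prompts differently.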
| Original language | English |
|---|---|
| Article number | 106305 |
| Journal | Automation in Construction |
| Volume | 177 |
| DOIs | |
| Publication status | Published - Sept 2025 |
| Externally published | Yes |
Keywords
- Construction safety ontology
- Construction site safety monitoring
- Context-aware vision-language model
- Domain-tailored prompt engineering
- Virtual construction safety assistant
ASJC Scopus subject areas
- Control and Systems Engineering
- Civil and Structural Engineering
- Building and Construction
Article title: Context-aware vision-language model agent enriched with domain-specific ontology for construction site safety monitoring