Automated “E”-aware data processing for construction ESG using building information modeling and large language model

  • Xingbo Gong
  • Xingyu Tao
  • Yuqing Xu
  • Helen H.L. Kwok
  • Weiwei Chen
  • Da Shi
  • Dezhi Li
  • Jack C.P. Cheng

Research output: Journal article publication; Journal article; Academic research; peer-review

Abstract

Environmental, Social, and Governance (ESG) assessment and disclosure are critical for architecture, engineering, and construction (AEC) companies to market their financial results, reputational position, and compliance with regulatory requirements. Within this framework, the environmental (“E”) dimension presents unique and formidable data management challenges distinct from social and governance aspects. Specifically, the complex interplay of quantitative metrics and qualitative descriptions within “E”-aware data (e.g., measurable resource consumption alongside descriptive material sourcing practices, emissions figures coupled with compliance narratives), amplified by its sheer volume and the persistent ambiguity of environmental indicators and reporting standards, poses significant obstacles to effective “E”-aware data disclosure. Large Language Models (LLMs) possess inherent advantages in processing such complex environmental information due to their proficient language processing and generalization capabilities. Nonetheless, the development of LLM-based methods explicitly tailored for environmental data management within the construction sector remains underexplored. To this end, this study introduces an automated, LLM-enhanced “E”-aware data processing approach for the construction industry. The innovation of this framework is threefold. First, fifteen “E”-aware indicators are meticulously crafted to align with the specific needs of construction entities. Second, an “E”-aware algorithm, integrated within the Building Information Modeling (BIM) framework, is devised to streamline the aggregation and quantification of environmental data. Third, an LLM-enhanced complex structured data processing mechanism using retrieval-augmented generation (RAG) is proposed to facilitate the efficient processing of “E”-aware data pertinent to construction projects. An illustrative case study is employed to validate the feasibility and efficacy of the proposed methodology.
The results demonstrate that the developed automated RAG-LLM-enhanced framework significantly advances current practice by: (1) enabling standardized “E”-aware data specifications and source mapping; (2) drastically reducing processing time for large-scale ESG documentation (saving 64.4% of time); and (3) providing a robust solution for handling multi-source, multi-format data, thereby enhancing the efficiency and reliability of environmental management and ESG disclosure in the AEC industry.

Original language: English
Article number: 103920
Journal: Advanced Engineering Informatics
Volume: 69
Publication status: Published - Jan 2026

Keywords

  • Building information modelling (BIM)
  • Carbon management
  • Environmental, Social, and Governance (ESG)
  • Large language model (LLM)
  • Sustainable construction

ASJC Scopus subject areas

  • Information Systems
  • Artificial Intelligence
