TY - JOUR
T1 - Construction case-relevant article-level law identification using fine-tuned large language models
T2 - A study in China’s construction industry
AU - Zhou, Shenghua
AU - Xie, Shenming
AU - Yi, Wen
AU - Wang, Wentao
N1 - Publisher Copyright:
© 2025 Elsevier Ltd.
PY - 2026/3
Y1 - 2026/3
AB - Construction case-relevant legal queries currently rely on inefficient online searches and costly expert consultations. Although existing studies have applied large language models (LLMs) to legal queries, a primary limitation is that general-purpose LLMs may not adapt to the construction domain. Moreover, most studies employ a one-stage paradigm that directly maps case facts to legal articles, which could be ineffective for handling extensive acts and articles in the construction industry. To resolve the limitations, this study proposes a two-stage act-article identification framework using fine-tuned LLMs, with the first stage filtering case-relevant acts and the second stage identifying applicable articles. It consists of (i) building a construction case dataset comprising 81,472 judgments, (ii) fine-tuning 8 LLMs to develop the act-article identification methods, and (iii) comparing the law identification performance with multiple baselines. The results show the act-article identification approach achieves an average F1-score of 0.757, significantly outperforming both general-purpose LLMs and specialized legal LLMs. Furthermore, it demonstrates a 62% improvement over one-stage approaches. This study makes three contributions by demonstrating court judgment-based fine-tuning to make general-purpose LLMs effectively adapt to the construction domain, revealing the superiority of the two-stage paradigm over one-stage approaches, and providing a large-scale reusable dataset of construction disputes.
KW - Construction case
KW - Fine-tuning
KW - Large language models
KW - Law identification
UR - https://www.scopus.com/pages/publications/105024362294
U2 - 10.1016/j.aei.2025.104144
DO - 10.1016/j.aei.2025.104144
M3 - Journal article
AN - SCOPUS:105024362294
SN - 1474-0346
VL - 70
JO - Advanced Engineering Informatics
JF - Advanced Engineering Informatics
M1 - 104144
ER -