TY - JOUR
T1 - An integrated data- and theory-driven crash severity model
AU - Liu, Dongjie
AU - Li, Dawei
AU - Sze, N. N.
AU - Ding, Hongliang
AU - Song, Yuchen
N1 - Funding Information:
The study was supported by the National Key Research and Development Program of China (No. 2022YFB4300300), the National Natural Science Foundation of China (Nos. 71971056, 51608115), the Six Talent Peaks Project in Jiangsu Province (No. XNYQC-003), the Science & Technology Project of Jiangsu Province, China (BZ2020016), and the Science & Technology Development Fund of CHINA DESIGN GROUP (No. KY2022120).
Publisher Copyright:
© 2023 Elsevier Ltd
PY - 2023/12
Y1 - 2023/12
N2 - For crash severity modeling, researchers typically view theory-driven models and data-driven models as different or even conflicting approaches, because data-driven (machine-learning) models offer good predictability but weak interpretability, whereas theory-driven (econometric) models offer robust interpretability but only moderate predictability. To alleviate the tension between them, this study proposes an integrated data- and theory-driven crash-severity model, called the Embedded Fusion model based on Text Vector Representations (TVR-EF), which leverages the complementary strengths of both. The model specification consists of two parts. (i) The data-driven component not only mitigates a deficiency of traditional econometric models, which frequently rely on one-hot encoding and therefore cannot capture semantic relatedness between variable categories, but also enhances the interpretability of the relationship between crash severity and potential influencing factors through the learned embedding weight matrix. (ii) In the theory-driven component, the multinomial logit model is implemented as a two-dimensional convolutional neural network (2D-CNN) to increase flexibility and reduce dependence on prior knowledge for different crash-severity outcomes. A crash dataset from Guangdong Province, China, is used to estimate the TVR-EF model, which is then benchmarked against two traditional econometric models and three widely used machine-learning models. Results indicate that the TVR-EF model not only improves predictive performance but is also easier to interpret.
AB - For crash severity modeling, researchers typically view theory-driven models and data-driven models as different or even conflicting approaches, because data-driven (machine-learning) models offer good predictability but weak interpretability, whereas theory-driven (econometric) models offer robust interpretability but only moderate predictability. To alleviate the tension between them, this study proposes an integrated data- and theory-driven crash-severity model, called the Embedded Fusion model based on Text Vector Representations (TVR-EF), which leverages the complementary strengths of both. The model specification consists of two parts. (i) The data-driven component not only mitigates a deficiency of traditional econometric models, which frequently rely on one-hot encoding and therefore cannot capture semantic relatedness between variable categories, but also enhances the interpretability of the relationship between crash severity and potential influencing factors through the learned embedding weight matrix. (ii) In the theory-driven component, the multinomial logit model is implemented as a two-dimensional convolutional neural network (2D-CNN) to increase flexibility and reduce dependence on prior knowledge for different crash-severity outcomes. A crash dataset from Guangdong Province, China, is used to estimate the TVR-EF model, which is then benchmarked against two traditional econometric models and three widely used machine-learning models. Results indicate that the TVR-EF model not only improves predictive performance but is also easier to interpret.
KW - Crash severity
KW - Data- and theory-driven model
KW - Embedding representations
KW - Interpretable machine learning
KW - Logit model
UR - http://www.scopus.com/inward/record.url?scp=85171752952&partnerID=8YFLogxK
U2 - 10.1016/j.aap.2023.107282
DO - 10.1016/j.aap.2023.107282
M3 - Journal article
AN - SCOPUS:85171752952
SN - 0001-4575
VL - 193
JO - Accident Analysis and Prevention
JF - Accident Analysis and Prevention
M1 - 107282
ER -