TY - JOUR
T1 - A deep generative approach for crash frequency model with heterogeneous imbalanced data
AU - Ding, Hongliang
AU - Lu, Yuhuan
AU - Sze, N. N.
AU - Chen, Tiantian
AU - Guo, Yanyong
AU - Lin, Qinghai
N1 - Funding Information:
The work described in this paper was supported by grants from the Research Grants Council of Hong Kong (Project No. 25203717) and the Hong Kong Polytechnic University (H-ZJMQ). This work was also supported by the National Natural Science Foundation of China (Grant No. 71701046).
Publisher Copyright:
© 2022 Elsevier Ltd
PY - 2022/6
Y1 - 2022/6
N2 - Crash frequency models are often subject to excessive zero observations because crashes are rare events. To address the problem of imbalanced crash data, a deep generative approach, the augmented variational autoencoder, was proposed to generate synthetic crash data for measuring the association between crashes and possible explanatory factors. This approach is characterized by a factorized generative model and a refined objective function. The generative model can handle heterogeneous data, including real-valued, nominal and ordinal distributions, while the refined objective function can control for the random effect by better recognizing both zero-crash and non-zero crash cases. In this study, comprehensive traffic and crash data of multiple distribution types from Hong Kong for the period between 2014 and 2016 were used. To assess the data generation performance of the proposed augmented variational autoencoder method, a conventional data synthesis technique (synthetic minority oversampling technique-nominal continuous) was also considered. The performance of crash frequency models for total crashes and for fatal and severe injury crashes was assessed. For total crashes, the parameter estimation results (statistical fit, prediction accuracy, and explanatory factors identified) of the crash frequency model based on synthetic data from the augmented variational autoencoder method were closer to those based on the original data than were the results based on synthetic data from the synthetic minority oversampling technique-nominal continuous method. For fatal and severe injury crashes, zero-crash observations were prevalent, with a ratio of zero-crash to non-zero crash cases of 9 to 1. The crash data were first balanced using the proposed augmented variational autoencoder method.
Then, correlated random parameter models of fatal and severe injury crash frequency were estimated separately on the original data and on the balanced data. Results indicate that the model based on balanced data outperforms its counterpart, with lower root mean square error, lower mean absolute error, and a higher number of crash explanatory factors identified. More importantly, correlations between the random parameters can be revealed. The findings should inform both researchers and practitioners developing crash frequency models, for which excessive zero observations are prevalent when traffic and crash data highly disaggregated by time and space are used.
KW - Augmented variational autoencoder
KW - Crash frequency model
KW - Imbalanced crash data
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85123983346&partnerID=8YFLogxK
U2 - 10.1016/j.amar.2022.100212
DO - 10.1016/j.amar.2022.100212
M3 - Journal article
AN - SCOPUS:85123983346
SN - 2213-6657
VL - 34
JO - Analytic Methods in Accident Research
JF - Analytic Methods in Accident Research
M1 - 100212
ER -