TY - JOUR
T1 - PCA-based missing information imputation for real-time crash likelihood prediction under imbalanced data
AU - Ke, Jintao
AU - Zhang, Shuaichao
AU - Yang, Hai
AU - Chen, Xiqun
N1 - Funding Information:
This research is financially supported by Zhejiang Provincial Natural Science Foundation of China [grant number LR17E080002], National Natural Science Foundation of China [grant numbers 51508505, 71771198, 51338008], Fundamental Research Funds for the Central Universities [grant number 2017QNA4025], and the Key Research and Development Program of Zhejiang [grant number 2018C01007].
Publisher Copyright:
© 2018 Hong Kong Society for Transportation Studies Limited.
PY - 2019/11/29
Y1 - 2019/11/29
N2 - As an important research topic, real-time crash likelihood prediction has been studied for many years. However, few studies focus on missing data imputation in real-time crash likelihood prediction, although missing values are commonly observed due to sensor breakdowns or external interference. In addition, classifying imbalanced data is a critical issue in real-time crash likelihood prediction, since the number of crash-prone cases is much smaller than that of non-crash cases. In this paper, three principal component analysis (PCA) based approaches are established for imputing missing values, while two kinds of solutions are developed to tackle the issue of imbalanced data. The results show that the proposed methods can help the classifiers achieve better predictive performance in situations with missing data. The two solutions, i.e., cost-sensitive learning and the synthetic minority oversampling technique (SMOTE), can help improve the sensitivity by adjusting the classifiers to pay more attention to the minority class.
AB - As an important research topic, real-time crash likelihood prediction has been studied for many years. However, few studies focus on missing data imputation in real-time crash likelihood prediction, although missing values are commonly observed due to sensor breakdowns or external interference. In addition, classifying imbalanced data is a critical issue in real-time crash likelihood prediction, since the number of crash-prone cases is much smaller than that of non-crash cases. In this paper, three principal component analysis (PCA) based approaches are established for imputing missing values, while two kinds of solutions are developed to tackle the issue of imbalanced data. The results show that the proposed methods can help the classifiers achieve better predictive performance in situations with missing data. The two solutions, i.e., cost-sensitive learning and the synthetic minority oversampling technique (SMOTE), can help improve the sensitivity by adjusting the classifiers to pay more attention to the minority class.
KW - AdaBoost
KW - cost-sensitive learning
KW - PCA-based missing data imputation
KW - Real-time crash likelihood prediction
KW - SMOTE
KW - support vector machine
UR - http://www.scopus.com/inward/record.url?scp=85057246883&partnerID=8YFLogxK
U2 - 10.1080/23249935.2018.1542414
DO - 10.1080/23249935.2018.1542414
M3 - Journal article
AN - SCOPUS:85057246883
SN - 2324-9935
VL - 15
SP - 872
EP - 895
JO - Transportmetrica A: Transport Science
JF - Transportmetrica A: Transport Science
IS - 2
ER -