As an important research topic, real-time crash likelihood prediction has been studied for many years. However, few studies have addressed missing data imputation in real-time crash likelihood prediction, even though missing values are common because of sensor breakdowns or external interference. In addition, classifying imbalanced data is a critical issue in this task, since crash-prone cases are far fewer than non-crash cases. In this paper, three principal component analysis (PCA)-based approaches are established to impute missing values, and two kinds of solutions are developed to address the imbalanced data problem. The results show that the proposed imputation methods help the classifiers achieve better predictive performance when data are missing. The two solutions, i.e., cost-sensitive learning and the synthetic minority oversampling technique (SMOTE), can improve sensitivity by making the classifiers pay more attention to the minority class.
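To make the idea of PCA-based imputation concrete, the following is a minimal sketch of one generic scheme, iterative low-rank reconstruction, written in Python with NumPy. It is an illustrative assumption, not necessarily any of the three approaches established in the paper; the function name `iterative_pca_impute` and its parameters are hypothetical. Missing entries are first filled with column means, then repeatedly replaced by their rank-k PCA reconstruction while observed entries stay fixed.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=500, tol=1e-8):
    """Impute NaN entries of X by iterative rank-k PCA reconstruction.

    Hypothetical illustration of a generic PCA-based imputer; assumes every
    column has at least one observed (non-NaN) value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: replace each missing entry with its column's observed mean.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        centered = filled - mu
        # Rows of Vt are the principal axes of the centered data.
        U, s, Vt = np.linalg.svd(centered, full_matrices=False)
        k = n_components
        # Rank-k reconstruction, shifted back by the column means.
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        # Overwrite only the missing entries; observed values are kept as-is.
        new_filled = np.where(missing, recon, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break
    return filled
```

For example, on data whose columns are exact linear functions of one underlying variable, `iterative_pca_impute(X_missing, n_components=1)` recovers a deleted entry closely, because the centered matrix is rank one. The number of components trades off how much structure is retained against how much noise is reproduced in the imputed values.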
- Cost-sensitive learning
- PCA-based missing data imputation
- Real-time crash likelihood prediction
- Support vector machine