PCA-based missing information imputation for real-time crash likelihood prediction under imbalanced data

Jintao Ke, Shuaichao Zhang, Hai Yang, Xiqun Chen

Research output: Journal article publication › Journal article › Academic research › peer-review

15 Citations (Scopus)


As an important research topic, real-time crash likelihood prediction has been studied for many years. However, few studies have addressed missing data imputation in real-time crash likelihood prediction, even though missing values are common because of sensor breakdowns or external interference. In addition, classifying imbalanced data is a critical issue in real-time crash likelihood prediction, since crash-prone cases are far fewer than non-crash cases. In this paper, three principal component analysis (PCA)-based approaches are established for imputing missing values, and two kinds of solutions are developed to tackle the issue of imbalanced data. The results show that the proposed methods can help the classifiers achieve better predictive performance when data are missing. The two solutions, i.e., cost-sensitive learning and the synthetic minority oversampling technique (SMOTE), can help improve sensitivity by adjusting the classifiers to pay more attention to the minority class.
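The abstract does not describe the three PCA-based approaches or the two imbalance solutions in detail, so the following is only a generic sketch under stated assumptions, not the authors' method. Missing values are assumed to be encoded as NaN. Imputation uses iterative low-rank PCA reconstruction, which is one common PCA-based scheme and not necessarily any of the paper's three. SMOTE is written as the standard k-nearest-neighbour interpolation. Cost-sensitive learning is reduced to inverse-frequency class weights. The function names `pca_impute`, `smote` and `balanced_class_weights` and all of their parameters are hypothetical.

```python
import numpy as np


def pca_impute(X, n_components=2, n_iter=100, tol=1e-6):
    """Iterative PCA imputation (hypothetical generic sketch).

    Missing entries (NaN) start at their column means and are then repeatedly
    replaced by a rank-`n_components` PCA reconstruction, until the imputed
    values change by less than `tol` or `n_iter` passes have run.
    """
    X = np.asarray(X, dtype=float)
    mask = np.isnan(X)                          # True where a value is missing
    filled = np.where(mask, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        centred = filled - mean
        # Truncated SVD of the centred data yields the leading principal components.
        U, s, Vt = np.linalg.svd(centred, full_matrices=False)
        recon = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        previous = filled[mask]                 # boolean indexing returns a copy
        filled[mask] = recon[mask]              # observed cells are never changed
        if np.max(np.abs(filled[mask] - previous), initial=0.0) < tol:
            break
    return filled


def smote(X_min, n_new, k=5, rng=None):
    """SMOTE-style oversampling of the minority class (hypothetical sketch).

    Each synthetic sample lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    Requires at least two minority samples.
    """
    rng = np.random.default_rng(rng)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                 # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picks = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))                # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picks] - X_min[base])


def balanced_class_weights(y):
    """Cost-sensitive weights: each class is weighted inversely to its frequency."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))
```

With inverse-frequency weights, every class contributes the same total weight to a weighted loss. For example, 8 non-crash and 2 crash cases get weights 0.625 and 2.5, and both totals equal 5. Many classifiers, including weighted SVMs and AdaBoost variants, can accept such per-class or per-sample weights. SMOTE takes the opposite route: it leaves the loss unchanged and rebalances the training set itself.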

Original language: English
Pages (from-to): 872-895
Number of pages: 24
Journal: Transportmetrica A: Transport Science
Issue number: 2
Publication status: Published - 29 Nov 2019
Externally published: Yes


Keywords

  • AdaBoost
  • Cost-sensitive learning
  • PCA-based missing data imputation
  • Real-time crash likelihood prediction
  • Support vector machine

ASJC Scopus subject areas

  • Transportation
  • Engineering (all)
