Can AV crash datasets provide more insight if missing information is supplemented? Employing Generative Adversarial Imputation Networks to Tackle Data Quality Issues

Hongliang Ding, Zhuo Liu, Hanlong Fu, Xiaowen Fu, Tiantian Chen (Corresponding Author), Jinhua Zhao

Research output: Journal article publication, Journal article, Academic research, peer-review

Abstract

The growing prevalence of autonomous vehicles (AVs) offers new opportunities for enhancing traffic efficiency. However, AVs still face significant challenges that affect their safety and their effectiveness in preventing accidents. Real-world operational data is therefore essential for identifying the factors that contribute to AV crashes. Despite this, the analysis of AV crashes is still hampered by a lack of data, missing information, and underreporting, which reduces its accuracy and comprehensiveness. To address this challenge, a method based on Generative Adversarial Networks (GANs) was used for data imputation, leveraging their advantage in handling heterogeneous data. We evaluated the proposed imputation approach by comparing it with two established methods: conventional case deletion and Random Forest (RF) imputation. The datasets produced by these three methods were modelled using a random parameters logit model with heterogeneity in means. Data from the California Department of Motor Vehicles (DMV) and the National Highway Traffic Safety Administration (NHTSA) covering 2021–2023 were used. Our results showed that the model based on data processed with Generative Adversarial Imputation Networks (GAIN) outperformed the other candidate methods in terms of fit, predictive accuracy, and factor interpretation. Our results suggest that factors including speed limit, roadway types, head-on crashes, and takeover of ADAS-equipped vehicles are positively associated with serious injury crashes. On the other hand, ADS engagement and crashes with fixed objects exhibit a negative association with serious injury crashes. Additionally, heterogeneous effects of posted speed limits and ADS engagement on AV crash severity were captured, providing deeper insight into their implications.
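
To make the contrast between case deletion and imputation concrete, here is a minimal sketch in Python. It is not the authors' implementation and it is not GAIN: the trained generator is replaced by a simple column-mean stand-in, and the toy records, column meanings, and all function names are hypothetical. It illustrates only two points: case deletion discards every incomplete record, while a mask-based imputation keeps all records and fills only the missing entries.

```python
def mask_of(row):
    """Return 1 where a value is observed and 0 where it is missing (None)."""
    return [0 if v is None else 1 for v in row]


def case_deletion(rows):
    """Listwise deletion: keep only fully observed records."""
    return [r for r in rows if all(v is not None for v in r)]


def column_means(rows):
    """Mean of the observed values per column.

    Assumption: this is a stand-in for a trained generator's output,
    used only so the sketch runs without a neural network.
    """
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def combine(row, generated):
    """Keep observed entries and take generated values only where missing."""
    mask = combine_mask = mask_of(row)
    return [v if m == 1 else g for v, m, g in zip(row, mask, generated)]


# Hypothetical toy records: [posted speed limit (mph), ADS engaged (0/1)]
rows = [
    [25.0, 1.0],
    [45.0, None],
    [None, 0.0],
    [35.0, 1.0],
]

complete = case_deletion(rows)            # incomplete records are dropped
generated = column_means(rows)            # stand-in "generator" output
imputed = [combine(r, generated) for r in rows]

print(len(complete), "records survive case deletion")
print(len(imputed), "records remain after imputation")
```

In this toy example half of the records are lost under case deletion, which is the kind of information loss that motivates imputation. Observed values pass through the combination step unchanged, so only the missing cells depend on the quality of whatever model generates the fill-in values.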

Original language: English
Article number: 105154
Number of pages: 20
Journal: Transportation Research Part C: Emerging Technologies
Volume: 176
Publication status: Published - Jul 2025

Keywords

  • ADAS
  • ADS
  • Autonomous Vehicle Crash
  • Crash Severity Model
  • Machine Learning
  • Risk Factors

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Automotive Engineering
  • Transportation
  • Management Science and Operations Research
