TY - JOUR
T1 - Hierarchical integrated machine learning model for predicting flight departure delays and duration in series
AU - Khan, Waqar Ahmed
AU - Ma, Hoi Lam
AU - Chung, Sai Ho
AU - Wen, Xin
N1 - Funding Information:
The work described in this paper was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (UGC/FDS14/E04/19).
Publisher Copyright:
© 2021
PY - 2021/8
Y1 - 2021/8
N2 - Flight delays may propagate through the entire aviation network and are becoming an important research topic. This paper proposes a novel hierarchical integrated machine learning model for predicting flight departure delays and duration in series rather than in parallel to avoid ambiguity in decision making. The paper analyses the proposed model using various machine learning algorithms in combination with different sampling techniques. The highly noisy, unbalanced, dispersed, and skewed historical high-dimensional data provided by an international airline operating in Hong Kong was used to demonstrate the practical application of the model. The results show that for a 4-h forecast horizon, a constructive neural network machine learning algorithm with the Synthetic Minority Over-sampling Technique-Tomek Links (SMOTETomek) sampling technique was able to achieve better average balanced recall accuracies of 65.5%, 61.5%, and 59% for classifying delay status and predicting delay duration at thresholds of 60 min and 30 min, respectively. Similarly, for minority labels, the precision-recall area under the curve showed that the proposed model achieved better results of 32.44% and 35.14%, compared with 26.43% and 21.02% for the parallel model, for thresholds of 60 min and 30 min, respectively. The effect of different sampling techniques, sampling approaches, and estimation mechanisms on prediction performance is also studied.
AB - Flight delays may propagate through the entire aviation network and are becoming an important research topic. This paper proposes a novel hierarchical integrated machine learning model for predicting flight departure delays and duration in series rather than in parallel to avoid ambiguity in decision making. The paper analyses the proposed model using various machine learning algorithms in combination with different sampling techniques. The highly noisy, unbalanced, dispersed, and skewed historical high-dimensional data provided by an international airline operating in Hong Kong was used to demonstrate the practical application of the model. The results show that for a 4-h forecast horizon, a constructive neural network machine learning algorithm with the Synthetic Minority Over-sampling Technique-Tomek Links (SMOTETomek) sampling technique was able to achieve better average balanced recall accuracies of 65.5%, 61.5%, and 59% for classifying delay status and predicting delay duration at thresholds of 60 min and 30 min, respectively. Similarly, for minority labels, the precision-recall area under the curve showed that the proposed model achieved better results of 32.44% and 35.14%, compared with 26.43% and 21.02% for the parallel model, for thresholds of 60 min and 30 min, respectively. The effect of different sampling techniques, sampling approaches, and estimation mechanisms on prediction performance is also studied.
KW - Air traffic
KW - Aviation
KW - Flight delay prediction
KW - High-dimensional data
KW - Machine learning
KW - Sampling techniques
UR - http://www.scopus.com/inward/record.url?scp=85107835425&partnerID=8YFLogxK
U2 - 10.1016/j.trc.2021.103225
DO - 10.1016/j.trc.2021.103225
M3 - Journal article
AN - SCOPUS:85107835425
SN - 0968-090X
VL - 129
JO - Transportation Research Part C: Emerging Technologies
JF - Transportation Research Part C: Emerging Technologies
M1 - 103225
ER -