A Feature Engineering and Ensemble Learning Based Approach for Repeated Buyers Prediction

M. Zhang, J. Lu, N. Ma, T. C.E. Cheng, G. Hua

Research output: Journal article publication › Journal article › Academic research › peer-review

3 Citations (Scopus)


The global e-commerce market is growing rapidly, but the percentage of repeat buyers is low. According to Tmall, the repurchase rate is only 6.1%, while research shows that a 5% increase in the repurchase rate can raise profit by 25% to 95%. To increase the repurchase rate, merchants need to identify potential repeat buyers and convert them into repurchasers. In this paper, we build a prediction model for repeat purchasers using Tmall’s dataset. First, we construct high-quality features for e-commerce scenarios through manual construction and algorithmic selection. We apply the synthetic minority oversampling technique (SMOTE) to address the data imbalance problem and improve prediction performance. Then we train classical classifiers, including a factorization machine and logistic regression, and ensemble learning classifiers, including extreme gradient boosting and the light gradient boosting machine. Finally, we construct a two-layer fusion model based on the Stacking algorithm to further enhance prediction performance. The results show that, through a series of innovations such as data imbalance processing, feature engineering, and fusion models, the model's area under the curve (AUC) value improves by 0.01161. Our findings provide important implications for managing e-commerce platforms and the platform merchants.
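
The abstract names SMOTE as the remedy for class imbalance but does not describe an implementation. The following is a minimal, standard-library sketch of the core SMOTE idea only: each synthetic minority sample is an interpolation between a minority sample and one of its k nearest minority neighbours. It is not the authors' code; the function name `smote`, its parameters, and the two-feature toy rows are hypothetical and chosen purely for illustration.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic samples interpolated between minority samples.

    Illustrative sketch of the basic SMOTE step, not the paper's implementation.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: three repeat buyers (minority class), two made-up features.
repeat_buyers = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2]]
new_rows = smote(repeat_buyers, n_new=4, k=2)
```

Because every synthetic row lies on a segment between two real minority rows, it stays inside the range already spanned by the minority class. In practice, a maintained implementation such as the `SMOTE` class in imbalanced-learn would normally be preferred, applied to the training split only so that no synthetic rows leak into evaluation.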

Original language: English
Article number: 4988
Journal: International Journal of Computers, Communications and Control
Issue number: 6
Publication status: Published - 2022


  • Ensemble learning
  • Feature engineering
  • Fusion model
  • Repeat buyer prediction

ASJC Scopus subject areas

  • Computer Science Applications
  • Computer Networks and Communications
  • Computational Theory and Mathematics

