Customer churn prediction using improved balanced random forests

Yaya Xie, Xiu Li, Wai Ting Ngai, Weiyun Ying

Research output: Journal article publication › Journal article › Academic research › peer-review

203 Citations (Scopus)

Abstract

Churn prediction is becoming a major focus of banks in China who wish to retain customers by satisfying their needs under resource constraints. In churn prediction, an important yet challenging problem is the imbalance in the data distribution. In this paper, we propose a novel learning method, called improved balanced random forests (IBRF), and demonstrate its application to churn prediction. We investigate the effectiveness of the standard random forests approach in predicting customer churn, while also integrating sampling techniques and cost-sensitive learning into the approach to achieve a better performance than most existing algorithms. The nature of IBRF is that the best features are iteratively learned by altering the class distribution and by putting higher penalties on misclassification of the minority class. We apply the method to a real bank customer churn data set. It is found to improve prediction accuracy significantly compared with other algorithms, such as artificial neural networks, decision trees, and class-weighted core support vector machines (CWC-SVM). Moreover, IBRF also produces better prediction results than other random forests algorithms such as balanced random forests and weighted random forests.
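The two ingredients the abstract names, altering the class distribution and putting higher penalties on minority-class misclassification, can be illustrated with a minimal Python sketch. This is not the authors' IBRF algorithm, whose details are not given in this record. The function names, the 50/50 sampling ratio, and the class weights below are all illustrative assumptions.

```python
import random


def balanced_bootstrap(labels, minority_label, n_samples, minority_fraction, rng):
    """Draw bootstrap row indices with a chosen share of minority-class rows.

    Hypothetical helper: the share is a free parameter here, not a value
    taken from the paper.
    """
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    n_min = round(n_samples * minority_fraction)
    n_maj = n_samples - n_min
    # Sample with replacement from each class separately.
    return rng.choices(minority, k=n_min) + rng.choices(majority, k=n_maj)


def weighted_error(y_true, y_pred, class_weight):
    """Misclassification rate with each row scaled by the weight of its true class."""
    total = sum(class_weight[y] for y in y_true)
    wrong = sum(class_weight[t] for t, p in zip(y_true, y_pred) if t != p)
    return wrong / total


# Toy data (made up for illustration): 5 churners (1) among 100 customers.
labels = [1] * 5 + [0] * 95
sample = balanced_bootstrap(labels, minority_label=1, n_samples=20,
                            minority_fraction=0.5, rng=random.Random(0))
print(sum(labels[i] for i in sample), "churners in a sample of", len(sample))

# One missed churner counts four times as much as one false alarm would.
print(weighted_error([1, 1, 0, 0, 0, 0], [0, 1, 0, 0, 0, 0], {1: 4.0, 0: 1.0}))
```

In a forest, each tree would be grown on one such resampled set, and the weighted error would replace the plain error rate when trees or splits are scored, so that missing a churner costs more than a false alarm.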
Original language: English
Pages (from-to): 5445-5449
Number of pages: 5
Journal: Expert Systems with Applications
Volume: 36
Issue number: 3 PART 1
Publication status: Published - 1 Apr 2009

Keywords

  • Churn prediction
  • Imbalanced data
  • Random forests

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Engineering (all)
