Evolutionary Cluster-Based Synthetic Oversampling Ensemble (ECO-Ensemble) for Imbalance Learning

Pin Lim, Chi Keong Goh, Kay Chen Tan

Research output: Journal article publication, Journal article, Academic research, peer-review

94 Citations (Scopus)

Abstract

Class imbalance problems, where the number of samples in each class is unequal, are prevalent in numerous real-world machine learning applications. Traditional methods, which are biased toward the majority class, are ineffective due to the relative severity of misclassifying rare events. This paper proposes a novel evolutionary cluster-based oversampling ensemble framework, which combines a novel cluster-based synthetic data generation method with an evolutionary algorithm (EA) to create an ensemble. The proposed synthetic data generation method is based on contemporary ideas of identifying oversampling regions using clusters. The novel use of an EA serves a twofold purpose: it optimizes the parameters of the data generation method and, by leveraging the characteristics of EAs, generates diverse examples, reducing overall computational cost. The proposed method is evaluated on a set of 40 imbalanced datasets obtained from the University of California, Irvine, database, and outperforms current state-of-the-art ensemble algorithms for class imbalance problems.
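As a rough illustration of the general idea of identifying oversampling regions using clusters, the following minimal sketch groups minority-class points with k-means and creates synthetic points by linear interpolation between two points of the same cluster. This is not the paper's method, whose details the abstract does not give: the use of k-means, the SMOTE-style interpolation, the function names `kmeans` and `cluster_oversample`, and the parameters `k`, `n_new`, and `seed` are all assumptions made for illustration. In the spirit of the abstract, a parameter such as `k` is the kind of setting an EA might tune.

```python
import random


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means (illustrative); returns one cluster label per point."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)  # requires k <= len(points)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def cluster_oversample(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority points by interpolating within clusters."""
    rng = random.Random(seed)
    labels = kmeans(minority, k, rng=rng)
    clusters = [[p for p, lab in zip(minority, labels) if lab == c] for c in range(k)]
    # Interpolation needs two distinct points, so skip singleton or empty clusters.
    usable = [c for c in clusters if len(c) >= 2]
    if not usable:
        raise ValueError("no cluster has two or more points")
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(rng.choice(usable), 2)
        t = rng.random()  # position along the segment from a to b
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic


# Hypothetical minority-class samples forming two well-separated groups.
minority = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
            (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
new_points = cluster_oversample(minority, n_new=4, k=2)
print(new_points)
```

Restricting interpolation to pairs within one cluster keeps synthetic points inside regions already occupied by the minority class, rather than on segments that cross the gap between groups.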

Original language: English
Article number: 7496962
Pages (from-to): 2850-2861
Number of pages: 12
Journal: IEEE Transactions on Cybernetics
Volume: 47
Issue number: 9
Publication status: Published - Sept 2017
Externally published: Yes

Keywords

  • Class-imbalance
  • clustering
  • ensemble learning
  • evolutionary algorithms (EAs)
  • evolutionary cluster-based oversampling ensemble (ECO-Ensemble)
  • synthetic data generation

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
  • Electrical and Electronic Engineering
