Learning from class-imbalanced data: Review of methods and applications

Guo Haixiang, Li Yijing, Jennifer Shang, Gu Mingyun, Huang Yuanyue, Gong Bing

Research output: Journal article publication › Review article › Academic research › peer-review

1317 Citations (Scopus)

Abstract

Rare events, especially those that could negatively impact society, often require human decision-making responses. In the data mining and machine learning communities, detecting rare events can be viewed as a prediction task. Because these events are rarely observed in daily life, the prediction task suffers from a lack of balanced data. In this paper, we provide an in-depth review of rare event detection from an imbalanced learning perspective. Five hundred and seventeen related papers published in the past decade were collected for the study. Initial statistics suggest that rare event detection and imbalanced learning are of concern across a wide range of research areas, from management science to engineering. We reviewed all collected papers from both a technical and a practical point of view. The modeling methods discussed include data preprocessing, classification algorithms, and model evaluation. For applications, we first provide a comprehensive taxonomy of the existing application domains of imbalanced learning and then detail the applications in each category. Finally, suggestions from the reviewed papers are combined with our own experience and judgment to offer directions for further research in imbalanced learning and rare event detection.

Original language: English
Pages (from-to): 220-239
Number of pages: 20
Journal: Expert Systems with Applications
Volume: 73
DOIs
Publication status: Published - 1 May 2017
Externally published: Yes

Keywords

  • Data mining
  • Imbalanced data
  • Machine learning
  • Rare events

ASJC Scopus subject areas

  • Engineering (all)
  • Computer Science Applications
  • Artificial Intelligence
