An under-sampling method based on fuzzy logic for large imbalanced dataset

Ginny Y. Wong, Hung Fat Frank Leung, Sai Ho Ling

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

5 Citations (Scopus)

Abstract

Large imbalanced datasets have introduced difficulties to classification problems. They cause a high error rate of the minority class samples and a long training time of the classification model. Therefore, re-sampling and data size reduction have become important steps to pre-process the data. In this paper, a sampling strategy over a large imbalanced dataset is proposed, in which the samples of the larger class are selected based on fuzzy logic. To further reduce the data size, the evolutionary computational method of CHC is employed. The evaluation is done by applying a Support Vector Machine (SVM) to train a classification model from the re-sampled training sets. From experimental results, it can be seen that our proposed method improves both the F-measure and AUC. The complexity of the classification model is also compared. It is found that our proposed method is superior to all other compared methods.
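The abstract reports results as F-measure and AUC, two metrics commonly used when the minority class matters most. As an illustration only, the sketch below computes both from scratch using their standard definitions. It is not the authors' evaluation code; the function names, the choice of label `1` for the minority class, and the toy data are all assumptions made for this example.

```python
def f_measure(y_true, y_pred, positive=1):
    """F-measure (F1) of the class labelled `positive` (assumed: the minority class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 2 minority samples, 4 majority samples.
y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0]               # hard labels from some classifier
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1]   # decision scores from the same classifier

print(f_measure(y_true, y_pred))
print(auc(y_true, scores))
```

Both metrics ignore the number of correctly rejected majority samples in a way that plain accuracy does not, which is why they are preferred over accuracy on imbalanced data.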
Original language: English
Title of host publication: Proceedings of the 2014 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE
Publisher: IEEE
Pages: 1248-1252
Number of pages: 5
ISBN (Electronic): 9781479920723
Publication status: Published - 1 Jan 2014
Event: 2014 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2014 - Beijing, China
Duration: 6 Jul 2014 to 11 Jul 2014

Conference

Conference: 2014 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2014
Country/Territory: China
City: Beijing
Period: 6/07/14 to 11/07/14

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Software
  • Artificial Intelligence
  • Applied Mathematics
