Abstract
This article presents a simple, easy-to-implement sampling method for classification, called "random space division sampling" (RSDS), which is based on the idea of random space division. RSDS extracts the boundary points as the sampled result by efficiently distinguishing among label-noise points, inner points, and boundary points. This makes it the first general sampling method for classification that can not only reduce the data size but also enhance the classification accuracy of a classifier, especially in label-noisy classification. Here, "general" means that the method is not restricted to any specific classifier or dataset (regardless of whether the dataset is linearly separable). Furthermore, because its time complexity is lower than that of most classifiers, RSDS can accelerate most classifiers online. Moreover, RSDS can be used as an undersampling method for imbalanced classification. Experimental results on benchmark datasets demonstrate its effectiveness and efficiency. The code of RSDS and the comparison algorithms is available at: https://github.com/syxiaa/RSDS.
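The abstract does not spell out the RSDS procedure, so the following is only a toy illustration of the general idea it describes: label every point as label noise, inner, or boundary, then keep the boundary points. It is not the authors' algorithm. The random Voronoi partition, the label-purity score, and the thresholds `noise_thr` and `inner_thr` are hypothetical choices made for this sketch; the repository linked above holds the actual implementation.

```python
import random
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def categorize_points(X, y, n_centers=5, n_rounds=50,
                      noise_thr=0.3, inner_thr=0.95, seed=0):
    """Toy heuristic (NOT the published RSDS): tag each point as
    'noise', 'inner' or 'boundary' from its average label purity
    over many random divisions of the space."""
    rng = random.Random(seed)
    n = len(X)
    purity_sum = [0.0] * n
    for _ in range(n_rounds):
        # Hypothetical "random space division": pick random data points as
        # centers and assign every point to its nearest center (Voronoi cells).
        centers = rng.sample(range(n), min(n_centers, n))
        cell_of = [min(centers, key=lambda c: squared_distance(X[i], X[c]))
                   for i in range(n)]
        # Count the labels that fall inside each cell.
        counts = {}
        for i in range(n):
            counts.setdefault(cell_of[i], Counter())[y[i]] += 1
        # Purity of point i = share of its cell carrying its own label.
        for i in range(n):
            cell = counts[cell_of[i]]
            purity_sum[i] += cell[y[i]] / sum(cell.values())
    categories = []
    for i in range(n):
        p = purity_sum[i] / n_rounds
        if p < noise_thr:        # label disagrees with almost all neighbours
            categories.append("noise")
        elif p >= inner_thr:     # almost always in a single-class cell
            categories.append("inner")
        else:                    # mixed neighbourhoods: near a class boundary
            categories.append("boundary")
    return categories


def boundary_sample(X, y, **kwargs):
    """Keep only the (point, label) pairs tagged as boundary points."""
    cats = categorize_points(X, y, **kwargs)
    return [(x, lab) for x, lab, c in zip(X, y, cats) if c == "boundary"]
```

Averaging purity over many random divisions keeps one unlucky partition from mislabelling a point. The cost grows as `n_rounds * n * n_centers * d` for `n` points in `d` dimensions, which is why a scheme of this kind can be cheaper than training many classifiers on the full data.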
Original language | English |
---|---|
Journal | IEEE Transactions on Cybernetics |
Publication status | Accepted/In press - 2021 |
Keywords
- Class noise
- Cybernetics
- Data mining
- imbalanced classification
- Kernel
- label noise
- Noise measurement
- sampling
- Sampling methods
- Tagging
- Time complexity
- undersampling
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Information Systems
- Human-Computer Interaction
- Computer Science Applications
- Electrical and Electronic Engineering