Abstract
Noise in the class labels of a training set can lead to poor classification results regardless of the machine learning method used. In this paper, we first present the problem of binary classification in the presence of random noise on the class labels, which we call class noise. Class noise is normally modeled by a class noise rate: a small probability, applied independently across the whole training set, that a class label is inverted. We propose a method to estimate the class noise rate at the level of individual samples in real data. Based on the estimate, we propose two approaches to handle class noise. The first modifies a given surrogate loss function. The second eliminates class noise by sampling. Furthermore, we prove that, under both approaches, the optimal hypothesis on the noisy distribution can approximate the optimal hypothesis on the clean distribution. Our methods achieve over 87% accuracy on a synthetic non-separable dataset even when 40% of the labels are inverted. Comparisons with other algorithms show that our methods outperform state-of-the-art approaches on several benchmark datasets from different domains and at different noise rates.
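The class noise model described in the abstract (each binary label inverted independently with some probability) can be illustrated with a minimal sketch. This is not the paper's estimation or correction method; the function name `flip_labels`, the ±1 label encoding, the fixed seed, and the alternating toy labels are all assumptions made for illustration.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Invert each binary label (+1 / -1) independently with probability noise_rate.

    Hypothetical helper: it only simulates uniform class noise and does not
    estimate or correct it.
    """
    rng = random.Random(seed)
    return [-y if rng.random() < noise_rate else y for y in labels]


# Toy clean labels (assumed for illustration): alternating +1 / -1.
clean = [1 if i % 2 == 0 else -1 for i in range(10_000)]

# Simulate the 40% inversion setting mentioned in the abstract.
noisy = flip_labels(clean, noise_rate=0.4)

# Fraction of labels that actually changed; it should be close to 0.4.
observed_rate = sum(c != n for c, n in zip(clean, noisy)) / len(clean)
print(f"observed flip rate: {observed_rate:.3f}")
```

A single global `noise_rate` corresponds to the conventional model; the per-sample estimation proposed in the paper would replace it with one rate per training example.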
Original language | English |
---|---|
Title of host publication | CIKM 2015 - Proceedings of the 24th ACM International Conference on Information and Knowledge Management |
Publisher | Association for Computing Machinery |
Pages | 1081-1090 |
Number of pages | 10 |
Volume | 19-23-Oct-2015 |
ISBN (Electronic) | 9781450337946 |
Publication status | Published - 17 Oct 2015 |
Event | 24th ACM International Conference on Information and Knowledge Management, CIKM 2015 - Melbourne, Australia. Duration: 19 Oct 2015 → 23 Oct 2015 |
Conference
Conference | 24th ACM International Conference on Information and Knowledge Management, CIKM 2015 |
---|---|
Country/Territory | Australia |
City | Melbourne |
Period | 19/10/15 → 23/10/15 |
Keywords
- Class noise
- Learning with noise
- Noise elimination
ASJC Scopus subject areas
- General Decision Sciences
- General Business, Management and Accounting