Abstract
Imbalanced data cause deep neural networks to output biased results, and the problem becomes more serious with extremely imbalanced data in which the outliers are tiny (the ratio of the outlier size to the image size is around 0.05%). Many data augmentation models have been proposed to supplement imbalanced data and alleviate biased results. However, the existing augmentation models cannot synthesize tiny outliers, which makes the generated data unavailable. In this article, we propose a new augmentation model named extremely imbalanced data augmentation generative adversarial nets (EID-GANs) to address the extremely imbalanced data augmentation problem. First, we design a new penalty function that subtracts the outliers from the cropped region of a generated instance to guide the generator to learn the features of outliers. We then combine the output value of the penalty function with the generator loss to jointly update the generator’s parameters through backpropagation. Second, we propose a new evaluation approach that adopts two outlier detectors with k-fold cross-validation to assess the availability of generated instances. We conduct extensive experiments to demonstrate the significant performance improvement of EID-GAN on two extremely imbalanced datasets, the industrial Piston and Fabric datasets, and one general imbalanced dataset, the public DAGM dataset. The experimental results show that our EID-GAN outperforms the state-of-the-art (SOTA) augmentation models on different imbalanced datasets.
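The penalty idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the use of an L2 norm, the fixed crop position, the function names (`crop`, `norm_penalty`, `generator_objective`), and the `weight` parameter are all assumptions made for illustration. The abstract only states that outliers are subtracted from a cropped region of the generated instance and that the result is combined with the generator loss.

```python
import math

def crop(image, top, left, height, width):
    """Return a height x width sub-region of a 2-D image stored as a list of rows."""
    return [row[left:left + width] for row in image[top:top + height]]

def norm_penalty(generated, outlier, top, left):
    """Assumed form of the penalty: L2 norm of (cropped generated region - outlier patch)."""
    h, w = len(outlier), len(outlier[0])
    region = crop(generated, top, left, h, w)
    squared = sum(
        (region[i][j] - outlier[i][j]) ** 2
        for i in range(h)
        for j in range(w)
    )
    return math.sqrt(squared)

def generator_objective(adversarial_loss, penalty, weight=1.0):
    """Combine the generator loss with the penalty; the additive weighting is an assumption."""
    return adversarial_loss + weight * penalty

# Toy example: a blank 4x4 "generated" image and a 2x2 reference outlier patch.
generated = [[0.0] * 4 for _ in range(4)]
outlier = [[3.0, 0.0], [0.0, 4.0]]
penalty = norm_penalty(generated, outlier, top=1, left=1)
total = generator_objective(adversarial_loss=0.5, penalty=penalty, weight=0.1)
```

In this toy case the generated image contains no outlier at all, so the penalty is large; as the cropped region comes to resemble the reference outlier, the penalty shrinks toward zero. In an actual GAN the combined value would be computed on tensors in an automatic-differentiation framework so that it can be backpropagated through the generator.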
Original language | English |
---|---|
Pages (from-to) | 3208-3218 |
Number of pages | 11 |
Journal | IEEE Transactions on Industrial Informatics |
Volume | 19 |
Issue number | 3 |
DOIs | |
Publication status | Published - Mar 2023 |
Keywords
- Extremely imbalanced data augmentation
- generative adversarial net (GAN)
- generated data evaluation
- norm penalty function