A robust optimization method for label noisy datasets based on adaptive threshold: Adaptive-k
Enes DEDEOGLU, Himmet Toprak KESGIN, Mehmet Fatih AMASYALI
Front. Comput. Sci., 2024, Vol. 18, Issue 4: 184315
Using all samples in the optimization process does not produce robust results on datasets with label noise, because the gradients computed from the losses of noisy samples steer the optimization in the wrong direction. In this paper, we propose using only the samples whose loss is below a threshold determined during optimization, instead of all samples in the mini-batch. Our proposed method, Adaptive-k, aims to exclude noisily labeled samples from the optimization process and thereby make it robust. On noisy datasets, we found that a threshold-based approach such as Adaptive-k produces better results than using all samples or a fixed number of low-loss samples in the mini-batch. Based on our theoretical analysis and experimental results, we show that Adaptive-k comes closest to the performance of the Oracle, in which noisy samples are entirely removed from the dataset. Adaptive-k is a simple but effective method: it does not require prior knowledge of the dataset's noise ratio, does not require additional model training, and does not significantly increase training time. In the experiments, we also show that Adaptive-k is compatible with different optimizers such as SGD, SGDM, and Adam. The code for Adaptive-k is available on GitHub.
robust optimization / label noise / noisy label / deep learning / noisy datasets / noise ratio estimation / robust training
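The selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the threshold is computed, so the rule used here (a running mean of all per-sample losses seen so far) is only an assumption for illustration and should not be read as the authors' exact procedure. The names `RunningMeanThreshold`, `select_below_threshold`, `select_fixed_k`, and `mean_loss_of` are hypothetical helpers, not part of the released code.

```python
from typing import List


class RunningMeanThreshold:
    """Hypothetical threshold rule: mean of every per-sample loss seen so far.

    This is an assumption made for illustration; the abstract only says the
    threshold is "determined during the optimization".
    """

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, losses: List[float]) -> float:
        # Fold the current mini-batch into the running statistics.
        self.total += sum(losses)
        self.count += len(losses)
        return self.total / self.count


def select_below_threshold(losses: List[float], threshold: float) -> List[int]:
    """Threshold-based selection: keep indices whose loss is below the threshold."""
    return [i for i, loss in enumerate(losses) if loss < threshold]


def select_fixed_k(losses: List[float], k: int) -> List[int]:
    """Baseline for comparison: keep the k lowest-loss indices in the batch."""
    order = sorted(range(len(losses)), key=lambda i: losses[i])
    return sorted(order[:k])


def mean_loss_of(losses: List[float], kept: List[int]) -> float:
    """Loss that would be backpropagated: the mean over kept samples only."""
    if not kept:
        return 0.0  # nothing selected, so this batch contributes no update
    return sum(losses[i] for i in kept) / len(kept)


# Toy mini-batch: indices 2 and 4 play the role of mislabeled, high-loss samples.
batch_losses = [0.2, 0.3, 2.5, 0.1, 3.0, 0.4]

rule = RunningMeanThreshold()
threshold = rule.update(batch_losses)

kept_adaptive = select_below_threshold(batch_losses, threshold)
kept_fixed = select_fixed_k(batch_losses, k=3)
training_loss = mean_loss_of(batch_losses, kept_adaptive)
```

In this toy batch the threshold-based rule drops the two high-loss samples without being told how many to drop, whereas the fixed-k baseline needs `k` chosen in advance, which in practice presupposes an estimate of the noise ratio.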
Higher Education Press