A robust optimization method for label noisy datasets based on adaptive threshold: Adaptive-k
Enes DEDEOGLU, Himmet Toprak KESGIN, Mehmet Fatih AMASYALI
Using all samples in the optimization process does not produce robust results on datasets with label noise, because the gradients computed from the losses of noisy samples push the optimization in the wrong direction. In this paper, we propose using only the samples in each mini-batch whose loss is below a threshold determined during optimization, instead of using all samples. Our proposed method, Adaptive-k, aims to exclude label-noise samples from the optimization process and thereby make training robust. On noisy datasets, we found that a threshold-based approach such as Adaptive-k produces better results than using all samples or a fixed number of low-loss samples in the mini-batch. On the basis of our theoretical analysis and experimental results, we show that Adaptive-k comes closest to the performance of the Oracle, in which noisy samples are entirely removed from the dataset. Adaptive-k is a simple but effective method: it requires no prior knowledge of the dataset's noise ratio, no additional model training, and no significant increase in training time. Our experiments also show that Adaptive-k is compatible with different optimizers such as SGD, SGDM, and Adam. The code for Adaptive-k is available on GitHub.
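To make the idea concrete, the sketch below shows the general loss-thresholding scheme the abstract describes: within a mini-batch, only samples whose loss falls below a cutoff contribute to the gradient. This is a minimal, illustrative PyTorch sketch; the function name `thresholded_batch_loss` and the way the threshold is supplied are assumptions for exposition, not the paper's exact Adaptive-k rule, which adapts the threshold during training.

```python
import torch

def thresholded_batch_loss(per_sample_loss: torch.Tensor, threshold: float) -> torch.Tensor:
    """Average the loss only over mini-batch samples whose loss is below the threshold.

    per_sample_loss: 1-D tensor of per-sample losses for one mini-batch.
    threshold: scalar cutoff; samples at or above it are excluded from the update.
    """
    mask = per_sample_loss < threshold
    if mask.any():
        return per_sample_loss[mask].mean()
    # Fall back to the full batch if every sample exceeds the threshold,
    # so the optimizer always receives a usable gradient.
    return per_sample_loss.mean()

# Usage inside a standard training loop (the criterion must be created with
# reduction="none" so per-sample losses are available for the selection step):
#   loss_vec = criterion(model(x), y)                 # shape: (batch_size,)
#   loss = thresholded_batch_loss(loss_vec, threshold)
#   loss.backward(); optimizer.step()
```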
robust optimization / label noise / noisy label / deep learning / noisy datasets / noise ratio estimation / robust training
Enes Dedeoglu is pursuing his bachelor’s degree in Computer Engineering at Yildiz Technical University, Turkey. His areas of interest include artificial intelligence, deep learning, machine learning, and natural language processing.
Himmet Toprak Kesgin graduated from Yildiz Technical University, Turkey in 2020. He received his master’s degree from the same university in 2022. Kesgin is currently working as a research assistant and pursuing his PhD at the Department of Computer Engineering, Yildiz Technical University, Turkey. His areas of interest include machine learning, natural language processing, and expert systems.
Mehmet Fatih Amasyali received the MS degree from Yildiz Technical University, Turkey in 2003, and the PhD degree from the same university in 2008. He did his postdoctoral study at the School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA. Dr. Amasyali is currently an Associate Professor at the Department of Computer Engineering, Yildiz Technical University, Turkey. His interests include machine learning, natural language processing, and autonomous robotics. He has published several scientific papers.