Robust AUC maximization for classification with pairwise confidence comparisons
Haochen SHI, Mingkun XIE, Shengjun HUANG
Supervised learning often requires a large number of labeled examples, which becomes a critical bottleneck when manually annotating class labels is costly. To mitigate this issue, a new framework called pairwise comparison (Pcomp) classification has been proposed, which allows training examples to be only weakly annotated with pairwise comparisons, i.e., which of two examples is more likely to be positive. Previous work solves Pcomp problems by minimizing the classification error, which may lead to a less robust model due to its sensitivity to the class distribution. In this paper, we propose a robust learning framework for Pcomp data along with a pairwise surrogate loss called Pcomp-AUC. It provides an unbiased estimator that equivalently maximizes AUC without access to the precise class labels. Theoretically, we prove consistency with respect to AUC and further provide an estimation error bound for the proposed method. Empirical studies on multiple datasets validate the effectiveness of the proposed method.
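For readers unfamiliar with pairwise AUC surrogates, the core idea of replacing the non-differentiable 0-1 AUC objective with a differentiable pairwise surrogate loss can be sketched as follows. This is a generic illustration with hypothetical function names under fully supervised scores; it is not the paper's Pcomp-AUC estimator, which is constructed to work without precise class labels.

```python
import numpy as np

def empirical_auc(scores_pos, scores_neg):
    """Empirical AUC: fraction of positive-negative pairs ranked correctly,
    with ties counted as half-correct."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean((diff > 0) + 0.5 * (diff == 0))

def pairwise_auc_surrogate(scores_pos, scores_neg):
    """Logistic surrogate of the pairwise 0-1 AUC loss: penalizes each
    positive-negative pair in proportion to how poorly it is ranked."""
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(np.log1p(np.exp(-diff)))
```

Minimizing the surrogate over all positive-negative score differences is equivalent, for a consistent surrogate, to maximizing the empirical AUC; e.g., for perfectly separated scores `empirical_auc` returns 1.0 while the surrogate loss is small.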
Haochen Shi received BS and MS degrees in signal and information processing from Nanjing University of Aeronautics and Astronautics, China in 2016 and 2018, respectively, and an MS degree in electronics from Queen's University Belfast, UK in 2019. She is now a PhD student at Nanjing University of Aeronautics and Astronautics, China. Her main research interests include machine learning, image processing, and pattern recognition.
Mingkun Xie received the BS degree in 2018. He is currently a PhD student in the MIIT Key Laboratory of Pattern Analysis and Machine Intelligence at Nanjing University of Aeronautics and Astronautics, China. He has served as a PC member of NeurIPS, ICML, and ICLR, and as a reviewer for TNNLS and MLJ. His research interests are mainly in machine learning, particularly multi-label learning and weakly-supervised learning.
Shengjun Huang received BS and PhD degrees in computer science from Nanjing University, China in 2008 and 2014, respectively. He is now a professor in the College of Computer Science and Technology at Nanjing University of Aeronautics and Astronautics, China. His main research interests include machine learning and data mining. He was selected for the Young Elite Scientists Sponsorship Program by CAST in 2016, and won the China Computer Federation Outstanding Doctoral Dissertation Award in 2015, the KDD Best Poster Award in 2012, and the Microsoft Fellowship Award in 2011. He is a Junior Associate Editor of Frontiers of Computer Science.
[1] Zhou Z H. A brief introduction to weakly supervised learning. National Science Review, 2018, 5(1): 44–53
[2] Zhu X, Goldberg A B. Introduction to Semi-Supervised Learning. Cham: Springer, 2009, 1−130
[3] Niu G, Jitkrittum W, Dai B, Hachiya H, Sugiyama M. Squared-loss mutual information regularization: a novel information-theoretic approach to semi-supervised learning. In: Proceedings of the 30th International Conference on Machine Learning. 2013, III-10−III-18
[4] Natarajan N, Dhillon I S, Ravikumar P, Tewari A. Learning with noisy labels. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013, 1196−1204
[5] Liu T, Tao D. Classification with noisy labels by importance reweighting. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(3): 447–461
[6] Du Plessis M C, Niu G, Sugiyama M. Analysis of learning from positive and unlabeled data. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. 2014, 703−711
[7] Du Plessis M C, Niu G, Sugiyama M. Convex formulation for learning from positive and unlabeled data. In: Proceedings of the 32nd International Conference on Machine Learning. 2015, 1386−1394
[8] Kiryo R, Niu G, du Plessis M C, Sugiyama M. Positive-unlabeled learning with non-negative risk estimator. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 1674−1684
[9] Cour T, Sapp B, Taskar B. Learning from partial labels. The Journal of Machine Learning Research, 2011, 12: 1501–1536
[10] Xie M K, Huang S J. Partial multi-label learning. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. 2018, 4302−4309
[11] Feng L, Lv J, Han B, Xu M, Niu G, Geng X, An B, Sugiyama M. Provably consistent partial-label learning. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 10948−10960
[12] Ishida T, Niu G, Hu W, Sugiyama M. Learning from complementary labels. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 5644−5654
[13] Yu X Y, Liu T L, Gong M M, Tao D C. Learning with biased complementary labels. In: Proceedings of the 15th European Conference on Computer Vision. 2018, 69−85
[14] Bao H, Niu G, Sugiyama M. Classification from pairwise similarity and unlabeled data. In: Proceedings of the 35th International Conference on Machine Learning. 2018, 452−461
[15] Shimada T, Bao H, Sato I, Sugiyama M. Classification from pairwise similarities/dissimilarities and unlabeled data via empirical risk minimization. Neural Computation, 2021, 33(5): 1234–1268
[16] Feng L, Shu S, Cao Y, Tao L, Wei H, Xiang T, An B, Niu G. Multiple-instance learning from similar and dissimilar bags. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021, 374−382
[17] Zhang D, Han J, Cheng G, Yang M H. Weakly supervised object localization and detection: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(9): 5866–5885
[18] Zhang D, Zeng W, Yao J, Han J. Weakly supervised object detection using proposal- and semantic-level relationships. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(6): 3349–3363
[19] Feng L, Shu S, Lu N, Han B, Xu M, Niu G, An B, Sugiyama M. Pointwise binary classification with pairwise confidence comparisons. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 3252−3262
[20] Zhou Z H, Chen K J, Jiang Y. Exploiting unlabeled data in content-based image retrieval. In: Proceedings of the 15th European Conference on Machine Learning. 2004, 525−536
[21] Cortes C, Mohri M. AUC optimization vs. error rate minimization. In: Proceedings of the 16th International Conference on Neural Information Processing Systems. 2003, 313−320
[22] Elkan C. The foundations of cost-sensitive learning. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence. 2001, 973−978
[23] Freund Y, Iyer R, Schapire R E, Singer Y. An efficient boosting algorithm for combining preferences. The Journal of Machine Learning Research, 2003, 4: 933–969
[24] Fawcett T. An introduction to ROC analysis. Pattern Recognition Letters, 2006, 27(8): 861–874
[25] Zhou K, Gao S, Cheng J, Gu Z, Fu H, Tu Z, Yang J, Zhao Y, Liu J. Sparse-GAN: sparsity-constrained generative adversarial network for anomaly detection in retinal OCT image. In: Proceedings of the 17th IEEE International Symposium on Biomedical Imaging. 2020, 1227−1231
[26] Liu W, Luo W, Lian D, Gao S. Future frame prediction for anomaly detection – a new baseline. In: Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2018, 6536−6545
[27] Liu C, Zhong Q, Ao X, Sun L, Lin W, Feng J, He Q, Tang J. Fraud transactions detection via behavior tree with local intention calibration. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020, 3035−3043
[28] Chen Y, Chen B, He X, Gao C, Li Y, Lou J G, Wang Y. λOpt: learn to regularize recommender models in finer levels. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019, 978−986
[29] Dai L, Yin Y, Qin C, Xu T, He X, Chen E, Xiong H. Enterprise cooperation and competition analysis with a sign-oriented preference network. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020, 774−782
[30] Yang Z Y, Xu Q Q, Bao S L, Cao X C, Huang Q M. Learning with multiclass AUC: theory and algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(11): 7747–7763
[31] Calders T, Jaroszewicz S. Efficient AUC optimization for classification. In: Proceedings of the 11th European Conference on Principles of Data Mining and Knowledge Discovery. 2007, 42−53
[32] Herschtal A, Raskutti B. Optimising area under the ROC curve using gradient descent. In: Proceedings of the 21st International Conference on Machine Learning. 2004, 49
[33] Joachims T. Training linear SVMs in linear time. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2006, 217−226
[34] Zhao P, Hoi S C H, Jin R, Yang T. Online AUC maximization. In: Proceedings of the 28th International Conference on Machine Learning. 2011, 233−240
[35] Gao W, Jin R, Zhu S, Zhou Z H. One-pass AUC optimization. In: Proceedings of the 30th International Conference on Machine Learning. 2013, III-906−III-914
[36] Ying Y, Wen L, Lyu S. Stochastic online AUC maximization. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 451−459
[37] Dang Z, Li X, Gu B, Deng C, Huang H. Large-scale nonlinear AUC maximization via triply stochastic gradients. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(3): 1385–1398
[38] Agarwal S, Graepel T, Herbrich R, Har-Peled S, Roth D. Generalization bounds for the area under the ROC curve. The Journal of Machine Learning Research, 2005, 6: 393–425
[39] Usunier N, Amini M R, Gallinari P. A data-dependent generalisation error bound for the AUC. In: Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning. 2005
[40] Agarwal S. Surrogate regret bounds for bipartite ranking via strongly proper losses. The Journal of Machine Learning Research, 2014, 15(1): 1653–1674
[41] Gao W, Zhou Z H. On the consistency of AUC pairwise optimization. In: Proceedings of the 24th International Conference on Artificial Intelligence. 2015, 939−945
[42] Elkan C, Noto K. Learning classifiers from only positive and unlabeled data. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2008, 213−220
[43] Niu G, du Plessis M C, Sakai T, Ma Y, Sugiyama M. Theoretical comparisons of positive-unlabeled learning against positive-negative learning. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. 2016, 1207−1215
[44] Ren K, Yang H, Zhao Y, Chen W, Xue M, Miao H, Huang S, Liu J. A robust AUC maximization framework with simultaneous outlier detection and feature selection for positive-unlabeled classification. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(10): 3072–3083
[45] Lu N, Niu G, Menon A K, Sugiyama M. On the minimal supervision for training any binary classifier from only unlabeled data. In: Proceedings of the 7th International Conference on Learning Representations. 2019
[46] Brefeld U, Scheffer T. AUC maximizing support vector learning. In: Proceedings of the ICML 2005 Workshop on ROC Analysis in Machine Learning. 2005
[47] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278–2324
[48] Xiao H, Rasul K, Vollgraf R. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. 2017, arXiv preprint arXiv: 1708.07747
[49] Clanuwat T, Bober-Irizar M, Kitamoto A, Lamb A, Yamamoto K, Ha D. Deep learning for classical Japanese literature. 2018, arXiv preprint arXiv: 1812.01718
[50] Hull J J. A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1994, 16(5): 550–554
[51] Dua D, Graff C. UCI Machine Learning Repository. Irvine: University of California, School of Information and Computer Science. See archive.ics.uci.edu/ml/citation_policy.html website, 2019
[52] Kingma D P, Ba J. Adam: a method for stochastic optimization. 2014, arXiv preprint arXiv: 1412.6980
[53] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of the 3rd International Conference on Learning Representations. 2015
[54] Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, Desmaison A, Köpf A, Yang E Z, DeVito Z, Raison M, Tejani A, Chilamkurthy S, Steiner B, Fang L, Bai J, Chintala S. PyTorch: an imperative style, high-performance deep learning library. In: Proceedings of the 33rd Conference on Neural Information Processing Systems. 2019, 8024−8035
[55] Zhang T. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 2004, 32(1): 56–85
[56] Mohri M, Rostamizadeh A, Talwalkar A. Foundations of Machine Learning. 2nd ed. MIT Press, 2018