Binary relevance for multi-label learning: an overview
Min-Ling ZHANG, Yu-Kun LI, Xu-Ying LIU, Xin GENG
Binary relevance for multi-label learning: an overview
Multi-label learning deals with problems where each example is represented by a single instance while being associated with multiple class labels simultaneously. Binary relevance is arguably the most intuitive solution for learning from multi-label examples. It works by decomposing the multi-label learning task into a number of independent binary learning tasks (one per class label). In view of its potential weakness in ignoring correlations between labels, many correlation-enabling extensions to binary relevance have been proposed in the past decade. In this paper, we aim to review the state of the art of binary relevance from three perspectives. First, basic settings for multi-label learning and binary relevance solutions are briefly summarized. Second, representative strategies to provide binary relevancewith label correlation exploitation abilities are discussed. Third, some of our recent studies on binary relevance aimed at issues other than label correlation exploitation are introduced. As a conclusion, we provide suggestions on future research directions.
machine learning / multi-label learning / binary relevance / label correlation / class-imbalance / relative labeling-importance
[1] |
Zhang M-L, Zhou Z-H. A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819–1837
CrossRef
Google scholar
|
[2] |
Zhou Z-H, Zhang M-L. Multi-label learning. In: Sammut C, Webb G I, eds. Encyclopedia of Machine Learning and Data Mining. Berlin: Springer, 2016, 1–8
CrossRef
Google scholar
|
[3] |
Schapire R E, Singer Y. Boostexter: a boosting-based system for text categorization. Machine Learning, 2000, 39(2–3): 135–168
CrossRef
Google scholar
|
[4] |
Cabral R S, De la Torre F, Costeira J P, Bernardino A. Matrix completion for multi-label image classification. In: Proceedings of Advances in Neural Information Processing Systems. 2011, 190–198
|
[5] |
Sanden C, Zhang J Z. Enhancing multi-label music genre classification through ensemble techniques. In: Proceedings of the 34th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. 2011, 705–714
CrossRef
Google scholar
|
[6] |
Barutcuoglu Z, Schapire R E, Troyanskaya O G. Hierarchical multilabel prediction of gene function. Bioinformatics, 2006, 22(7): 830–836
CrossRef
Google scholar
|
[7] |
Qi G-J, Hua X-S, Rui Y, Tang J, Mei T, Zhang H-J. Correlative multilabel video annotation. In: Proceedings of the 15th ACM International Conference on Multimedia. 2007, 17–26
|
[8] |
Tang L, Rajan S, Narayanan V K. Large scale multi-label classification via metalabeler. In: Proceedings of the 19th International Conference on World Wide Web. 2009, 211–220
CrossRef
Google scholar
|
[9] |
Boutell M R, Luo J, Shen X, Brown C M. Learning multi-label scene classification. Pattern Recognition, 2004, 37(9): 1757–1771
CrossRef
Google scholar
|
[10] |
Tsoumakas G, Katakis I, Vlahavas I. Mining multi-label data. In: Maimon O, Rokach L, eds. Data Mining and Knowledge Discovery Handbook. Berlin: Springer, 2010, 667–686
|
[11] |
Gibaja E, Ventura S. A tutorial on multilabel learning. ACM Computing Surveys, 2015, 47(3): 52
CrossRef
Google scholar
|
[12] |
Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multilabel classification. In: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2009, 254–269
CrossRef
Google scholar
|
[13] |
Dembczyński K, Cheng W, Hüllermeier E. Bayes optimal multilabel classification via probabilistic classifier chains. In: Proceedings of the 27th International Conference on Machine Learning. 2010, 279–286
|
[14] |
Read J, Pfahringer B, Holmes G, Frank E. Classifier chains for multilabel classification. Machine Learning, 2011, 85(3): 333–359
CrossRef
Google scholar
|
[15] |
Kumar A, Vembu S, Menon A K, Elkan C. Learning and inference in probabilistic classifier chains with beam search. In: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2012, 665–680
CrossRef
Google scholar
|
[16] |
Li N, Zhou Z-H. Selective ensemble of classifier chains. In: Proceedings of International Workshop on Multiple Classifier Systems. 2013, 146–156
CrossRef
Google scholar
|
[17] |
Senge R, del Coz J J, Hüllermeier E. Rectifying classifier chains for multi-label classification. In: Proceedings of the 15th German Workshop on Learning, Knowledge, and Adaptation. 2013, 162–169
|
[18] |
Mena D, Montañés E, Quevedo J R, del Coz J J. A family of admissible heuristics for A* to perform inference in probabilistic classifier chains. Machine Learning, 2017, 106(1): 143–169
CrossRef
Google scholar
|
[19] |
Godbole S, Sarawagi S. Discriminative methods for multi-labeled classification. In: Proceedings of Pacific-Asia Conference on Knowledge Discovery and Data Mining. 20004, 22–30
CrossRef
Google scholar
|
[20] |
Montañés E, Quevedo J R, del Coz J J. Aggregating independent and dependent models to learn multi-label classifiers. In: proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2011, 484–500
CrossRef
Google scholar
|
[21] |
Montañés E, Senge R, Barranquero J, Quevedo J R, del Coz J J, Hüllermeier E. Dependent binary relevance models for multi-label classification. Pattern Recognition, 2014, 47(3): 1494–1508
CrossRef
Google scholar
|
[22] |
Tahir M A, Kittler J, Bouridane A. Multi-label classification using stacked spectral kernel discriminant analysis. Neurocomputing, 2016, 171: 127–137
CrossRef
Google scholar
|
[23] |
LozaMencía E, Janssen F. Learning rules for multi-label classification: a stacking and a separate-and-conquer approach. Machine Learning, 2016, 105(1): 77–126
CrossRef
Google scholar
|
[24] |
Tsoumakas G, Dimou A, Spyromitros E, Mezaris V, Kompatsiaris I, Vlahavas I. Correlation-based pruning of stacked binary relevance models for multi-label learning. In: Proceedings of the 1st International Workshop on Learning from Multi-Label Data. 2009, 101–116
|
[25] |
Zhang M-L, Zhang K. Multi-label learning by exploiting label dependency. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2010, 999–1007
CrossRef
Google scholar
|
[26] |
Alessandro A, Corani G, Mauá D, Gabaglio S. An ensemble of Bayesian networks for multilabel classification. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence. 2013, 1220–1225
|
[27] |
Sucar L E, Bielza C, Morales E F, Hernandez-Leal P, Zaragoza J H, Larrañaga P. Multi-label classification with Bayesian network-based chain classifiers. Pattern Recognition Letters, 2014, 41: 14–22
CrossRef
Google scholar
|
[28] |
Li Y-K, Zhang M-L. Enhancing binary relevance for multi-label learning with controlled label correlations exploitation. In: Proceedings of Pacific Rim International Conference on Artificial Intelligence. 2014, 91–103
CrossRef
Google scholar
|
[29] |
Alali A, Kubat M. Prudent: a pruned and confident stacking approach for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(9): 2480–2493
CrossRef
Google scholar
|
[30] |
Petterson J, Caetano T. Reverse multi-label learning. In: Proceedings of the Neural Information Processing Systems Comference. 2010, 1912–1920
|
[31] |
Spyromitros-Xioufis E, Spiliopoulou M, Tsoumakas G, Vlahavas I. Dealing with concept drift and class imbalance in multi-label stream classification. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence. 2011, 1583–1588
|
[32] |
Tahir M A, Kittler J, Yan F. Inverse random under sampling for class imbalance problem and its application to multi-label classification. Pattern Recognition, 2012, 45(10): 3738–3750
CrossRef
Google scholar
|
[33] |
Quevedo J R, Luaces O, Bahamonde A. Multilabel classifiers with a probabilistic thresholding strategy. Pattern Recognition, 2012, 45(2): 876–883
|
[34] |
Pillai I, Fumera G, Roli F. Threshold optimisation for multi-label classifiers. Pattern Recognition, 2013, 46(7): 2055–2065
CrossRef
Google scholar
|
[35] |
Dembczynski K, Jachnik A, Kotłowski W, Waegeman W, Hüllermeier E. Optimizing the F-measure in multi-label classification: plug-in rule approach versus structured loss minimization. In: Proceedings of the 30th International Conference on Machine Learning. 2013, 1130–1138
|
[36] |
Charte F, Rivera A J, del Jesus M J, Herrera F. Addressing imbalance in multilabel classification: measures and random resampling algorithms. Neurocomputing, 2015, 163: 3–16
CrossRef
Google scholar
|
[37] |
Charte F, Rivera A J, del Jesus M J, Herrera F. Mlsmote: approaching imbalanced multilabel learning through synthetic instance generation. Knowledge-Based Systems, 2015, 89: 385–397
CrossRef
Google scholar
|
[38] |
Zhang M-L, Li Y-K, Liu X-Y. Towards class-imbalance aware multilabel learning. In: Proceedings of the 24th International Joint Conference on Artificial Intelligence. 2015, 4041–4047
|
[39] |
Wu B, Lyu S, Ghanem B. Constrained submodular minimization for missing labels and class imbalance in multi-label learning. In: Proceedings of the 30th AAAI Conference on Artificial Intelligence. 2016, 2229–2236
|
[40] |
Cheng W, Dembczynski K J, Hüllermeier E. Graded multilabel classification: the ordinal case. In: Proceedings of the 27th International Conference on Machine Learning. 2010, 223–230
|
[41] |
Xu M, Li Y-F, Zhou Z-H. Multi-label learning with PRO loss. In: Proceedings of the 27th AAAI Conference on Artificial Intelligence. 2013, 998–1004
|
[42] |
Li Y-K, Zhang M-L, Geng X. Leveraging implicit relative labelingimportance information for effective multi-label learning. In: Proceedings of the 15th IEEE International Conference on Data Mining. 2015, 251–260
|
[43] |
Geng X, Yin C, Zhou Z-H. Facial age estimation by learning from label distributions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(10): 2401–2412
CrossRef
Google scholar
|
[44] |
Geng X. Label distribution learning. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(7): 1734–1748
CrossRef
Google scholar
|
[45] |
Gao N, Huang S-J, Chen S. Multi-label active learning by model guided distribution matching. Frontiers of Computer Science, 2016, 10(5): 845–855
CrossRef
Google scholar
|
[46] |
Dembczyński K, Waegeman W, Cheng W, Hüllermeier E. On label dependence and loss minimization in multi-label classification. Machine Learning, 2012, 88(1–2): 5–45
CrossRef
Google scholar
|
[47] |
Gao W, Zhou Z-H. On the consistency of multi-label learning. In: Proceedings of the 24th Annual Conference on Learning Theory. 2011, 341–358
|
[48] |
Sun Y-Y, Zhang Y, Zhou Z-H. Multi-label learning with weak label. In: Proceedings of the 24th AAAI Conference on Artificial Intelligence. 2010, 593–598
|
[49] |
Xu M, Jin R, Zhou Z-H. Speedup matrix completion with side information: application to multi-label learning. In: Proceedings of the Neural Information Processing Systems Conference. 2013, 2301–2309
|
[50] |
Cabral R, De la Torre F, Costeira J P, Bernardino A. Matrix completion for weakly-supervised multi-label image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(1): 121–135
CrossRef
Google scholar
|
[51] |
Senge R, del Coz J J, Hüllermeier E. On the problem of error propagation in classifier chains for multi-label classification. In: Spiliopoulou M, Schmidt-Thieme L, Janning R, eds. Data Analysis, Machine Learning and Knowledge Discovery. Berlin: Springer, 2014. 163–170
CrossRef
Google scholar
|
[52] |
Zhou Z-H. Ensemble Methods: Foundations and Algorithms. Boca Raton, FL: Chap-man & Hall/CRC, 2012
|
[53] |
Koller D, Friedman N. Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA: MIT Press, 2009
|
[54] |
Koivisto M. Advances in exact Bayesian structure discovery in Bayesian networks. In: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence. 2006, 241–248
|
[55] |
Smith V, Yu J, Smulders T, Hartemink A, Jarvis E. Computational inference of neural information flow networks. PLoS Computational Biology, 2006, 2: 1436–1449
CrossRef
Google scholar
|
[56] |
Murphy K. Software for graphical models: a review. ISBA Bulletin, 2007, 14(4): 13–15
|
[57] |
Tsoumakas G, Spyromitros-Xioufis E, Vilcek J, Vlahavas I. MULAN: a Java library for multi-label learning. Journal of Machine Learning Research, 2011, 12: 2411–2414
|
[58] |
He H, Garcia E A. Learning from imbalanced data. IEEE Transactions on Knowledge and Data Engineering, 2009, 21(9): 1263–1284
CrossRef
Google scholar
|
[59] |
Wang S, Yao X. Multiclass imbalance problems: analysis and potential solutions. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, 2012, 42(4): 1119–1130
CrossRef
Google scholar
|
[60] |
Liu X-Y, Li Q-Q, Zhou Z-H. Learning imbalanced multi-class data with optimal dichotomy weights. In: Proceedings of the 13th IEEE International Conference on Data Mining. 2013, 478–487
CrossRef
Google scholar
|
[61] |
Abdi L, Hashemi S. To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(1): 238–251
CrossRef
Google scholar
|
[62] |
Zhou D, Bousquet O, Lal T N, Weston J, Schölkopf B. Learning with local and global consistency. In: Proceedings of the Neural Information Processing Systems Conference. 2004, 284–291
|
[63] |
Zhu X, Goldberg A B. Introduction to semi-supervised learning. In: Brachman R, Stone P, eds. Synthesis Lectures to Artificial Intelligence and Machine Learning. San Francisco, CA: Morgan & Claypool Publishers, 2009, 1–130
CrossRef
Google scholar
|
[64] |
Della Pietra S, Della Pietra V, Lafferty J. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1997, 19(4): 380–393
CrossRef
Google scholar
|
[65] |
Zhang M-L, Wu L. LIFT: multi-label learning with label-specific features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(1): 107–120
CrossRef
Google scholar
|
[66] |
Xu X, Yang X, Yu H, Yu D-J, Yang J, Tsang E C C. Multi-label learning with label-specific feature reduction. Knowledge-Based Systems, 2016, 104: 52–61
CrossRef
Google scholar
|
[67] |
Huang J, Li G, Huang Q, Wu X. Learning label-specific features and class-dependent labels for multi-label classification. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(12): 3309–3323
CrossRef
Google scholar
|
[68] |
Weston J, Bengio S, Usunier N. WSABIE: scaling up to large vocabulary image annotation. In: Proceedings of the 22nd International Joint Conference on Artificial Intelligence. 2011, 2764–2770
|
[69] |
Agrawal R, Gupta A, Prabhu Y, Varma M. Multi-label learning with millions of labels: recommending advertiser bid phrases for Web pages. In: Proceedings of the 22nd International Conference on World Wide Web. 2013, 13–24
CrossRef
Google scholar
|
[70] |
Xu C, Tao D, Xu C. Robust extreme multi-label learning. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 1275–1284
CrossRef
Google scholar
|
[71] |
Jain H, Prabhu Y, Varma M. Extreme multi-label loss functions for recommendation, tagging, ranking & other missing label applications. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 935–944
CrossRef
Google scholar
|
[72] |
Zhou W J, Yu Y, Zhang M-L. Binary linear compression for multi-label classification. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017
CrossRef
Google scholar
|
/
〈 | 〉 |