Interpretation with baseline Shapley value for feature groups on tree models
Fan XU, Zhi-Jian ZHOU, Jie NI, Wei GAO
Tree models have made impressive progress in recent years, and an important problem is to understand how these models make predictions, particularly in critical applications such as finance and medicine. On this issue, most previous works have measured the importance of individual features. In this work, we consider the interpretation of feature groups, which more effectively captures the intrinsic structures and correlations among multiple features. We propose the Baseline Group Shapley value (BGShapvalue for short) to calculate the importance of a feature group for tree models. We further develop a polynomial-time algorithm, BGShapTree, to handle the exponentially many terms in the BGShapvalue. The basic idea is to decompose the BGShapvalue into leaves’ weights and exploit the relationships between features and leaves. Based on this idea, we can greedily search for salient feature groups with large BGShapvalues. Extensive experiments validate the effectiveness of our approach in comparison with state-of-the-art methods for interpreting tree models.
interpretability / Shapley value / random forests / decision tree
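For intuition about the quantity in the abstract, the following minimal Python sketch (not the authors’ BGShapTree code) computes baseline Shapley values for feature groups by brute force, directly from the classical Shapley formula: each group’s score averages its marginal contribution over all coalitions of the other groups, where features outside a coalition are reset to a baseline vector. The toy model `toy_tree`, the group partition, and the all-zeros baseline are illustrative assumptions.

```python
from itertools import combinations
from math import factorial

import numpy as np


def baseline_group_shapley(predict, x, baseline, groups):
    """Brute-force baseline Shapley values over feature groups.

    predict : maps a 1-D feature vector to a scalar prediction
    x       : the instance to explain
    baseline: reference vector; features outside a coalition are reset to it
    groups  : partition of feature indices, e.g. [[0, 1], [2]]
    """
    n = len(groups)

    def value(coalition):
        # v(T): keep x's features for groups in T, reset the rest to baseline.
        z = baseline.astype(float)
        for g in coalition:
            z[groups[g]] = x[groups[g]]
        return predict(z)

    phi = np.zeros(n)
    for i in range(n):
        others = [g for g in range(n) if g != i]
        for size in range(n):
            for T in combinations(others, size):
                # Shapley weight |T|! (n - |T| - 1)! / n!
                w = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += w * (value(T + (i,)) - value(T))
    return phi


# Hypothetical toy "tree": features 0 and 1 act jointly; feature 2 acts alone.
def toy_tree(z):
    return (2.0 if z[0] > 0 and z[1] > 0 else 0.0) + (1.0 if z[2] > 0 else 0.0)


x = np.array([1.0, 1.0, 1.0])
baseline = np.zeros(3)
print(baseline_group_shapley(toy_tree, x, baseline, groups=[[0, 1], [2]]))
# -> [2. 1.]; scores sum to toy_tree(x) - toy_tree(baseline) = 3 (efficiency)
```

On this toy model the group {0, 1} receives credit 2 for the joint effect of features 0 and 1 that per-feature scores would split, and the scores sum to the prediction gap between the instance and the baseline. The brute-force loop is exponential in the number of groups, which is exactly the cost a polynomial algorithm such as BGShapTree must avoid on tree models.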
Fan Xu received his BSc degree from Southeast University, China in 2020. He is currently working toward the PhD degree at Nanjing University, China. His research interest is mainly in machine learning.
Zhi-Jian Zhou received his BSc degree from Dalian University of Technology, China in 2021. He is now a graduate student at Nanjing University, China. His research interest is mainly in hypothesis testing.
Jie Ni received his BSc degree from Nanjing University, China in 2021, where he is currently a graduate student. His research interests include machine learning and data mining.
Wei Gao received his PhD degree from Nanjing University, China in 2014, and he is currently an associate professor at the School of Artificial Intelligence, Nanjing University, China. His research interests include learning theory. His work has been published in top-tier international journals and conference proceedings such as AIJ, IEEE TPAMI, COLT, ICML and NeurIPS. He is also a co-author of the book Introduction to the Theory of Machine Learning.
[1] Breiman L. Random forests. Machine Learning, 2001, 45(1): 5–32
[2] Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 785−794
[3] Zhou Z H, Feng J. Deep forest: towards an alternative to deep neural networks. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017, 3553−3559
[4] Ribeiro M T, Singh S, Guestrin C. “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016, 1135−1144
[5] Grath R M, Costabello L, Le Van C, Sweeney P, Kamiab F, Shen Z, Lecue F. Interpretable credit application predictions with counterfactual explanations. 2018, arXiv preprint arXiv: 1811.05245
[6] Lundberg S M, Nair B, Vavilala M S, Horibe M, Eisses M J, Adams T, Liston D E, Low D K W, Newman S F, Kim J, Lee S I. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nature Biomedical Engineering, 2018, 2(10): 749–760
[7] Tjoa E, Guan C. A survey on explainable artificial intelligence (XAI): toward medical XAI. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(11): 4793–4813
[8] Zablocki É, Ben-Younes H, Pérez P, Cord M. Explainability of deep vision-based autonomous driving systems: review and challenges. International Journal of Computer Vision, 2022, 130(10): 2425–2452
[9] Breiman L, Friedman J, Olshen R A, Stone C J. Classification and Regression Trees. New York: CRC Press, 1984
[10] Strobl C, Boulesteix A L, Zeileis A, Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics, 2007, 8: 25
[11] Louppe G, Wehenkel L, Sutera A, Geurts P. Understanding variable importances in forests of randomized trees. In: Proceedings of the 26th International Conference on Neural Information Processing Systems. 2013, 431−439
[12] Saabas A. Interpreting random forests. See interpreting-random-forests/ website, 2014
[13] Kazemitabar S J, Amini A A, Bloniarz A, Talwalkar A. Variable importance using decision trees. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 425−434
[14] Li X, Wang Y, Basu S, Kumbier K, Yu B. A debiased MDI feature importance measure for random forests. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. 2019, 723
[15] Shapley L S. A value for n-person games. In: Kuhn H W, Tucker A W, eds. Contributions to the Theory of Games. Princeton: Princeton University Press, 1953, 307−317
[16] Lundberg S M, Erion G, Chen H, DeGrave A, Prutkin J M, Nair B, Katz R, Himmelfarb J, Bansal N, Lee S I. From local explanations to global understanding with explainable AI for trees. Nature Machine Intelligence, 2020, 2(1): 56–67
[17] Athanasiou M, Sfrintzeri K, Zarkogianni K, Thanopoulou A C, Nikita K S. An explainable XGBoost-based approach towards assessing the risk of cardiovascular disease in patients with type 2 diabetes mellitus. In: Proceedings of the 20th IEEE International Conference on Bioinformatics and Bioengineering. 2020, 859−864
[18] Feng D C, Wang W J, Mangalathu S, Taciroglu E. Interpretable XGBoost-SHAP machine-learning model for shear strength prediction of squat RC walls. Journal of Structural Engineering, 2021, 147(11): 04021173
[19] Sutera A, Louppe G, Huynh-Thu V A, Wehenkel L, Geurts P. From global to local MDI variable importances for random forests and when they are Shapley values. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021, 3533−3543
[20] Amoukou S I, Salaün T, Brunel N J B. Accurate Shapley values for explaining tree-based models. In: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. 2022, 2448−2465
[21] Sundararajan M, Najmi A. The many Shapley values for model explanation. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 859
[22] Lundberg S M, Lee S I. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017, 4768−4777
[23] Marichal J L. The influence of variables on pseudo-Boolean functions with applications to game theory and multicriteria decision making. Discrete Applied Mathematics, 2000, 107(1-3): 139–164
[24] Flores R, Molina E, Tejada J. Evaluating groups with the generalized Shapley value. 4OR, 2019, 17(2): 141–172
[25] Marichal J L, Kojadinovic I, Fujimoto K. Axiomatic characterizations of generalized values. Discrete Applied Mathematics, 2007, 155(1): 26–43
[26] Sundararajan M, Taly A, Yan Q. Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning. 2017, 3319−3328
[27] Štrumbelj E, Kononenko I. Explaining prediction models and individual predictions with feature contributions. Knowledge and Information Systems, 2014, 41(3): 647–665
[28] Datta A, Sen S, Zick Y. Algorithmic transparency via quantitative input influence: theory and experiments with learning systems. In: Proceedings of 2016 IEEE Symposium on Security and Privacy. 2016, 598−617
[29] Díaz-Uriarte R, de Andrés S A. Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 2006, 7: 3
[30] Ishwaran H. Variable importance in binary regression trees and forests. Electronic Journal of Statistics, 2007, 1: 519–537
[31] Archer K J, Kimes R V. Empirical characterization of random forest variable importance measures. Computational Statistics & Data Analysis, 2008, 52(4): 2249–2260
[32] Strobl C, Boulesteix A L, Kneib T, Augustin T, Zeileis A. Conditional variable importance for random forests. BMC Bioinformatics, 2008, 9: 307
[33] Auret L, Aldrich C. Empirical comparison of tree ensemble variable importance measures. Chemometrics and Intelligent Laboratory Systems, 2011, 105(2): 157–170
[34] Louppe G. Understanding random forests: from theory to practice. 2014, arXiv preprint arXiv: 1407.7502
[35] Nembrini S, König I R, Wright M N. The revival of the Gini importance? Bioinformatics, 2018, 34(21): 3711–3718
[36] Scornet E. Trees, forests, and impurity-based variable importance. 2020, arXiv preprint arXiv: 2001.04295
[37] Sagi O, Rokach L. Explainable decision forest: transforming a decision forest into an interpretable tree. Information Fusion, 2020, 61: 124–138
[38] Tan S, Soloviev M, Hooker G, Wells M T. Tree space prototypes: another look at making tree ensembles interpretable. In: Proceedings of 2020 ACM-IMS on Foundations of Data Science Conference. 2020, 23−34
[39] Lucic A, Oosterhuis H, Haned H, de Rijke M. FOCUS: flexible optimizable counterfactual explanations for tree ensembles. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence. 2022, 5313−5322
[40] Parmentier A, Vidal T. Optimal counterfactual explanations in tree ensembles. In: Proceedings of the 38th International Conference on Machine Learning. 2021, 8422−8431
[41] Dutta S, Long J, Mishra S, Tilli C, Magazzeni D. Robust counterfactual explanations for tree-based ensembles. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 5742−5756
[42] Ignatiev A. Towards trustable explainable AI. In: Proceedings of the 29th International Joint Conference on Artificial Intelligence. 2020, 5154−5158
[43] Izza Y, Ignatiev A, Marques-Silva J. On explaining decision trees. 2020, arXiv preprint arXiv: 2010.11034
[44] Izza Y, Marques-Silva J. On explaining random forests with SAT. In: Proceedings of the 30th International Joint Conference on Artificial Intelligence. 2021, 2584−2591
[45] Ignatiev A, Izza Y, Stuckey P J, Marques-Silva J. Using MaxSAT for efficient explanations of tree ensembles. In: Proceedings of the 36th AAAI Conference on Artificial Intelligence. 2022, 3776−3785
[46] Agarwal A, Tan Y S, Ronen O, Singh C, Yu B. Hierarchical shrinkage: improving the accuracy and interpretability of tree-based models. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 111−135
[47] Yang J. Fast TreeSHAP: accelerating SHAP value computation for trees. 2021, arXiv preprint arXiv: 2109.09847
[48] Grömping U. Estimators of relative importance in linear regression based on variance decomposition. The American Statistician, 2007, 61(2): 139–147
[49] Sun Y, Sundararajan M. Axiomatic attribution for multilinear functions. In: Proceedings of the 12th ACM Conference on Electronic Commerce. 2011, 177−178
[50] Aas K, Jullum M, Løland A. Explaining individual predictions when features are dependent: more accurate approximations to Shapley values. Artificial Intelligence, 2021, 298: 103502
[51] Chau S L, Hu R, Gonzalez J, Sejdinovic D. RKHS-SHAP: Shapley values for kernel methods. In: Proceedings of the 36th International Conference on Neural Information Processing Systems. 2022, 13050−13063
[52] Ancona M, Oztireli C, Gross M. Explaining deep neural networks with a polynomial time algorithm for Shapley value approximation. In: Proceedings of the 36th International Conference on Machine Learning. 2019, 272−281
[53] Ghorbani A, Zou J. Neuron Shapley: discovering the responsible neurons. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 5922−5932
[54] Bento J, Saleiro P, Cruz A F, Figueiredo M A T, Bizarro P. TimeSHAP: explaining recurrent models through sequence perturbations. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 2021, 2565−2573
[55] Wang G, Chuang Y N, Du M, Yang F, Zhou Q, Tripathi P, Cai X, Hu X. Accelerating Shapley explanation via contributive cooperator selection. In: Proceedings of the 39th International Conference on Machine Learning. 2022, 22576−22590
[56] Chen L, Lou S, Zhang K, Huang J, Zhang Q. HarsanyiNet: computing accurate Shapley values in a single forward propagation. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 4804−4825
[57] Štrumbelj E, Kononenko I, Šikonja M R. Explaining instance classifications with interactions of subsets of feature values. Data & Knowledge Engineering, 2009, 68(10): 886–904
[58] Owen A B. Sobol’ indices and Shapley value. SIAM/ASA Journal on Uncertainty Quantification, 2014, 2(1): 245–251
[59] Owen A B, Prieur C. On Shapley value for measuring importance of dependent inputs. SIAM/ASA Journal on Uncertainty Quantification, 2017, 5(1): 986–1002
[60] Frye C, Rowat C, Feige I. Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 1229−1239
[61] Heskes T, Sijben E, Bucur I G, Claassen T. Causal Shapley values: exploiting causal knowledge to explain individual predictions of complex models. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020, 4778−4789
[62] Dhamdhere K, Agarwal A, Sundararajan M. The Shapley Taylor interaction index. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 9259−9268
[63] Covert I, Lee S I. Improving KernelSHAP: practical Shapley value estimation using linear regression. In: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics. 2021, 3457−3465
[64] Janizek J D, Sturmfels P, Lee S I. Explaining explanations: axiomatic feature interactions for deep networks. The Journal of Machine Learning Research, 2021, 22(1): 104
[65] Wang J, Zhang Y, Gu Y, Kim T K. SHAQ: incorporating Shapley value theory into multi-agent Q-learning. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 5941−5954
[66] Beechey D, Smith T M S, Şimşek Ö. Explaining reinforcement learning with Shapley values. In: Proceedings of the 40th International Conference on Machine Learning. 2023, 2003−2014
[67] Ren J, Zhang D, Wang Y, Chen L, Zhou Z, Chen Y, Cheng X, Wang X, Zhou M, Shi J, Zhang Q. Towards a unified game-theoretic view of adversarial perturbations and robustness. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021, 3797−3810
[68] Chau S L, Muandet K, Sejdinovic D. Explaining the uncertain: stochastic Shapley values for Gaussian process models. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 50769−50795
[69] Watson D S, O’Hara J, Tax N, Mudd R, Guy I. Explaining predictive uncertainty with information theoretic Shapley values. In: Proceedings of the 37th Conference on Neural Information Processing Systems. 2023, 7330−7350
[70] Janzing D, Minorics L, Blöbaum P. Feature relevance quantification in explainable AI: a causal problem. In: Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics. 2020, 2907−2916
[71] Kumar I E, Venkatasubramanian S, Scheidegger C, Friedler S A. Problems with Shapley-value-based explanations as feature importance measures. In: Proceedings of the 37th International Conference on Machine Learning. 2020, 5491−5500
[72] Kumar I E, Scheidegger C, Venkatasubramanian S, Friedler S A. Shapley residuals: quantifying the limits of the Shapley value for explanations. In: Proceedings of the 35th Conference on Neural Information Processing Systems. 2021, 26598−26608
[73] Kwon Y, Zou J. WeightedSHAP: analyzing and improving Shapley based feature attributions. In: Proceedings of the 36th Conference on Neural Information Processing Systems. 2022, 34363−34376
[74] Van den Broeck G, Lykov A, Schleich M, Suciu D. On the tractability of SHAP explanations. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence. 2021, 6505−6513
[75] Bordt S, von Luxburg U. From Shapley values to generalized additive models and back. In: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics. 2023, 709−745
[76] Jullum M, Redelmeier A, Aas K. groupShapley: efficient prediction explanation with Shapley values for feature groups. 2021, arXiv preprint arXiv: 2106.12228
[77] Miroshnikov A, Kotsiopoulos K, Filom K, Kannan A R. Stability theory of game-theoretic group feature explanations for machine learning models. 2021, arXiv preprint arXiv: 2102.10878
[78] Au Q, Herbinger J, Stachl C, Bischl B, Casalicchio G. Grouped feature importance and combined features effect plot. Data Mining and Knowledge Discovery, 2022, 36(4): 1401–1450
[79] Vanschoren J, van Rijn J N, Bischl B, Torgo L. OpenML: networked science in machine learning. ACM SIGKDD Explorations Newsletter, 2014, 15(2): 49–60
[80] Kelly M, Longjohn R, Nottingham K. The UCI Machine Learning Repository. See archive.ics.uci.edu website, 2024
[81] Samek W, Binder A, Montavon G, Lapuschkin S, Müller K R. Evaluating the visualization of what a deep neural network has learned. IEEE Transactions on Neural Networks and Learning Systems, 2017, 28(11): 2660–2673
[82] Lundberg S M, Erion G G, Lee S I. Consistent individualized feature attribution for tree ensembles. 2018, arXiv preprint arXiv: 1802.03888