Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference

Farid FEYZI , Saeed PARSA

Front. Comput. Sci. ›› 2019, Vol. 13 ›› Issue (4) : 735 -759.

PDF (1378KB)
Front. Comput. Sci. ›› 2019, Vol. 13 ›› Issue (4) : 735 -759. DOI: 10.1007/s11704-017-6512-z
RESEARCH ARTICLE

Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference

Author information +
History +
PDF (1378KB)

Abstract

In this paper, a novel approach, Inforence, is proposed to isolate the suspicious codes that likely contain faults. Inforence employs a feature selection method, based on mutual information, to identify those bug-related statements that may cause the program to fail. Because the majority of a program faults may be revealed as undesired joint effect of the program statements on each other and on program termination state, unlike the state-of-the-art methods, Inforence tries to identify and select groups of interdependent statements which altogether may affect the program failure. The interdependence amongst the statements is measured according to their mutual effect on each other and on the program termination state. To provide the context of failure, the selected bug-related statements are chained to each other, considering the program static structure. Eventually, the resultant causeeffect chains are ranked according to their combined causal effect on program failure. To validate Inforence, the results of our experimentswith seven sets of programs include Siemens suite, gzip, grep, sed, space, make and bash are presented. The experimental results are then compared with those provided by different fault localization techniques for the both single-fault and multi-fault programs. The experimental results prove the outperformance of the proposed method compared to the state-of-the-art techniques.

Keywords

fault localization / debugging / backward dynamic slice / mutual information / feature selection

Cite this article

Download citation ▾
Farid FEYZI, Saeed PARSA. Inforence: effective fault localization based on information-theoretic analysis and statistical causal inference. Front. Comput. Sci., 2019, 13(4): 735-759 DOI:10.1007/s11704-017-6512-z

登录浏览全文

4963

注册一个新账户 忘记密码

References

[1]

Parsa S, Vahidi-Asl M, Asadi-Aghbolaghi M. Hierarchy-Debug: a scalable statistical technique for fault localization. Software Quality Journal, 2014, 22(3): 427–466

[2]

Wong W E, Debroy V, Gao R, Li Y. The DStar method for effective software fault localization. IEEE Transactions on Reliability, 2014, 63(1): 290–308

[3]

Wong W E, Debroy V, Xu D. Towards better fault localization: a crosstab-based statistical approach. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 2012, 42(3): 378–396

[4]

Wong W E, Debroy V, Choi B. A family of code coverage-based heuristics for effective fault localization. Journal of Systems and Software, 2010, 83(2): 188–208

[5]

Wong W E, Qi Y. BP neural network-based effective fault localization. International Journal of Software Engineering and Knowledge Engineering, 2009, 19(4): 573–597

[6]

Wong W E, Debroy V, Golden R, Xu X, Thuraisingham B. Effective software fault localization using an RBF neural network. IEEE Transactions on Reliability, 2012, 61(1): 149–169

[7]

Abreu R, Zoeteweij P, Van Gemund A J. An evaluation of similarity coefficients for software fault localization. In: Proceedings of the International Symposium on Dependable Computing. 2006, 39–46

[8]

Naish L, Lee H, Ramamohanarao K. A model for spectra-based software diagnosis. ACM Transactions on Software Engineering and Methodology (TOSEM), 2011, 20(3): 11

[9]

Roychowdhury S, Khurshid S. Software fault localization using feature selection. In: Proceedings of the International Workshop on Machine Learning Technologies in Software Engineering. 2011, 11–18

[10]

Roychowdhury S, Khurshid S. A family of generalized entropies and its application to software fault localization. In: Proceedings of the 6th IEEE International Conference on Intelligent Systems. 2012, 368–373

[11]

Jiang L, Su Z. Context-aware statistical debugging: from bug predictors to faulty control flow paths. In: Proceedings of the 22nd IEEE/ACM International Conference on Automated Software Engineering. 2007, 184–193

[12]

Liu H, Sun J, Liu L, Zhang H. Feature selection with dynamic mutual information. Pattern Recognition, 2009, 42(7): 1330–1339

[13]

Sun X, Liu Y, Xu M, Chen H, Han J, Wang K. Feature selection using dynamic weights for classification. Knowledge-Based Systems, 2013, 37: 541–549

[14]

Estévez P A, Tesmer M, Perez C A, Zurada J M. Normalized mutual information feature selection. IEEE Transactions on Neural Networks, 2009, 20(2): 189–201

[15]

Baah G K, Podgurski A, Harrold M J. Causal inference for statistical fault localization. In: Proceedings of the 19th International Symposium on Software Testing and Analysis. 2010, 73–84

[16]

Baah G K, Podgurski A, Harrold M J. Mitigating the confounding effects of program dependences for effective fault localization. In: Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering. 2011, 146–156

[17]

Baah G K, Podgurski A, Harrold M J. Matching Test Cases for Effective Fault Localization. Georgia Institute of Technology, 2011

[18]

Shu G. Statistical estimation of software reliability and failure-causing effect. Doctoral Dissertation, Case Western Reserve University, 2014

[19]

Weiser M. Program slicing. In: Proceedings of the 5th International Conference on Software Engineering. 1981, 439–449

[20]

Denmat T, Ducassé M, Ridoux O. Data mining and cross-checking of execution traces: a re-interpretation of Jones, Harrold and Stasko test information. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. 2005, 396–399

[21]

Sun S F, Podgurski A. Properties of effective metrics for coveragebased statistical fault localization. In: Proceedings of the IEEE International Conference on Software Testing, Verification and Validation (ICST). 2016, 124–134

[22]

Bai Z, Shu G, Podgurski A. NUMFL: localizing faults in numerical software using a value-based causal model. In: Proceedings of the 8th IEEE International Conference on Software Testing, Verification and Validation (ICST). 2015, 1–10

[23]

Bai Z, Shu G, Podgurski A. Causal inference based fault localization for numerical software with NUMFL. Software Testing, Verification and Reliability, 2017, 27(6): e1613

[24]

Gore R, Reynolds P F. Reducing confounding bias in predicate-level statistical debugging metrics. In: Proceedings of the 34th International Conference on Software Engineering. 2012, 463–473

[25]

Wang X, Cheung S C, Chan W K, Zhang Z. Taming coincidental correctness: coverage refinement with context patterns to improve fault localization. In: Proceedings of the 31st International Conference on Software Engineering. 2009, 45–55

[26]

Masri W, Assi R A. Prevalence of coincidental correctness and mitigation of its impact on fault localization. ACM Transactions on Software Engineering and Methodology (TOSEM), 2014, 23(1): 8

[27]

Miao Y, Chen Z, Li S, Zhao Z, Zhou Y. Identifying coincidental correctness for fault localization by clustering test cases. SEKE. 2012, 267–272

[28]

Zhang X, Gupta N, Gupta R. Locating faulty code by multiple points slicing. Software: Practice and Experience, 2007, 37(9): 935–961

[29]

Zhang X, Gupta N, Gupta R. Pruning dynamic slices with confidence. ACM SIGPLAN Notices, 2006, 41(6): 169–180

[30]

Cover T M, Thomas J A. Elements of Information Theory. Hoboken: John Wiley & Sons, 2012

[31]

Burbea J, Rao C R. On the convexity of some divergence measures based on entropy functions. IEEE Transactions on Information Theory, 1982, 28: 489–495

[32]

Le T D B, Lo D, Li M. Constrained feature selection for localizing faults. In: Proceedings of the International Conference on Software Maintenance and Evolution. 2015, 501–505

[33]

Xu J, Zhang Z, Chan W K, Tse T H, Li S. A general noise-reduction framework for fault localization of Java programs. Information and Software Technology, 2013, 55(5): 880–896

[34]

Zeller A, Hildebrandt R. Simplifying and isolating failure-inducing input. IEEE Transactions on Software Engineering, 2002, 28(2): 183–200

[35]

Zeller A. Isolating cause-effect chains from computer programs. In: Proceedings of the 10th ACM SIGSOFT Symposium on Foundations of Software Engineering. 2002, 1–10

[36]

Pearl J. Causality: models, reasoning and inference. Econometric Theory, 2003, 19: 675–685

[37]

Ferrante J, Ottenstein K J,Warren J D. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems (TOPLAS), 1987, 9(3): 319–349

[38]

Podgurski A, Clarke L A. A formal model of program dependences and its implications for software testing, debugging, and maintenance. IEEE Transactions on Software Engineering, 1990, 16(9): 965–979

[39]

Shu G, Sun B, Podgurski A, Cao F. MFL: method-level fault localization with causal inference. In: Proceedings of the 6th IEEE International Conference on Software Testing, Verification and Validation. 2013, 124–133

[40]

Austin P C. Statistical criteria for selecting the optimal number of untreated subjects matched to each treated subject when using many-toone matching on the propensity score. American Journal of Epidemiology, 2010, 172(9): 1092–1097

[41]

Hosmer D W, Lemeshow S. Applied Logistic Regression. Hoboken: John Wiley & Sons, 2013

[42]

Hansen B B, Klopfer S O. Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 2006, 15(3): 609–627

[43]

Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 2010, 33(1): 1–22

[44]

Ho D, Imai K, King G, Stuart E A. MatchIt: nonparametric preprocessing for parametric causal inference. Journal of Statistical Software, 2011, 42(8): 1–28

[45]

Hutchins M, Foster H, Goradia T, Ostrand T. Experiments on the effectiveness of dataflow-and control-flow-based test adequacy criteria. In: Proceedings of the 16th International Conference on Software Engineering. 1994, 191–200

[46]

Xie X, Kuo F C, Chen T Y, Yoo S, Harman M. Provably optimal and human-competitive results in SBSE for spectrum based fault localisation. In: Proceedings of International Symposium on Search Based Software Engineering, Springer Berlin Heidelberg. 2013, 224–238

[47]

Yu Y, Jones J A, Harrold M J. An empirical study of the effects of test-suite reduction on fault localization. In: Proceedings of the 30th International Conference on Software Engineering. 2008, 201–210

[48]

Witten I H, Frank E. Data Mining: Practical Machine Learning Tools and Techniques. San Francisco: Morgan Kaufmann, 2005

[49]

Ascari L C, Araki L Y, Pozo A R, Vergilio S R. Exploring machine learning techniques for fault localization. In: Proceedings of the 10th Latin American Test Workshop. 2009, 1–6

[50]

Lo D, Jiang L, Budi A. Comprehensive evaluation of association measures for fault localization. In: Proceedings of IEEE International Conference on Software Maintenance. 2010, 1–10

[51]

Jones J A, Harrold M J. Empirical evaluation of the tarantula automatic fault-localization technique. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. 2005, 273–282

[52]

Roychowdhury S. A mixed approach to spectrum-based fault localization using information theoretic foundations. Doctoral Dissertation, the University of Texas at Austin, 2013

[53]

Jiang B, Zhang Z, Tse T H, Chen T Y. How well do test case prioritization techniques support statistical fault localization. In: Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference. 2009, 99–106

[54]

Jiang B, Zhang Z, Chan W K, Tse T H. Adaptive random test case prioritization. In: Proceedings of the 24th IEEE/ACM International Conference on Automated Software Engineering. 2009, 233–244

[55]

Yoo S, Harman M, Clark D. Fault localization prioritization: comparing information-theoretic and coverage-based approaches. ACM Transactions on Software Engineering and Methodology (TOSEM), 2013, 22(3): 19

[56]

Roychowdhury S, Khurshid S. A novel framework for locating software faults using latent divergences. In: Proceedings of Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2011, 49–64

[57]

Abreu R, Zoeteweij P, Golsteijn R, Van Gemund A J. A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 2009, 82(11): 1780–1792

[58]

Yoo S. Evolving human competitive spectra-based fault localisation techniques. In: Proceedings of International Symposium on Search Based Software Engineering. 2012, 244–258

[59]

Wong W E, Gao R, Li Y, Abreu R, Wotawa F. A survey on software fault localization. IEEE Transactions on Software Engineering, 2016, 42(8): 707–740

[60]

Le T D B, Thung F, Lo D. Theory and practice, do they matchfi a case with spectrum-based fault localization. In: Proceedings of the 29th IEEE International Conference on Software Maintenance (ICSM). 2013, 380–383

[61]

Yoo S, Xie X, Kuo F C, Chen T Y, Harman M. No pot of gold at the end of program spectrum rainbow: greatest risk evaluation formula does not exist. Research Note (RN), 2014, 14(14): 14

[62]

Ju X, Jiang S, Chen X, Wang X, Zhang Y, Cao H. HSFal: effective fault localization using hybrid spectrum of full slices and execution slices. Journal of Systems and Software, 2014, 90: 3–17

[63]

Gupta N, He H, Zhang X, Gupta R. Locating faulty code using failureinducing chops. In: Proceedings of the 20th IEEE/ACM International Conference on Automated Software Engineering. 2005, 263–272

[64]

Zhang X, Gupta N, Gupta R. Locating faults through automated predicate switching. In: Proceedings of the 28th International Conference on Software Engineering. 2006, 272–281

[65]

Jeffrey D, Gupta N, Gupta R. Fault localization using value replacement. In: Proceedings of the International Symposium on Software Testing and Analysis. 2008, 167–178

[66]

Zhang X, Tallam S, Gupta N, Gupta R. Towards locating execution omission errors. ACM Sigplan Notices, 2007, 42(6): 415–424

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature

AI Summary AI Mindmap
PDF (1378KB)

Supplementary files

Supplementary Material

1056

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/