Automatic test report augmentation to assist crowdsourced testing

Xin CHEN, He JIANG, Zhenyu CHEN, Tieke HE, Liming NIE

Front. Comput. Sci., 2019, Vol. 13, Issue 5: 943-959.

DOI: 10.1007/s11704-018-7308-5
RESEARCH ARTICLE

Abstract

In crowdsourced mobile application testing, workers are often inexperienced in, and unfamiliar with, software testing. Moreover, they write test reports in descriptive natural language on mobile devices. As a result, these test reports generally lack important details and make it difficult for developers to understand the bugs. To improve the quality of inspected test reports, we introduce a new problem, test report augmentation, which leverages the additional useful information contained in duplicate test reports. In this paper, we propose a new framework, the test report augmentation framework (TRAF), to address this problem. First, natural language processing (NLP) techniques are adopted to preprocess the crowdsourced test reports. Then, three strategies are proposed to augment the environments, inputs, and descriptions of the inspected test reports, respectively. Finally, we visualize the augmented test reports to help developers distinguish the added information. To evaluate TRAF, we conduct experiments on five industrial datasets containing 757 crowdsourced test reports. Experimental results show that TRAF can recommend relevant inputs to augment the inspected test reports, achieving on average 98.49% NDCG and 88.65% precision, and can identify valuable sentences from the descriptions of duplicates to augment the inspected test reports, achieving on average 83.58% precision, 77.76% recall, and 78.72% F-measure. In addition, an empirical evaluation demonstrates that augmented test reports can help developers better understand and fix bugs.
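The abstract reports input recommendation quality as NDCG and precision, and sentence selection quality as precision, recall, and F-measure. As a reference point for reading those numbers, the Python below computes the metrics in their common textbook forms. It is a sketch under stated assumptions: relevance is graded, the NDCG discount is rel_i / log2(i + 1), and precision and recall are computed over sets of selected and relevant items. The paper may use a different NDCG variant or cutoff k, and the function names are hypothetical.

```python
import math
from collections.abc import Hashable, Sequence


def dcg(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k items: rel_i / log2(i + 1) for rank i = 1..k."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg(relevances: Sequence[float], k: int) -> float:
    """DCG of the given ranking divided by the DCG of the ideal (descending) ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0


def precision_recall_f(selected: set[Hashable],
                       relevant: set[Hashable]) -> tuple[float, float, float]:
    """Set-based precision, recall, and their harmonic mean (F-measure)."""
    hits = len(selected & relevant)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Hypothetical relevance grades of five recommended inputs, in recommended order.
print(round(ndcg([3, 2, 3, 0, 1], k=5), 3))
# Hypothetical case: four selected sentences, two of which are among three relevant ones.
print(precision_recall_f({"s1", "s2", "s3", "s4"}, {"s1", "s2", "s5"}))
```

The first call prints 0.972: the ranking is close to ideal, so it loses little gain. The second case gives precision 0.5, recall 2/3, and F-measure 4/7 (about 0.571).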

Keywords

crowdsourced testing / test report / TF-IDF / natural language processing / test report augmentation
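The keywords list TF-IDF, and the abstract says TRAF identifies valuable sentences in the descriptions of duplicate reports. The abstract does not say how those sentences are chosen, so the Python below is only a hypothetical sketch, not TRAF's strategy. It weights terms with TF-IDF, using the smoothed idf ln((1 + n) / (1 + df)) + 1 (the form scikit-learn's `TfidfVectorizer` uses by default). It then drops duplicate-report sentences that nearly repeat a sentence already in the inspected report and ranks the rest by cosine similarity to the inspected report as a whole. The tokenizer, the `redundancy` threshold of 0.8, and all function names are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed preprocessing: lowercase word tokens only (no stemming, no stop-word removal).
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """One sparse TF-IDF vector per document, with smoothed idf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
            for doc in docs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def candidate_sentences(inspected: list[str], duplicate_sentences: list[str],
                        redundancy: float = 0.8) -> list[tuple[str, float]]:
    """Rank duplicate-report sentences by similarity to the inspected report, skipping repeats."""
    vecs = tfidf_vectors([tokenize(s) for s in inspected + duplicate_sentences])
    own, dup = vecs[:len(inspected)], vecs[len(inspected):]
    # Whole-report vector: sum of the inspected report's sentence vectors.
    report: dict[str, float] = {}
    for v in own:
        for t, w in v.items():
            report[t] = report.get(t, 0.0) + w
    ranked = []
    for sentence, v in zip(duplicate_sentences, dup):
        if any(cosine(v, o) >= redundancy for o in own):
            continue  # treated as information the inspected report already contains
        ranked.append((sentence, cosine(v, report)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


inspected = ["the app crashes after tapping login", "an error dialog is shown"]
duplicates = ["the app crashes after tapping login",
              "login crashes when the password field is empty",
              "sunny weather today"]
for sentence, score in candidate_sentences(inspected, duplicates):
    print(f"{score:.3f}  {sentence}")
```

In this toy run, the verbatim repeat is filtered out. The sentence about an empty password field ranks first because it shares bug-related terms with the inspected report, and the unrelated sentence scores 0.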

Cite this article

Xin CHEN, He JIANG, Zhenyu CHEN, Tieke HE, Liming NIE. Automatic test report augmentation to assist crowdsourced testing. Front. Comput. Sci., 2019, 13(5): 943-959. DOI: 10.1007/s11704-018-7308-5

RIGHTS & PERMISSIONS

Higher Education Press and Springer-Verlag GmbH Germany, part of Springer Nature