
Frontiers of Computer Science

Front. Comput. Sci. 2018, Vol. 12, Issue 5: 923-938. https://doi.org/10.1007/s11704-017-6573-z
RESEARCH ARTICLE
Correlation-based software search by leveraging software term database
Zhixing LI, Gang YIN, Tao WANG, Yang ZHANG, Yue YU, Huaimin WANG
National Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha 410073, China
Abstract

Internet-scale open source software (OSS) production across various communities generates abundant reusable resources for software developers. However, using keyword queries to find desired, mature software among a large number of candidates is a significant challenge, especially for newcomers, because current search services often fail to understand the semantics of user queries. In this paper, we construct a software term database (STDB) by analyzing tagging data in Stack Overflow and propose a correlation-based software search (CBSS) approach that performs correlation retrieval based on the term relevance obtained from STDB. In addition, we design a novel ranking method to optimize the initial retrieval results. We investigate four research questions through four corresponding experiments, which evaluate the effectiveness of STDB and the performance of CBSS. The experimental results show that CBSS can effectively respond to keyword-based software searches and significantly outperforms existing search services at finding mature software.
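The abstract does not specify how STDB scores term relevance or how CBSS ranks candidates, so the following is only a minimal sketch under assumed choices, not the authors' method. It treats each Stack Overflow post's tag set as one co-occurrence unit, uses cosine-normalized co-occurrence counts as a hypothetical relevance score, expands a keyword query with each term's most relevant tags, and ranks candidate projects by the summed weights of their matching tags. The function names (`build_term_db`, `expand_query`, `correlation_search`), the `expand_k` parameter, and the toy data are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def build_term_db(posts):
    """Build a hypothetical term-relevance table from per-post tag lists.

    Assumed score: co-occurrence count / sqrt(freq(a) * freq(b)),
    i.e. cosine similarity between binary tag-occurrence vectors.
    """
    freq = Counter()
    pair_counts = Counter()
    for tags in posts:
        unique = sorted(set(tags))  # ignore duplicate tags within one post
        freq.update(unique)
        for a, b in combinations(unique, 2):
            pair_counts[(a, b)] += 1
    db = {}
    for (a, b), n in pair_counts.items():
        score = n / math.sqrt(freq[a] * freq[b])
        db.setdefault(a, {})[b] = score  # store the score symmetrically
        db.setdefault(b, {})[a] = score
    return db


def expand_query(terms, db, expand_k=3):
    """Weight query terms 1.0 and add each term's top-k related terms."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        # Most relevant first; ties broken alphabetically for determinism.
        related = sorted(db.get(t, {}).items(), key=lambda kv: (-kv[1], kv[0]))
        for other, score in related[:expand_k]:
            weights[other] = max(weights.get(other, 0.0), score)
    return weights


def correlation_search(query, projects, db, expand_k=3):
    """Rank projects by the summed weights of tags matching the expanded query."""
    weights = expand_query(query.lower().split(), db, expand_k)
    scored = []
    for name, tags in projects.items():
        score = sum(weights.get(t, 0.0) for t in set(tags))
        if score > 0:
            scored.append((score, name))
    scored.sort(key=lambda item: (-item[0], item[1]))  # highest score first
    return [name for _, name in scored]


if __name__ == "__main__":
    # Toy stand-ins for Stack Overflow tag sets and project tags (hypothetical).
    posts = [["python", "django", "orm"], ["python", "flask"], ["django", "orm"],
             ["java", "spring"], ["java", "hibernate", "orm"]]
    projects = {"django": ["django", "python", "web"], "flask": ["flask", "python"],
                "hibernate": ["hibernate", "java", "orm"], "spring": ["spring", "java"]}
    db = build_term_db(posts)
    print(correlation_search("orm", projects, db, expand_k=2))  # ['hibernate', 'django']
```

With `expand_k=2`, the query "orm" is expanded with `django` and `hibernate`, so the `django` project is retrieved even though it is not tagged `orm`. Cosine normalization is used here only to dampen the influence of very frequent tags relative to raw co-occurrence counts; Jaccard or PMI would be equally plausible assumptions.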

Keywords: software retrieval; software term database; open source software
Corresponding Author: Gang YIN
Just Accepted Date: 07 April 2017; Online First Date: 25 May 2018; Issue Date: 21 September 2018
Cite this article:
Zhixing LI, Gang YIN, Tao WANG, et al. Correlation-based software search by leveraging software term database[J]. Front. Comput. Sci., 2018, 12(5): 923-938.
 URL:  
http://journal.hep.com.cn/fcs/EN/10.1007/s11704-017-6573-z
http://journal.hep.com.cn/fcs/EN/Y2018/V12/I5/923