ForkXplorer: an approach of fork summary generation

Zhang ZHANG, Xinjun MAO, Chao ZHANG, Yao LU

Front. Comput. Sci. ›› 2022, Vol. 16 ›› Issue (2) : 162202

DOI: 10.1007/S11704-020-0047-4
Software
RESEARCH ARTICLE


Abstract

Pull-based development has become an important paradigm for distributed software development. In this model, each developer works independently on a copy of the central repository (i.e., a fork). To collaborate efficiently, developers need to stay aware of the state of other forks. In this paper, we propose a method to automatically generate a summary of a fork. We first use the random forest method to generate the label of a fork, i.e., a feature implementation or a bug fix. Based on the information in the fork-related commits, we then use the TextRank algorithm to generate detailed activity information for the fork. Finally, we apply a set of rules to integrate all related information into a complete fork summary. To validate the effectiveness of our method, we conduct 30 groups of manual experiments and 77 groups of case studies on GitHub. We propose $Fea_{avg}$ to evaluate the quality of the generated fork summary, considering content accuracy, content integrity, sentence fluency, and label extraction accuracy. The results show that the average $Fea_{avg}$ of the fork summaries generated by this method is 0.672. More than 63% of project maintainers and contributors believe that the fork summary can improve development efficiency.
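As a rough illustration of the TextRank step mentioned in the abstract, the sketch below ranks a handful of commit messages by graph centrality and keeps the top-ranked ones. It is not the authors' implementation: the tokenizer, the word-overlap similarity, the damping factor of 0.85, the iteration count, and the sample commit messages are all assumptions made for this example.

```python
import math
import re


def tokenize(text):
    """Lowercase a message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    """Shared-word count normalized by the log lengths of both messages."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # log(1) = 0 would make the normalization degenerate
    overlap = len(set(a) & set(b))
    return overlap / (math.log(len(a)) + math.log(len(b)))


def textrank(sentences, damping=0.85, iterations=50):
    """Return one score per sentence; higher means more central."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric weighted graph: edge weight = similarity of two messages.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each node receives a share of its neighbours' scores,
        # proportional to the edge weight between them.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Keep the k highest-scoring sentences, in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)),
                 key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical commit messages from a single fork (illustrative only).
commits = [
    "Fix null pointer crash in config parser",
    "Add unit tests for config parser crash",
    "Update README badge",
    "Fix crash when config file is missing",
]
print(summarize(commits, k=2))
```

In this toy input the README commit shares no words with the other messages, so it receives only the baseline score and is left out of the summary. A real pipeline would likely need stop-word removal and a richer similarity measure.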


Keywords

open source software / pull-based development / fork summary / distributed cooperative development

Cite this article

Zhang ZHANG, Xinjun MAO, Chao ZHANG, Yao LU. ForkXplorer: an approach of fork summary generation. Front. Comput. Sci., 2022, 16(2): 162202 DOI:10.1007/S11704-020-0047-4



RIGHTS & PERMISSIONS

Higher Education Press
