Abstract
Pull-based development has become an important paradigm for distributed software development. In this model, each developer works independently on a copy of the central repository (i.e., a fork). To collaborate efficiently, developers need to stay aware of the state of other forks. In this paper, we propose a method to automatically generate a summary of a fork. We first use a random forest classifier to label a fork as either a feature implementation or a bug fix. Then, based on the fork's commit information, we use the TextRank algorithm to generate detailed activity information for the fork. Finally, we apply a set of rules to integrate all of this information into a complete fork summary. To validate the effectiveness of our method, we conduct 30 groups of manual experiments and 77 groups of case studies on GitHub. We evaluate the generated fork summaries in terms of content accuracy, content integrity, sentence fluency, and label extraction accuracy. The results show that the average score of the fork summaries generated by this method is 0.672. More than 63% of project maintainers and contributors believe that the fork summaries can improve development efficiency.
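The TextRank step can be pictured with a small sketch. It is not the authors' implementation: it assumes each commit message of a fork is treated as one "sentence", uses a deliberately simple lowercase word tokenizer, and applies the sentence-similarity measure and damped ranking iteration (d = 0.85) from Mihalcea and Tarau's original TextRank formulation. The function names `tokenize`, `similarity`, `textrank`, and `summarize` are hypothetical.

```python
import math
import re


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a simple tokenizer chosen for illustration (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: set[str], b: set[str]) -> float:
    """TextRank sentence similarity: shared words normalized by log lengths."""
    overlap = len(a & b)
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    if overlap == 0 or denom <= 0:
        return 0.0  # no shared words, or two one-word messages (log 1 = 0)
    return overlap / denom


def textrank(sentences: list[str], d: float = 0.85,
             iters: int = 100, tol: float = 1e-9) -> list[float]:
    """Score sentences by iterating the weighted, damped TextRank update."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric edge weights; no self-loops.
    w = [[0.0 if i == j else similarity(tokens[i], tokens[j]) for j in range(n)]
         for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            # w[j][i] > 0 implies out_sum[j] > 0, so the division is safe.
            rank = sum(w[j][i] / out_sum[j] * scores[j]
                       for j in range(n) if w[j][i] > 0)
            new.append((1 - d) + d * rank)
        delta = max(abs(x - y) for x, y in zip(new, scores)) if n else 0.0
        scores = new
        if delta < tol:
            break
    return scores


def summarize(messages: list[str], k: int = 2) -> list[str]:
    """Return the k highest-ranked messages, kept in their original order."""
    scores = textrank(messages)
    top = sorted(range(len(messages)), key=lambda i: scores[i], reverse=True)[:k]
    return [messages[i] for i in sorted(top)]


commits = [
    "Fix null pointer crash in parser",
    "Fix parser crash on empty input",
    "Update README badges",
    "Add parser tests for crash cases",
]
print(summarize(commits, 2))
```

In this sketch, a message that shares no words with any other commit (here the README update) receives only the baseline score 1 − d, so the summary favors the cluster of parser-crash commits. How the actual method weights or post-processes the ranked sentences is not specified here.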
Keywords
open source software; pull-based development; fork summary; distributed cooperative development
Zhang ZHANG, Xinjun MAO, Chao ZHANG, Yao LU.
ForkXplorer: an approach of fork summary generation.
Front. Comput. Sci., 2022, 16(2): 162202 DOI:10.1007/S11704-020-0047-4
|
RIGHTS & PERMISSIONS
Higher Education Press