CTCPPre: A prediction method for accepted pull requests in GitHub

Jing Jiang; Jia-teng Zheng; Yun Yang; Li Zhang

doi:10.1007/s11771-020-4308-z

Journal of Central South University, 2020, Vol. 27, Issue 2: 449-468. DOI: 10.1007/s11771-020-4308-z

Article


Abstract

As open source projects grow in popularity, the volume of incoming pull requests becomes too large, which puts a heavy burden on the integrators who are responsible for accepting or rejecting them. An approach that predicts whether a pull request will be accepted can help integrators by allowing them either to reject code changes immediately or to allocate more resources to overcome their deficiencies. In this paper, an approach called CTCPPre is proposed to predict accepted pull requests in GitHub. CTCPPre mainly considers code features of the modified changes, text features of the pull requests’ descriptions, contributor features of developers’ previous behaviors, and project features of the development environment. The effectiveness of CTCPPre is evaluated on 28 projects containing 221096 pull requests. Experimental results show that CTCPPre performs well, achieving an accuracy of 0.82, an AUC of 0.76 and an F1-score of 0.88 on average. It is compared with RFPredict, the state-of-the-art approach for predicting accepted pull requests. On average across the 28 projects, CTCPPre outperforms RFPredict by 6.64%, 16.06% and 4.79% in terms of accuracy, AUC and F1-score, respectively.
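The three metrics reported in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the labels and scores are hypothetical toy values (1 = accepted, 0 = rejected), the 0.5 decision threshold is an assumption, and the helper names `accuracy`, `f1_score` and `auc` are chosen only for this example. AUC is computed here in its pairwise form, as the probability that a randomly chosen accepted pull request is scored higher than a randomly chosen rejected one.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of pull requests whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the 'accepted' class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (accepted, rejected) pairs ranked correctly.

    Ties between an accepted and a rejected score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one accepted and one rejected example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
y_true = [1, 1, 1, 0, 1, 0]                 # 1 = accepted, 0 = rejected
scores = [0.9, 0.8, 0.4, 0.6, 0.7, 0.2]     # predicted acceptance probability
THRESHOLD = 0.5                             # assumed decision threshold
y_pred = [1 if s >= THRESHOLD else 0 for s in scores]

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"F1-score = {f1_score(y_true, y_pred):.3f}")
print(f"AUC      = {auc(y_true, scores):.3f}")
```

Accuracy and F1-score depend on the chosen threshold, whereas AUC depends only on how the scores rank accepted against rejected pull requests, which is one reason the three numbers can differ noticeably on the same predictions.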

Keywords

accepted pull request / prediction / code review / GitHub / pull-based software development

Cite this article


Jing Jiang, Jia-teng Zheng, Yun Yang, Li Zhang. CTCPPre: A prediction method for accepted pull requests in GitHub. Journal of Central South University, 2020, 27(2): 449-468. DOI: 10.1007/s11771-020-4308-z

