Frontiers of Computer Science

Front. Comput. Sci.    2018, Vol. 12 Issue (6) : 1105-1124     https://doi.org/10.1007/s11704-016-6301-0
RESEARCH ARTICLE
Change profile analysis of open-source software systems to understand their evolutionary behavior
Munish SAINI, Kuljit Kaur CHAHAL
Department of Computer Science, Guru Nanak Dev University, Amritsar 143005, India
Abstract

Source code management systems (such as git) record changes to the code repositories of Open-Source Software (OSS) projects. The metadata about a change includes a change message that records the intention of the change. Classification of changes into different change types, based on change messages, has been explored in the past to understand the evolution of software systems, but only from the perspective of change size and change density. Software evolution analysis based on change classification, with a focus on change evolution patterns, is still an open research problem. This study examines the change messages of 106 OSS projects, as recorded in their git repositories, to explore their evolutionary patterns with respect to the types of changes performed over time. An automated keyword-based classifier is applied to the change messages to categorize the changes into five types (corrective, adaptive, perfective, preventive, and enhancement). Cluster analysis helps to uncover the distinct change patterns that each change type follows. For each change type, we group the 106 projects into three categories: high activity, moderate activity, and low activity. Evolutionary behavior differs between projects of different categories. The projects with high and moderate activity receive the maximum number of changes during 76–81 months of the project lifetime. Project attributes such as the number of committers, the number of files changed, and the total number of commits seem to contribute the most to the change activity of the projects. The statistical findings show that, irrespective of the change type, the change activity of a project is related to its number of contributors, the amount of work done, and its total commits. Further, we explore the languages and domains of the projects to correlate change types with them. The statistical analysis indicates that there is no strong, significant relation between change types and the domains or languages of the 106 projects.
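As a rough illustration of what a keyword-based classifier of change messages can look like, the following Python sketch assigns a commit message to the change type whose keywords it matches most often. The keyword stems and the `unclassified` fallback are assumptions made for this example only; the abstract does not list the keyword sets or tie-breaking rules the study actually used.

```python
import re

# Hypothetical keyword stems per change type (NOT the paper's lists).
KEYWORDS = {
    "corrective": ["fix", "bug", "error", "crash", "fail"],
    "adaptive": ["port", "upgrade", "migrate", "compatib", "platform"],
    "perfective": ["refactor", "cleanup", "optimiz", "simplify", "rename"],
    "preventive": ["test", "document", "comment", "deprecat"],
    "enhancement": ["add", "implement", "feature", "support", "new"],
}


def classify(message: str) -> str:
    """Return the change type with the most keyword hits in the message."""
    words = re.findall(r"[a-z]+", message.lower())
    best_type, best_hits = "unclassified", 0
    for change_type, stems in KEYWORDS.items():
        # Count words that begin with any of this type's keyword stems.
        hits = sum(1 for w in words if any(w.startswith(s) for s in stems))
        # Strict comparison: on a tie, the earlier type in KEYWORDS wins.
        if hits > best_hits:
            best_type, best_hits = change_type, hits
    return best_type


if __name__ == "__main__":
    for msg in ["Fix crash when parsing empty config",
                "Add support for JSON export",
                "Update README"]:
        print(f"{msg!r} -> {classify(msg)}")
```

Prefix matching is deliberately crude (for example, "address" would count as a hit for the stem "add"), so a real classifier would need curated keyword lists and validation against manually labeled messages.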

Keywords: software evolution, open-source software (OSS), cluster analysis, change classification
Corresponding Authors: Munish SAINI, Kuljit Kaur CHAHAL
Just Accepted Date: 16 November 2016   Online First Date: 24 January 2018    Issue Date: 04 December 2018
Cite this article:
Munish SAINI, Kuljit Kaur CHAHAL. Change profile analysis of open-source software systems to understand their evolutionary behavior[J]. Front. Comput. Sci., 2018, 12(6): 1105-1124.
URL:
http://journal.hep.com.cn/fcs/EN/10.1007/s11704-016-6301-0
http://journal.hep.com.cn/fcs/EN/Y2018/V12/I6/1105