Improved binary similarity measures for software modularization

Rashid NASEEM, Mustafa BinMat DERIS, Onaiza MAQBOOL, Jing-peng LI, Sara SHAHZAD, Habib SHAH

Front. Inform. Technol. Electron. Eng., 2017, Vol. 18, Issue 8: 1082–1107. DOI: 10.1631/FITEE.1500373
Article

Abstract

Various binary similarity measures have been employed in clustering approaches to form homogeneous groups of similar entities in the data. These similarity measures are mostly based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical clustering) for software modularization, to make software systems understandable and manageable. Each similarity measure has its own strengths, which improve the clustering results, and its own weaknesses, which degrade them. We highlight the strengths of some well-known existing binary similarity measures for software modularization. Furthermore, based on these existing measures, we introduce several new, improved binary similarity measures. Proofs of correctness, with illustrations, and a series of experiments are presented to evaluate the effectiveness of the new measures.
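The following minimal Python sketch shows how some well-known existing binary measures are computed from the usual counts a (feature present in both entities), b and c (present in only one), and d (absent from both). It also shows how a measure drives the first merge of agglomerative hierarchical clustering. It assumes each entity, such as a source file, is described by a 0/1 feature vector. The function names, the example entities, and the choice to return 0.0 when neither entity has any feature are illustrative assumptions. The paper's new measures are not reproduced here.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

BinaryVector = Sequence[int]
Measure = Callable[[BinaryVector, BinaryVector], float]


def counts(x: BinaryVector, y: BinaryVector) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) for two equal-length 0/1 feature vectors.

    a: feature present in both entities
    b: present in x only
    c: present in y only
    d: absent from both
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if q and not p)
    d = len(x) - a - b - c
    return a, b, c, d


def jaccard(x: BinaryVector, y: BinaryVector) -> float:
    # Ignores joint absences (d). Assumption: 0.0 when neither entity has any feature.
    a, b, c, _ = counts(x, y)
    return a / (a + b + c) if a + b + c else 0.0


def sorensen_dice(x: BinaryVector, y: BinaryVector) -> float:
    # Like Jaccard, but gives double weight to shared presences (a).
    a, b, c, _ = counts(x, y)
    return 2 * a / (2 * a + b + c) if a + b + c else 0.0


def russell_rao(x: BinaryVector, y: BinaryVector) -> float:
    # Shared presences over all features, so d appears only in the denominator.
    a, b, c, d = counts(x, y)
    n = a + b + c + d
    return a / n if n else 0.0


def simple_matching(x: BinaryVector, y: BinaryVector) -> float:
    # Counts joint absences (d) as agreement, just like shared presences.
    a, b, c, d = counts(x, y)
    n = a + b + c + d
    return (a + d) / n if n else 0.0


def most_similar_pair(
    entities: Dict[str, BinaryVector], measure: Measure
) -> Optional[Tuple[str, str, float]]:
    """One agglomerative step: the pair of entities that would be merged first.

    Ties keep the first pair found in sorted-name order.
    """
    names = sorted(entities)
    best: Optional[Tuple[str, str, float]] = None
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            s = measure(entities[u], entities[v])
            if best is None or s > best[2]:
                best = (u, v, s)
    return best


if __name__ == "__main__":
    # Hypothetical entities (e.g. source files) and binary features (e.g. globals used).
    files = {"A": [1, 1, 0, 0, 1], "B": [1, 0, 0, 0, 1], "C": [0, 0, 1, 1, 0]}
    for label, m in [("Jaccard", jaccard), ("Sorensen-Dice", sorensen_dice),
                     ("Russell-Rao", russell_rao), ("Simple matching", simple_matching)]:
        print(label, round(m(files["A"], files["B"]), 3), most_similar_pair(files, m))
```

In this toy data, entities B and C share no features. They score 0.0 under Jaccard but 0.2 under Simple matching, because their one jointly absent feature counts as agreement. This illustrates one way in which existing measures differ when they are used for software clustering.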

Keywords

Binary similarity measure / Binary features / Combination of measures / Software modularization

Cite this article

Rashid NASEEM, Mustafa BinMat DERIS, Onaiza MAQBOOL, Jing-peng LI, Sara SHAHZAD, Habib SHAH. Improved binary similarity measures for software modularization. Front. Inform. Technol. Electron. Eng., 2017, 18(8): 1082–1107. https://doi.org/10.1631/FITEE.1500373

RIGHTS & PERMISSIONS

2017 Zhejiang University and Springer-Verlag GmbH Germany