Improved binary similarity measures for software modularization
Rashid NASEEM, Mustafa BinMat DERIS, Onaiza MAQBOOL, Jing-peng LI, Sara SHAHZAD, Habib SHAH
Various binary similarity measures have been employed in clustering approaches to form homogeneous groups of similar entities in data. Most of these measures are based only on the presence or absence of features. Binary similarity measures have also been explored with different clustering approaches (e.g., agglomerative hierarchical clustering) for software modularization, with the aim of making software systems understandable and manageable. Each similarity measure has its own strengths and weaknesses, which respectively improve and degrade the clustering results. We highlight the strengths of some well-known existing binary similarity measures for software modularization. Building on these existing measures, we introduce several new, improved binary similarity measures. We present proofs of correctness with illustrations, and a series of experiments that evaluate the effectiveness of the new measures.
Binary similarity measure / Binary features / Combination of measures / Software modularization
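To make "based only on the presence or absence of features" concrete, the following is a minimal sketch of how two classic binary similarity measures can be computed from binary feature vectors. The Jaccard and simple matching coefficients are chosen here only as illustrations; the abstract does not name them, and they are not the new measures the paper introduces. The function names and the example feature vectors are hypothetical.

```python
def contingency_counts(x, y):
    """Return (a, b, c, d) for two equal-length binary vectors.

    a: features present in both entities (1-1)
    b: features present only in x (1-0)
    c: features present only in y (0-1)
    d: features absent from both (0-0)
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


def jaccard(x, y):
    """Jaccard coefficient a / (a + b + c); ignores joint absences."""
    a, b, c, _ = contingency_counts(x, y)
    denom = a + b + c
    return a / denom if denom else 0.0


def simple_matching(x, y):
    """Simple matching coefficient (a + d) / (a + b + c + d)."""
    a, b, c, d = contingency_counts(x, y)
    total = a + b + c + d
    return (a + d) / total if total else 0.0


# Hypothetical entities (e.g., two source files) described by five
# binary features (e.g., whether each file uses a given function).
file_1 = [1, 1, 0, 0, 1]
file_2 = [1, 0, 0, 0, 1]

counts = contingency_counts(file_1, file_2)
jaccard_score = jaccard(file_1, file_2)
matching_score = simple_matching(file_1, file_2)
```

The two measures differ in how they treat features absent from both entities: Jaccard discards them, whereas simple matching counts them as agreement, so the same pair of entities can receive different similarity scores.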