Efficient subspace clustering for higher dimensional data using fuzzy entropy
C. Palanisamy, S. Selvan
Journal of Systems Science and Systems Engineering, 2009, Vol. 18, Issue 1: 95-110.
In this paper we propose a novel method that identifies relevant subspaces using fuzzy entropy and then performs clustering. Fuzzy entropy discriminates the real distribution better because it uses membership functions to measure class match degrees, and it therefore reflects more information about the actual distribution of patterns in the subspaces. We use a heuristic procedure based on the silhouette criterion to find the number of clusters. The proposed theory and algorithms are evaluated experimentally on a collection of benchmark data sets. The empirical results show that the method performs favorably in comparison with several other clustering algorithms.
Clustering / entropy / fuzzy entropy / class match degree / subspace
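The abstract states that the number of clusters is found with a heuristic based on the silhouette criterion but gives no details. As a rough illustration only, the sketch below scores candidate cluster counts by mean silhouette width and keeps the best one. It is not the authors' procedure: the plain k-means with farthest-point initialisation is a stand-in clustering step, the fuzzy-entropy subspace selection is omitted entirely, and the names `silhouette`, `kmeans` and `choose_k`, as well as the toy data, are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def silhouette(points, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for a singleton cluster
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def kmeans(points, k, iters=50):
    """Stand-in clustering step (assumption): deterministic k-means."""
    centers = [points[0]]
    while len(centers) < k:  # farthest-point initialisation
        centers.append(max(points, key=lambda p: min(dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep the centre of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def choose_k(points, k_values):
    """Return the candidate k with the highest mean silhouette, plus all scores."""
    scores = {}
    for k in k_values:
        labels = kmeans(points, k)
        if len(set(labels)) >= 2:
            scores[k] = silhouette(points, labels)
    return max(scores, key=scores.get), scores


# Toy data (made up for illustration): three well-separated groups in 2-D.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (0, 10), (0, 11), (1, 10), (1, 11),
]
best_k, scores = choose_k(points, range(2, 6))
print(best_k, scores)
```

On this toy data the three-cluster solution gets the highest score. In a subspace setting the same selection step would presumably be applied to the data projected onto each selected subspace, though the abstract does not say so.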