Efficient subspace clustering for higher dimensional data using fuzzy entropy

C. Palanisamy; S. Selvan

doi:10.1007/s11518-009-5097-y

Journal of Systems Science and Systems Engineering, 2009, Vol. 18, Issue 1: 95-110.

Abstract

In this paper, we propose a novel method that identifies relevant subspaces using fuzzy entropy and then performs clustering. Fuzzy entropy discriminates the real distribution better because it uses membership functions to measure class match degrees; it therefore reflects more of the information in the actual distribution of patterns in the subspaces. We use a heuristic procedure based on the silhouette criterion to find the number of clusters. The proposed theory and algorithms are evaluated experimentally on a collection of benchmark data sets. The empirical results show that the method performs favorably in comparison with several other clustering algorithms.
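To make the idea of a class match degree concrete, here is a minimal sketch. The abstract does not state the formula, so the sketch assumes one common definition from the fuzzy-classification literature: the match degree of a class is that class's share of the total membership in a fuzzy set, and the fuzzy entropy is the Shannon-style sum over those shares. The function names, and the use of tentative labels as "classes", are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def class_match_degrees(memberships, labels):
    """Share of a fuzzy set's total membership contributed by each class.

    Assumed definition: D_c = (sum of memberships of class-c patterns)
    divided by (sum of memberships of all patterns).
    """
    memberships = np.asarray(memberships, dtype=float)
    labels = np.asarray(labels)
    total = memberships.sum()
    if total == 0.0:
        return {}  # no pattern belongs to the fuzzy set at all
    return {c: memberships[labels == c].sum() / total
            for c in np.unique(labels)}

def fuzzy_entropy(memberships, labels):
    """Sum over classes of -D_c * log2(D_c); zero terms are skipped."""
    degrees = class_match_degrees(memberships, labels)
    return -sum(d * np.log2(d) for d in degrees.values() if d > 0.0)

# A fuzzy set dominated by one class has low entropy;
# an evenly mixed one has high entropy.
print(fuzzy_entropy([0.9, 0.8, 0.1, 0.2], [0, 0, 1, 1]))
print(fuzzy_entropy([0.5, 0.5, 0.5, 0.5], [0, 0, 1, 1]))
```

Under this assumed definition, a dimension (or subspace) whose fuzzy sets have low entropy separates the patterns well, which is the sense in which the measure can rank subspaces by relevance.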

Keywords

Clustering / entropy / fuzzy entropy / class match degree / subspace
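The silhouette-based heuristic for finding the number of clusters, mentioned in the abstract, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's procedure: it uses the standard mean silhouette width, a plain k-means with a deterministic farthest-first initialization as a stand-in clustering step, and hypothetical function names (`kmeans_labels`, `mean_silhouette`, `choose_k`).

```python
import numpy as np

def farthest_first_centers(X, k):
    """Deterministic init: start at X[0], then repeatedly add the point
    farthest from all centers chosen so far."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers)

def kmeans_labels(X, k, n_iter=50):
    """Plain Lloyd iterations (stand-in clustering step, an assumption)."""
    centers = farthest_first_centers(X, k)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)

def mean_silhouette(X, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    D = np.linalg.norm(X[:, None] - X[None], axis=2)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)   # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        if max(a, b) > 0:
            scores[i] = (b - a) / max(a, b)
    return scores.mean()

def choose_k(X, k_values):
    """Return the k with the highest mean silhouette, plus all scores."""
    X = np.asarray(X, dtype=float)
    scores = {k: mean_silhouette(X, kmeans_labels(X, k)) for k in k_values}
    return max(scores, key=scores.get), scores

# Usage on synthetic data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(20, 2))
               for c in [(0, 0), (10, 0), (0, 10)]])
best_k, scores = choose_k(X, range(2, 6))
print(best_k, scores)
```

The heuristic simply scans candidate values of k and keeps the one whose clustering is, on average, most compact relative to its nearest neighboring cluster.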

Cite this article

C. Palanisamy, S. Selvan. Efficient subspace clustering for higher dimensional data using fuzzy entropy. Journal of Systems Science and Systems Engineering, 2009, 18(1): 95-110. DOI: 10.1007/s11518-009-5097-y
