Many recently proposed subspace clustering methods suffer from two severe problems. First, they typically scale exponentially with the data dimensionality or with the dimensionality of the cluster subspaces. Second, their clustering results are often sensitive to input parameters. In this paper, a fast subspace clustering algorithm based on attribute clustering is proposed to overcome these limitations. The algorithm first filters out redundant attributes by computing their Gini coefficients. To evaluate the correlation between every pair of non-redundant attributes, it then constructs a relation matrix of the non-redundant attributes from a relation function of two-dimensional united Gini coefficients. Applying an overlapping clustering algorithm to the relation matrix yields the candidate set of all interesting subspaces. Finally, all subspace clusters are derived by clustering within the interesting subspaces. Experiments on both synthetic and real datasets show that the new algorithm not only achieves significant gains in runtime and clustering quality when finding subspace clusters, but is also insensitive to input parameters.
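The pipeline summarized above (attribute filtering, pairwise relation matrix, overlapping grouping of attributes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the exact formulas, so the code assumes the Gini coefficient is the Gini impurity 1 - Σp² over equal-width bins, and it substitutes a hypothetical relation function (joint purity divided by the product of marginal purities, which equals 1.0 when the binned joint distribution factorizes). The thresholded-neighbourhood grouping is a crude stand-in for the paper's overlapping clustering algorithm, and the final step (clustering points within each subspace) is omitted. The names `candidate_subspaces`, `redundancy_tol`, and `min_relation` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence, Set


def bin_indices(values: Sequence[float], bins: int) -> List[int]:
    """Map each value to one of `bins` equal-width bins (assumed discretization)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # guard against a constant attribute
    return [min(int((v - lo) / width), bins - 1) for v in values]


def purity(cells: Sequence) -> float:
    """Sum of squared cell frequencies, i.e. 1 - Gini impurity."""
    n = len(cells)
    counts: Dict = {}
    for c in cells:
        counts[c] = counts.get(c, 0) + 1
    return sum((k / n) ** 2 for k in counts.values())


def gini(values: Sequence[float], bins: int = 10) -> float:
    """Assumed 1-D Gini coefficient: Gini impurity of the binned attribute."""
    return 1.0 - purity(bin_indices(values, bins))


def relation(xs: Sequence[float], ys: Sequence[float], bins: int = 10) -> float:
    """HYPOTHETICAL relation function built from the 2-D (joint) grid purity.

    Returns 1.0 when the empirical binned joint distribution factorizes
    into its marginals, and larger values for co-clustered attributes.
    """
    bx, by = bin_indices(xs, bins), bin_indices(ys, bins)
    return purity(list(zip(bx, by))) / (purity(bx) * purity(by))


def candidate_subspaces(data: List[List[float]], bins: int = 10,
                        redundancy_tol: float = 0.02,
                        min_relation: float = 1.5) -> List[List[int]]:
    """Return candidate subspaces as sorted lists of attribute indices.

    data[i][d] is the value of point i on attribute d.
    """
    dims = len(data[0])
    columns = [[row[d] for row in data] for d in range(dims)]
    g_max = 1.0 - 1.0 / bins  # Gini impurity of a perfectly uniform attribute
    # Step 1: treat near-uniform attributes (no cluster structure) as redundant.
    kept = [d for d in range(dims)
            if gini(columns[d], bins) < g_max - redundancy_tol]
    # Step 2: pairwise relation matrix over the kept attributes (upper triangle).
    rel = {(a, b): relation(columns[a], columns[b], bins)
           for a, b in combinations(kept, 2)}
    # Step 3: crude overlapping grouping; each attribute plus its strongly
    # related neighbours forms one candidate subspace, and groups may overlap.
    groups: Set[FrozenSet[int]] = set()
    for a in kept:
        nbrs = {b for b in kept
                if b != a and rel[tuple(sorted((a, b)))] >= min_relation}
        if nbrs:
            groups.add(frozenset({a} | nbrs))
    return sorted(sorted(g) for g in groups)
```

For instance, with 200 points where attributes 0 and 1 are identical two-cluster attributes, attribute 2 is spread uniformly over ten bins, and attribute 3 is two-cluster but independent of attribute 0, the sketch drops attribute 2 as redundant, scores the relation of attributes 0 and 1 at 2.0 and that of attributes 0 and 3 at 1.0, and returns `[[0, 1]]`.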
NIU Kun, CHEN Junliang, ZHANG Shubo. Subspace clustering through attribute clustering[J]. Frontiers of Electrical and Electronic Engineering, 2008, 3(1): 44-48.
DOI: 10.1007/s11460-008-0010-x