1. Department of Computer Science, Xiamen University; 2. Software School, Xiamen University
Published: 05 Mar 2008
Issue Date: 05 Mar 2008
Abstract
Clustering high-dimensional data has become a challenge in data mining due to the curse of dimensionality. To address this problem, subspace clustering has been defined as an extension of traditional clustering that seeks clusters in subspaces spanned by different combinations of dimensions within a dataset. This paper presents a new subspace clustering algorithm that calculates local feature weights automatically in an EM-based clustering process. In the algorithm, the features are locally weighted using a new unsupervised weighting method, as a means of minimizing a proposed clustering criterion that takes into account both the average intra-cluster compactness and the average inter-cluster separation for subspace clustering. To capture accurate subspace information, an additional outlier detection process is presented to identify possible local outliers of subspace clusters, and is embedded between the E-step and M-step of the algorithm. The method has been evaluated on clustering real-world gene expression data and high-dimensional artificial data with outliers, and the experimental results have shown its effectiveness.
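The control flow described in the abstract (an E-step, then an outlier detection step, then an M-step that also updates per-cluster feature weights) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so everything specific here is an assumption made for illustration. In particular, the sketch uses hard assignments instead of soft EM responsibilities, inverse-variance feature weights instead of the paper's criterion (which also accounts for inter-cluster separation), and a simple mean-plus-`outlier_z`-standard-deviations distance cutoff as the outlier test. The function name `subspace_em_sketch` and all of its parameters are hypothetical.

```python
import numpy as np

def subspace_em_sketch(X, k, n_iter=20, outlier_z=3.0, seed=0):
    """Illustrative loop: E-step -> outlier flagging -> M-step with local weights.

    All update rules are assumptions for illustration, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Initialize centers from k distinct data points, and uniform feature weights.
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    weights = np.full((k, d), 1.0 / d)
    labels = np.zeros(n, dtype=int)
    inlier = np.ones(n, dtype=bool)

    for _ in range(n_iter):
        # E-step (hard-assignment simplification): weighted squared distance
        # of every point to every center, using that cluster's own weights.
        diff2 = (X[:, None, :] - centers[None, :, :]) ** 2   # shape (n, k, d)
        dist = (diff2 * weights[None, :, :]).sum(axis=2)      # shape (n, k)
        labels = dist.argmin(axis=1)
        own = dist[np.arange(n), labels]

        # Outlier step (assumed rule): within each cluster, flag points whose
        # weighted distance exceeds mean + outlier_z * std of that cluster.
        inlier = np.ones(n, dtype=bool)
        for c in range(k):
            members = labels == c
            if members.sum() > 1:
                cut = own[members].mean() + outlier_z * own[members].std()
                inlier[members] = own[members] <= cut

        # M-step: re-estimate centers and local feature weights from inliers only.
        for c in range(k):
            members = (labels == c) & inlier
            if members.sum() == 0:
                continue  # keep the previous center and weights for an empty cluster
            centers[c] = X[members].mean(axis=0)
            # Assumed weighting: features with low within-cluster variance get
            # high weight; weights are normalized to sum to 1 per cluster.
            inv = 1.0 / (X[members].var(axis=0) + 1e-8)
            weights[c] = inv / inv.sum()

    return labels, centers, weights, inlier
```

Calling `subspace_em_sketch(X, k=2)` on an `(n, d)` array returns the cluster labels, the centers, a `(k, d)` matrix of per-cluster feature weights, and a boolean inlier mask. Excluding flagged points from the M-step is what keeps outliers from distorting the estimated subspace of each cluster, which is the stated purpose of placing the detection step between the E-step and the M-step.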
CHEN Lifei, JIANG Qingshan. An extended EM algorithm for subspace clustering. Front. Comput. Sci., 2008, 2(1): 81‒86 https://doi.org/10.1007/s11704-008-0007-x