Linear manifold clustering for high dimensional data based on line manifold searching and fusing

Gang-guo Li, Zheng-zhi Wang, Xiao-min Wang, Qing-shan Ni, Bo Qiang

Journal of Central South University, 2010, Vol. 17, Issue 5: 1058-1069.

DOI: 10.1007/s11771-010-0598-x

Abstract

High dimensional data clustering, with the inherent sparsity of data and the existence of noise, is a serious challenge for clustering algorithms. A new linear manifold clustering method is proposed to address this problem. The basic idea is to search for the line manifold clusters hidden in a data set, and then fuse some of these line manifold clusters to construct higher dimensional manifold clusters. The orthogonal distance and the tangent distance are considered together as the linear manifold distance metrics. Spatial neighbor information is fully utilized to construct the original line manifold and to optimize line manifolds during the line manifold cluster searching procedure. Experiments on real and synthetic data sets demonstrate the superiority of the proposed method over some competing clustering methods in terms of accuracy and computation time. The proposed method obtains high clustering accuracy for data sets with different sizes, manifold dimensions and noise ratios, which confirms its anti-noise capability for high dimensional data.
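
The two distance metrics named in the abstract can be illustrated for the simplest case of a line manifold. The abstract does not give the paper's exact definitions, so the sketch below rests on assumptions: a line manifold is represented by an origin point and a direction vector, the orthogonal distance is the length of the component of a point perpendicular to the line, and the tangent distance is the length of its projection along the line, measured from the origin. The function name `line_manifold_distances` is hypothetical and does not come from the paper.

```python
import math


def line_manifold_distances(x, origin, direction):
    """Return (orthogonal, tangent) distances of point x to a line manifold.

    Assumption (not taken from the paper): the line manifold is the set
    {origin + t * direction}, the orthogonal distance is measured
    perpendicular to the line, and the tangent distance is |t| for the
    projection of x onto the line, with the direction normalized.
    """
    norm = math.sqrt(sum(c * c for c in direction))
    if norm == 0:
        raise ValueError("direction must be non-zero")
    unit = [c / norm for c in direction]

    # Offset of the point from the manifold origin.
    diff = [xi - oi for xi, oi in zip(x, origin)]

    # Signed length of the projection onto the line direction.
    t = sum(di * ui for di, ui in zip(diff, unit))

    # Component of the offset that is perpendicular to the line.
    residual = [di - t * ui for di, ui in zip(diff, unit)]

    orthogonal = math.sqrt(sum(r * r for r in residual))
    tangent = abs(t)
    return orthogonal, tangent


# A point at (3, 4, 0) relative to the x-axis through the origin.
orth, tang = line_manifold_distances(
    [3.0, 4.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
)
```

Under these assumptions, a point would plausibly be assigned to a line manifold cluster only when both distances are small: a small orthogonal distance means the point lies close to the line, and a small tangent distance means it is not far away along the line from the cluster's location. How the paper combines or thresholds the two is not stated in the abstract.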

Keywords

linear manifold / subspace clustering / line manifold / data mining / data fusing / clustering algorithm

Cite this article

Gang-guo Li, Zheng-zhi Wang, Xiao-min Wang, Qing-shan Ni, Bo Qiang. Linear manifold clustering for high dimensional data based on line manifold searching and fusing. Journal of Central South University, 2010, 17(5): 1058-1069. DOI: 10.1007/s11771-010-0598-x
