Clustering method based on data division and partition

Zhi-mao Lu, Chen Liu, S. Massinanke, Chun-xiang Zhang, Lei Wang

Journal of Central South University, 2014, Vol. 21, Issue (1): 213-222. DOI: 10.1007/s11771-014-1932-5

Abstract

Many classical clustering algorithms perform well under the conditions they were designed for but do not scale well when applied to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) was proposed to solve this problem. DP cut the source data set into data blocks and extracted an eigenvector from each data block to form the local feature set. The local feature set was then used in a second round of feature aggregation over the source data to find the global eigenvector. Finally, the data were assigned according to the global eigenvector by the minimum-distance criterion. The experimental results show that DP is more robust than conventional clustering methods. Because it is insensitive to data dimensionality, data distribution, and the number of natural clusters, DP has a wide range of applications in clustering VLDS.
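
The pipeline described in the abstract (divide into blocks, extract local features, aggregate them into global features, assign by minimum distance) can be illustrated with a minimal sketch. The abstract does not say how the eigenvector of a block is computed, so this sketch substitutes plain k-means centroids as a stand-in for both the local and the global features; that substitution, the function names `kmeans`, `assign_min_distance`, and `dp_cluster`, and the parameters `n_blocks`, `k_local`, and `k_global` are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def assign_min_distance(points, centers):
    """Label each point with the index of its nearest center (Euclidean)."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; a stand-in feature extractor (assumption)."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize centers from k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = assign_min_distance(points, centers)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


def dp_cluster(data, n_blocks, k_local, k_global, seed=0):
    """Hypothetical two-stage divide-and-aggregate clustering sketch."""
    data = np.asarray(data, dtype=float)
    # Division: cut the source data into blocks (assumes row order is arbitrary).
    blocks = np.array_split(data, n_blocks)
    # Local feature set: representatives extracted from each block.
    local = np.vstack([kmeans(b, k_local, seed=seed) for b in blocks])
    # Second round: aggregate the local features into global representatives.
    global_centers = kmeans(local, k_global, seed=seed)
    # Partition: assign every source point by the minimum-distance criterion.
    labels = assign_min_distance(data, global_centers)
    return labels, global_centers
```

In this sketch only one block plus the small local feature set has to be clustered at a time, which is the property that would let such a scheme scale; the final assignment here still builds a full distance matrix, so a real VLDS implementation would stream that step in chunks.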

Keywords

clustering / division / partition / very large data sets (VLDS)

Cite this article

Zhi-mao Lu, Chen Liu, S. Massinanke, Chun-xiang Zhang, Lei Wang. Clustering method based on data division and partition. Journal of Central South University, 2014, 21(1): 213-222. DOI: 10.1007/s11771-014-1932-5
