A new clustering algorithm for large datasets

Qing-feng Li , Wen-feng Peng

Journal of Central South University ›› 2011, Vol. 18 ›› Issue (3) : 823 -829.

PDF
Journal of Central South University ›› 2011, Vol. 18 ›› Issue (3) : 823 -829. DOI: 10.1007/s11771-011-0768-5
Article

A new clustering algorithm for large datasets

Author information +
History +
PDF

Abstract

The Circle algorithm was proposed for large datasets. The idea of the algorithm is to find a set of vertices that are close to each other and far from other vertices. This algorithm makes use of the connection between clustering aggregation and the problem of correlation clustering. The best deterministic approximation algorithm was provided for the variation of the correlation of clustering problem, and showed how sampling can be used to scale the algorithms for large datasets. An extensive empirical evaluation was given for the usefulness of the problem and the solutions. The results show that this method achieves more than 50% reduction in the running time without sacrificing the quality of the clustering.

Keywords

data mining / Circle algorithm / clustering categorical data / clustering aggregation

Cite this article

Download citation ▾
Qing-feng Li, Wen-feng Peng. A new clustering algorithm for large datasets. Journal of Central South University, 2011, 18(3): 823-829 DOI:10.1007/s11771-011-0768-5

登录浏览全文

4963

注册一个新账户 忘记密码

References

[1]

AilonN., CharikarM., NewmanA.. Aggregating inconsistent information: Ranking and clustering [C]. Proceedings of the ACM Symposium on Theory of Computing, 2009, Arlington, IEEE: 684-693

[2]

AndritsosP., TsaparasP., MillerR. J., SevcikK. C. Limbo.. Scalable clustering of categorical data [C]. Proceedings of the International Conference on Extending Database Technology, 2007, Vancourer, IEEE: 123-146

[3]

BansalN., BlumA., ChawlaS.Correlation clustering in Machine Learn [M], 2008, Cambridge, Kluwer Academic Publisher: 3-11

[4]

BarthelemyJ., LeclercB.. The median procedure for partitions [J]. DIMACS Series in Discrete Mathematics, 2008, 41(10): 23-32

[5]

BoulisC., OstendorfM.. Combining multiple clustering systems [C]. Proceedings of the European Conference on Principles and Practice of Knowledge Discovery in Databases, 2005, Selangor, IEEE: 63-74

[6]

CharikarM., GuruswamiV., WirthA.. Clustering with qualitative information [C]. Proceedings of the IEEE Symposium on Foundations of Computer Science, 2008, Seattle, IEEE Computer Society: 524-533

[7]

CristoforD., SimoviciD. A.. An information-theoretical approach to genetic algorithms for clustering [J]. IEEE Communication Magazine, 2009, 23(6): 876-881

[8]

HanJ. W., KamberM., FanM., MengX. F.Data mining concepts and techniques [M], 2001, Beijing, China Machine Press: 232-236

[9]

DemaineE. D., EmanuelD., FiatA., ImmorlicaN.. Correlation clustering in general weighted graphs [J]. Theoretical Computer Science, 2006, 36(1): 172-187

[10]

DworkC., KumarR., NaorM., SivakumarD.. Rank aggregation methods for the Web [C]. Proceedings of the International World Wide Web Conference, 2003, Athens, IEEE: 613-622

[11]

FaginR., KumarR., SivakumarD.. Comparing top kind lists [C]. Proceedings of the ACMSIAM Symposium on Discrete Algorithms, 2008, Alaska, IEEE: 28-36

[12]

FernX. Z., BrodleyC. E.. Random projection for high dimensional data clustering: A cluster ensemble approach [C]. Proceedings of the International Conference on Machine Learning, 2003, Texas, IEEE: 186-193

[13]

GuanQ.-y., ZhaoH.-l., GuoQing.. Cancellation for frequency offset in OFDM system based on TF-LMS algorithm [J]. Journal of Central South University, 2010, 17(6): 1293-1299

[14]

BaiS.. Concept clustering under insufficient knowledge [J]. Chinese Journal of Computers, 1995, 18(6): 409-416

[15]

GuoJ. S., ZhaoY., ShiP. F.. An efficient dynamic conceptual clustering algorithm for data mining [J]. Journal of Software, 2001, 12(4): 582-591

AI Summary AI Mindmap
PDF

142

Accesses

0

Citation

Detail

Sections
Recommended

AI思维导图

/