Document clustering based on constructing density tree

Weidi Dai, Wenjun Wang, Yuexian Hou, Ying Wang, Lu Zhang

Transactions of Tianjin University, 2008, Vol. 14, Issue 1: 21-26. DOI: 10.1007/s12209-008-0005-y

Abstract

This paper applies a clustering algorithm based on a DEnsity Tree (CABDET) to document clustering in order to improve clustering accuracy. CABDET constructs a density-based tree structure for every potential cluster, dynamically adjusting the neighborhood radius according to the local density. This avoids the global density parameters of density-based spatial clustering of applications with noise (DBSCAN) and reduces the number of input parameters to one. Experiments on real documents show that CABDET achieves better clustering accuracy than DBSCAN: CABDET reaches a maximum F-measure of 0.347 with a root-node neighborhood radius of 0.80, higher than the 0.332 obtained by DBSCAN with a neighborhood radius of 0.65 and a minimum of 6 objects.
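The DBSCAN baseline and the F-measure used in the comparison can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes documents are dense vectors and that the neighborhood radius (0.65 in the DBSCAN run) is a threshold on cosine distance (1 minus cosine similarity); the abstract does not say which measure the radius applies to. It also assumes the common class-weighted clustering F-measure, with noise points counted as unassigned. The names `cosine_distance`, `dbscan` and `clustering_f_measure` are hypothetical.

```python
import math
from collections import Counter

NOISE = -1  # label for points that belong to no cluster


def cosine_distance(u, v):
    """Distance = 1 - cosine similarity (assumed metric; the abstract does not name one)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 1.0  # treat empty vectors as maximally distant
    return 1.0 - dot / (nu * nv)


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN with one global radius eps and density threshold min_pts."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def region_query(i):
        # eps-neighborhood of point i, including i itself
        return [j for j in range(n) if cosine_distance(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join the cluster but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster += 1
    return labels


def clustering_f_measure(classes, labels):
    """For each true class take the best F over clusters, then weight by class size."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(l for l in labels if l != NOISE)  # noise is unassigned
    joint = Counter((c, l) for c, l in zip(classes, labels) if l != NOISE)
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for l, n_l in cluster_sizes.items():
            n_cl = joint[(c, l)]
            if n_cl == 0:
                continue
            precision, recall = n_cl / n_l, n_cl / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += n_c / n * best
    return total


docs = [
    [1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.95, 0.05, 0.0],
    [0.0, 1.0, 0.0], [0.1, 0.9, 0.0], [0.05, 0.95, 0.0],
    [0.0, 0.0, 1.0],
]
truth = ["a", "a", "a", "b", "b", "b", "c"]
labels = dbscan(docs, eps=0.1, min_pts=3)
print(labels)                                          # [0, 0, 0, 1, 1, 1, -1]
print(round(clustering_f_measure(truth, labels), 3))   # 0.857
```

On this toy data the two tight groups form clusters 0 and 1, the isolated vector is labeled noise, and the F-measure is 6/7 because the singleton class is never recovered. Changing `eps` or `min_pts` changes which points count as core points everywhere at once; this reliance on global parameters is what CABDET is described as avoiding.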

Keywords

document handling / clustering / tree structure / vector space model
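The keywords name the vector space model as the document representation. A minimal tf-idf sketch with cosine similarity follows. The whitespace tokenizer, the raw tf times log(N/df) weighting and the function names are assumptions for illustration; the abstract does not say which weighting or preprocessing the paper uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one dense tf-idf vector per document.

    Weight of term t in document d = tf(t, d) * log(N / df(t)), one common variant.
    """
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))  # document frequency
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


docs = [
    "density based clustering of documents",
    "clustering documents with a density tree",
    "stock prices fell sharply today",
]
vocab, vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]) > 0)  # True: the documents share terms
print(cosine_similarity(vecs[0], vecs[2]))      # 0.0: no shared terms
```

In a pipeline like the one the abstract describes, vectors of this kind would be the input to the density-based clustering step.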

Cite this article

Weidi Dai, Wenjun Wang, Yuexian Hou, Ying Wang, Lu Zhang. Document clustering based on constructing density tree. Transactions of Tianjin University, 2008, 14(1): 21-26. DOI: 10.1007/s12209-008-0005-y
