Document clustering based on constructing density tree
Weidi Dai, Wenjun Wang, Yuexian Hou, Ying Wang, Lu Zhang
Transactions of Tianjin University, 2008, Vol. 14, Issue (1): 21-26.
This paper applies a clustering algorithm based on a density tree (CABDET) to document clustering in order to improve clustering accuracy. CABDET constructs a density-based tree structure for every potential cluster, dynamically adjusting the neighborhood radius according to local density. It thereby avoids the global density parameters of density-based spatial clustering of applications with noise (DBSCAN) and reduces the number of input parameters to one. Experiments on real documents show that CABDET achieves better clustering accuracy than DBSCAN: CABDET attains a maximum F-measure of 0.347 with a root-node neighborhood radius of 0.80, compared with 0.332 for DBSCAN with a neighborhood radius of 0.65 and a minimum number of objects of 6.
document handling / clustering / tree structure / vector space model
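
The abstract compares the two methods by F-measure but does not state which definition is used. As a hedged illustration only, the sketch below computes one common class-weighted clustering F-measure: for each true class, take the best F1 score over all clusters, then average those scores weighted by class size. The function name `clustering_f_measure` and its arguments are hypothetical and do not come from the paper, and noise labels (such as DBSCAN's -1) are simply treated as one more cluster, which is also an assumption.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-weighted clustering F-measure (assumed definition, not the paper's)."""
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label lists must have the same length")
    n = len(true_labels)
    if n == 0:
        raise ValueError("no documents to evaluate")

    class_sizes = Counter(true_labels)        # documents per true class
    cluster_sizes = Counter(cluster_labels)   # documents per found cluster
    overlap = Counter(zip(true_labels, cluster_labels))  # shared documents

    total = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for clu, clu_size in cluster_sizes.items():
            shared = overlap[(cls, clu)]
            if shared == 0:
                continue
            precision = shared / clu_size
            recall = shared / cls_size
            f1 = 2 * precision * recall / (precision + recall)
            best = max(best, f1)
        # Weight each class's best F1 by its share of the collection.
        total += (cls_size / n) * best
    return total


if __name__ == "__main__":
    truth = ["grain", "grain", "trade", "trade"]   # illustrative labels only
    found = [0, 0, 1, 1]
    print(clustering_f_measure(truth, found))
```

Under this definition a clustering that reproduces the true classes exactly scores 1, and merging or splitting classes lowers the score, so a higher value corresponds to the "better accuracy" the abstract reports.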