A modified ant-based text clustering algorithm with semantic similarity measure
Haoxiang Xia, Shuguang Wang, Taketoshi Yoshida
Journal of Systems Science and Systems Engineering, 2006, Vol. 15, Issue 4: 474–492.
Ant-based text clustering is a promising technique that has attracted considerable research attention. This paper attempts to improve the standard ant-based text-clustering algorithm in two respects. First, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to provide a more accurate assessment of the similarity between documents. Second, the ant behavior model is modified to achieve better algorithmic performance. In particular, the ant movement rule is adjusted to direct a laden ant toward a dense area of items of the same type as the item it is carrying, and to direct an unladen ant toward an area containing an item that is dissimilar to the surrounding items within its Moore neighborhood. WordNet serves as the base ontology for assessing the semantic similarity between documents. The proposed algorithm is tested on a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that it performs better than the standard ant-based text-clustering algorithm and the k-means algorithm.
Keywords: Ant-based clustering / text clustering / ant movement rule / semantic similarity measure
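The abstract names two ingredients: a document similarity that combines the vector-space cosine with an ontology-based (WordNet) score, and a movement rule driven by the Moore neighborhood. The Python sketch below shows one possible reading of both. Every concrete choice in it is a hypothetical simplification, not the authors' exact formulation:

- the linear blend with weight `alpha`;
- the precomputed `semantic` score;
- the toroidal grid;
- the density measure (mean similarity over the eight neighbors);
- the greedy, deterministic choice of the next cell.

The pick-up and drop probabilities of the standard algorithm are omitted.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, int]
Grid = Dict[Cell, int]             # occupied cell -> document id
Sim = Callable[[int, int], float]  # pairwise document similarity in [0, 1]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Vector-space-model cosine similarity of two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norms = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norms if norms else 0.0


def combined_similarity(u: Dict[str, float], v: Dict[str, float],
                        semantic: float, alpha: float = 0.5) -> float:
    """ASSUMPTION: a linear blend of VSM cosine and an ontology-based score.

    `semantic` stands in for a WordNet-derived document similarity computed
    elsewhere; the paper's actual combination rule may differ.
    """
    return alpha * cosine(u, v) + (1.0 - alpha) * semantic


def moore(cell: Cell, size: int) -> List[Cell]:
    """The 8 cells around `cell` on a toroidal size x size grid (assumed topology)."""
    r, c = cell
    return [((r + dr) % size, (c + dc) % size)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def local_density(cell: Cell, item: int, grid: Grid, size: int, sim: Sim) -> float:
    """Mean similarity of `item` to the documents in cell's Moore neighborhood.

    Empty neighbor cells contribute 0, so sparse areas score low.
    """
    total = sum(sim(item, grid[n]) for n in moore(cell, size) if n in grid)
    return total / 8.0


def next_cell_laden(pos: Cell, carried: int, grid: Grid, size: int, sim: Sim) -> Cell:
    """Laden ant: step toward the neighbor surrounded by items most like `carried`."""
    return max(moore(pos, size),
               key=lambda c: local_density(c, carried, grid, size, sim))


def next_cell_unladen(pos: Cell, grid: Grid, size: int, sim: Sim,
                      rng: random.Random) -> Cell:
    """Unladen ant: step onto the neighboring item least like its own surroundings."""
    occupied = [c for c in moore(pos, size) if c in grid]
    if not occupied:
        return rng.choice(moore(pos, size))  # nothing nearby: plain random step
    return min(occupied,
               key=lambda c: local_density(c, grid[c], grid, size, sim))


if __name__ == "__main__":
    # Toy data: documents 0, 1, 2 and 4 share topic "a"; 3 and 5 are topic "b".
    topic = {0: "a", 1: "a", 2: "a", 3: "b", 4: "a", 5: "b"}

    def toy_sim(i: int, j: int) -> float:
        return 1.0 if topic[i] == topic[j] else 0.0

    grid = {(0, 0): 0, (0, 1): 1, (1, 0): 2, (0, 2): 5, (3, 3): 3}
    print(next_cell_laden((2, 2), 4, grid, 5, toy_sim))  # (1, 1): beside three "a" items
    print(next_cell_unladen((1, 1), grid, 5, toy_sim, random.Random(0)))  # (0, 2): the "b" outlier
```

In a fuller version, `sim` would wrap `combined_similarity` over each document's term vector and its semantic score. The greedy `max`/`min` keeps the sketch deterministic and easy to check. Ant-based clustering normally retains some randomness, for example by sampling the next cell with probability proportional to its score.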