A modified ant-based text clustering algorithm with semantic similarity measure

Haoxiang Xia, Shuguang Wang, Taketoshi Yoshida

Journal of Systems Science and Systems Engineering, 2006, Vol. 15, Issue 4: 474-492.

DOI: 10.1007/s11518-006-5029-z

Abstract

Ant-based text clustering is a promising technique that has attracted considerable research attention. This paper attempts to improve the standard ant-based text clustering algorithm in two respects. First, an ontology-based semantic similarity measure is used in conjunction with the traditional vector-space-model-based measure to assess the similarity between documents more accurately. Second, the ant behavior model is modified to achieve better algorithmic performance. In particular, the ant movement rule is adjusted so that a laden ant is directed toward a dense area of items of the same type as the item it carries, and an unladen ant is directed toward an area containing an item that is dissimilar to the surrounding items within its Moore neighborhood. Using WordNet as the base ontology for assessing the semantic similarity between documents, the proposed algorithm is tested on a sample set of documents excerpted from the Reuters-21578 corpus, and the experimental results partly indicate that the proposed algorithm performs better than both the standard ant-based text clustering algorithm and the k-means algorithm.
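To make the two modifications concrete, here is a minimal Python sketch under stated assumptions; it is one illustrative reading, not the authors' implementation. Document similarity is a weighted blend `lam * cosine + (1 - lam) * semantic` of a vector-space cosine and a semantic score. The blend weight `lam`, the best-match word aggregation, and the hypothetical `toy_word_sim` table (a stand-in for a WordNet-based word measure) are all assumptions. Local density and the pick-up/drop probabilities follow the classic Lumer-Faieta formulation with assumed parameters `alpha`, `k1` and `k2`. The movement rule in `next_cell` is a greedy one-step interpretation: a laden ant steps to the Moore-neighbor cell where its carried document would have the highest density, and an unladen ant steps onto the neighboring item with the lowest density, i.e. the item least similar to its surroundings.

```python
import math
import random
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (vector space model)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic(a: Counter, b: Counter, word_sim) -> float:
    """Symmetric best-match average of word similarities (assumed aggregation)."""
    if not a or not b:
        return 0.0
    ab = sum(max(word_sim(x, y) for y in b) for x in a) / len(a)
    ba = sum(max(word_sim(y, x) for x in a) for y in b) / len(b)
    return (ab + ba) / 2

def combined(a, b, word_sim, lam=0.5) -> float:
    """Blend of VSM and semantic similarity; lam = 0.5 is an assumed weight."""
    return lam * cosine(a, b) + (1 - lam) * semantic(a, b, word_sim)

def density(item, cell, grid, docs, sim, size, alpha=0.5, s=3):
    """Lumer-Faieta-style density of `item` around `cell` (s x s Moore neighborhood, torus)."""
    x, y = cell
    r = s // 2
    total = 0.0
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            if dx == 0 and dy == 0:
                continue
            other = grid.get(((x + dx) % size, (y + dy) % size))
            if other is not None and other != item:
                # distance = 1 - similarity
                total += 1 - (1 - sim(docs[item], docs[other])) / alpha
    return max(0.0, total / (s * s))

def p_pick(f, k1=0.1):
    """Pick-up probability; k1 is an assumed parameter."""
    return (k1 / (k1 + f)) ** 2

def p_drop(f, k2=0.15):
    """Drop probability; k2 is an assumed parameter."""
    return 2 * f if f < k2 else 1.0

def next_cell(pos, carrying, grid, docs, sim, size, rng=random):
    """Hypothetical greedy reading of the modified movement rule (one step)."""
    x, y = pos
    moves = [((x + dx) % size, (y + dy) % size)
             for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
    if carrying is not None:
        # Laden ant: go where the carried document finds the densest similar neighbors.
        return max(moves, key=lambda c: density(carrying, c, grid, docs, sim, size))
    # Unladen ant: go to the neighboring item that fits its surroundings worst.
    occupied = [c for c in moves if c in grid]
    if not occupied:
        return rng.choice(moves)
    return min(occupied, key=lambda c: density(grid[c], c, grid, docs, sim, size))

# Toy corpus: two "oil" and two "grain" documents as term-frequency vectors.
docs = [
    Counter({"crude": 2, "oil": 3, "price": 1}),
    Counter({"oil": 2, "barrel": 1, "price": 1}),
    Counter({"wheat": 2, "grain": 3, "export": 1}),
    Counter({"grain": 2, "corn": 1, "export": 1}),
]
# Hypothetical word-pair similarities standing in for a WordNet measure.
RELATED = {frozenset({"crude", "barrel"}): 0.6, frozenset({"wheat", "corn"}): 0.7}

def toy_word_sim(u, v):
    return 1.0 if u == v else RELATED.get(frozenset({u, v}), 0.0)

sim = lambda a, b: combined(a, b, toy_word_sim)
print(round(sim(docs[0], docs[1]), 3), sim(docs[0], docs[2]))  # 0.815 0.0

grid = {(0, 2): 1, (4, 2): 2}  # cell -> document index on a 5 x 5 torus
print(next_cell((2, 2), 0, grid, docs, sim, 5))  # laden with doc 0: steps toward doc 1
```

A purely greedy rule like this can trap ants in short loops, so a practical version would likely mix in random moves; how the paper balances directed and random movement is not stated in this abstract.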

Keywords

Ant-based clustering / text clustering / ant movement rule / semantic similarity measure

Cite this article

Haoxiang Xia, Shuguang Wang, Taketoshi Yoshida. A modified ant-based text clustering algorithm with semantic similarity measure. Journal of Systems Science and Systems Engineering, 2006, 15(4): 474-492. DOI: 10.1007/s11518-006-5029-z
