Chinese and Vietnamese bilingual news topic discovery via association graph clustering

Xiao-Cong WANG, Pei-Li TANG, Yu-Xin HUANG, Sheng-Xiang GAO, Zheng-Tao YU

Front. Comput. Sci. ›› 2026, Vol. 20 ›› Issue (9) : 2009340 DOI: 10.1007/s11704-025-41049-1
Artificial Intelligence
RESEARCH ARTICLE

Abstract

Cross-linguistic news topic discovery is the automatic classification of online news articles in different languages that report on the same event. Its main difficulty lies in multilingual text clustering. News events are linked by many types of associations, which interact with one another and propagate throughout the event network. Building on this idea, this paper proposes a method for Chinese-Vietnamese bilingual news topic discovery based on association graph clustering. First, a Chinese-Vietnamese bilingual association graph is constructed from the associations among elements of the articles. The Chinese and Vietnamese texts are then clustered coarsely using the affinity propagation (AP) algorithm, and the clustering results are adjusted by using the association sizes to update the weights dynamically. News in both languages is used to supervise the clustering, which weakens the effect of linguistic differences on the results. Finally, optimal local and global Chinese-Vietnamese bilingual news clusters are obtained, achieving automated clustering. The experimental data consist of 2000 news texts collected from 15 authoritative Chinese websites and 10 Vietnamese websites. The experimental results show that the proposed method improves the F value by 8.4% compared with K-means clustering.
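
The first step described above, building a bilingual association graph from article elements, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the toy articles, the `VI_LEXICON` mapping, the use of Jaccard overlap as the "association size", and the 0.4 threshold are all hypothetical. In place of affinity propagation with dynamic weight updates, the sketch simply groups articles joined by sufficiently strong edges (connected components), which is a much cruder stand-in.

```python
from itertools import combinations

# Assumed toy bilingual lexicon mapping Vietnamese news elements to shared keys.
VI_LEXICON = {"Hà Nội": "Hanoi", "hội nghị": "summit", "thương mại": "trade",
              "bão": "typhoon", "sơ tán": "evacuation"}

# Hypothetical articles: id -> (language, extracted news elements).
ARTICLES = {
    "zh_1": ("zh", {"Hanoi", "summit", "trade"}),
    "zh_2": ("zh", {"typhoon", "Hainan", "evacuation"}),
    "zh_3": ("zh", {"summit", "trade", "tariff"}),
    "vi_1": ("vi", {"Hà Nội", "hội nghị", "thương mại"}),
    "vi_2": ("vi", {"bão", "sơ tán"}),
}

def normalise(lang, elements):
    """Map Vietnamese elements into the shared key space; keep other keys as is."""
    if lang == "vi":
        return {VI_LEXICON.get(e, e) for e in elements}
    return set(elements)

def build_association_graph(articles):
    """Return {(a, b): weight}, using Jaccard overlap of elements as the weight."""
    keys = {aid: normalise(lang, els) for aid, (lang, els) in articles.items()}
    graph = {}
    for a, b in combinations(sorted(keys), 2):
        union = keys[a] | keys[b]
        weight = len(keys[a] & keys[b]) / len(union) if union else 0.0
        if weight > 0:
            graph[(a, b)] = weight
    return graph

def cluster(articles, graph, threshold=0.4):
    """Group articles linked by edges with weight >= threshold (union-find)."""
    parent = {aid: aid for aid in articles}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), w in graph.items():
        if w >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for aid in articles:
        groups.setdefault(find(aid), set()).add(aid)
    return sorted(sorted(g) for g in groups.values())

graph = build_association_graph(ARTICLES)
clusters = cluster(ARTICLES, graph)
print(clusters)
```

In this toy setting the Chinese and Vietnamese reports on the same event end up in one mixed-language cluster because their elements map to the same keys. A fixed threshold is a simplification; the method summarized in the abstract adjusts the weights dynamically instead.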

Keywords

cross-linguistic / topic discovery / graph clustering / association propagation / co-clustering

Cite this article

Xiao-Cong WANG, Pei-Li TANG, Yu-Xin HUANG, Sheng-Xiang GAO, Zheng-Tao YU. Chinese and Vietnamese bilingual news topic discovery via association graph clustering. Front. Comput. Sci., 2026, 20(9): 2009340 DOI: 10.1007/s11704-025-41049-1

RIGHTS & PERMISSIONS

Higher Education Press
