Chinese and Vietnamese bilingual news topic discovery via association graph clustering
Xiao-Cong WANG, Pei-Li TANG, Yu-Xin HUANG, Sheng-Xiang GAO, Zheng-Tao YU
Front. Comput. Sci., 2026, Vol. 20, Issue 9: 2009340
Cross-linguistic news topic discovery is the automatic grouping of online news articles, written in different languages, that report on the same event. The main difficulty lies in clustering text across languages. News events are linked by many types of associations, which interact with each other and propagate throughout the event network. Building on this idea, this paper proposes a method for Chinese-Vietnamese bilingual news topic discovery based on association graph clustering. First, a Chinese-Vietnamese bilingual association graph is constructed from the associations among elements of the articles. The Chinese and Vietnamese texts are then coarsely clustered using the affinity propagation (AP) algorithm, and the clustering results are adjusted by using the association strengths to update the weights dynamically. News in both languages is used to supervise the clustering, which weakens the effect of linguistic differences on the results. Finally, locally and globally optimal Chinese-Vietnamese bilingual news clusters are obtained, realizing automated clustering. The experimental data consist of 2000 news texts obtained from 15 authoritative Chinese websites and 10 Vietnamese websites. The experimental results show that the F value of the proposed method is 8.4% higher than that of K-means clustering.
cross-linguistic / topic discovery / graph clustering / association propagation / co-clustering
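The graph-construction step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how association weights are computed, so everything here is an assumption for illustration only: the article ids, the element sets, the shared element vocabulary across the two languages, the Jaccard weight, and the helper names `jaccard` and `build_association_graph` are all hypothetical and are not the authors' method.

```python
from itertools import combinations

# Hypothetical articles: id -> (language, set of news elements).
# Assumption: elements (entities, places, keywords) have already been
# mapped into a vocabulary shared by Chinese and Vietnamese articles.
ARTICLES = {
    "zh1": ("zh", {"hanoi", "summit", "trade"}),
    "zh2": ("zh", {"typhoon", "da_nang", "evacuation"}),
    "vi1": ("vi", {"hanoi", "summit", "asean"}),
    "vi2": ("vi", {"typhoon", "da_nang", "flood"}),
}


def jaccard(a, b):
    """Overlap of two element sets, in [0, 1]; an assumed weight function."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_association_graph(articles, min_weight=0.0):
    """Return {(id_a, id_b): weight} for pairs whose weight exceeds min_weight.

    Pairs are formed over ids in sorted order, so each undirected edge
    appears once with id_a < id_b.
    """
    graph = {}
    for (id_a, (_, elems_a)), (id_b, (_, elems_b)) in combinations(
        sorted(articles.items()), 2
    ):
        weight = jaccard(elems_a, elems_b)
        if weight > min_weight:
            graph[(id_a, id_b)] = weight
    return graph


graph = build_association_graph(ARTICLES)
for edge, weight in graph.items():
    print(edge, weight)
```

In this toy data the only edges link each Chinese article to the Vietnamese article that shares its elements. A weight matrix of this kind is the sort of input a similarity-based clusterer such as affinity propagation consumes; the dynamic weight update and cross-language supervision described in the abstract are not modeled here.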
Higher Education Press