Density Peaks Clustering Based on Weighted Density Estimating and Multicluster Merging
Xinran Zhou , Qinghua Zhang , Chengying Wu , Qin Xie , Guoyin Wang
CAAI Transactions on Intelligence Technology ›› 2025, Vol. 10 ›› Issue (6) : 1919 -1933.
Density Peaks Clustering Based on Weighted Density Estimating and Multicluster Merging
Density peaks clustering (DPC) is a density-based clustering algorithm that identifies cluster centres by constructing decision graphs and allocates noncluster centres. Although DPC does not require specifying cluster numbers in advance, the local density is affected by the distribution of data points. Meanwhile, allocating noncluster centres is likely to result in continuous errors. Hence, a novel DPC based on weighted density estimating and multicluster merging (DPC-WDMM) is proposed. Firstly, a novel local density is defined by the nearest neighbour relationship. Then, to avoid incorrect selection of cluster centres, data points with relatively high local density are all marked within the local range. Finally, using these data points to represent microclusters for merging, the final clustering results can be obtained. The performance of this novel algorithm has been demonstrated through the experimental results on several datasets.
clustering / data analysis / feature selection
| [1] |
|
| [2] |
|
| [3] |
|
| [4] |
|
| [5] |
L. Kaufman and P. Rousseeuw, Finding Groups in Data: An Intro-duction to Cluster Analysis (John Wiley and Sons, 1990). |
| [6] |
|
| [7] |
|
| [8] |
|
| [9] |
X. Xu, S. F. Ding, and T. F. Sun, “A Fast Density Peaks Clustering Algorithm Based on Pre-Screening,” in 2018 IEEE International Con-ference on Big Data and Smart Computing (BigComp) (2018), 513-516. |
| [10] |
W. Wang, J. Yang, and R. Muntz, “STING: A Statistical Information Grid Approach to Spatial Data Mining,” in Proceedings of 23th Inter-national Conference on Very Large Data Bases, Vol. 97 (1997), 186-195. |
| [11] |
A. Rodriguez and A. Laio, “Clustering by Fast Search and Find of Density Peaks,” Science. 344, no. 6191 (2014): 1492-1496, https://doi.org/.1126/science.1242072. |
| [12] |
B. Y. Wang, J. Zhang, Y. Liu, and Y. Zou, “Density Peaks Clustering Based Integrate Framework for Multi-Document Summarization,” CAAI Transactions on Intelligence Technology 2, no. 1 (2017): 26-30, https://doi.org/.1016/j.trit.2016.12.005. |
| [13] |
A. M. Ikotun, A. E. Ezugwu, L. Abualigah, B. Abuhaija, and J. Heming, “K-Means Clustering Algorithms: A Comprehensive Review, Variants Analysis, and Advances in the Era of Big Data,” Information Sciences 622 (2023): 178-210, https://doi.org/.1016/j.ins.2022.11.139. |
| [14] |
R. Gujanatti and V. Nataraj, “Cloud Classification Using K-Means Clustering and Content Based Image Retrieval Technique,” in 2020 International Conference on Communication and Signal Processing (ICCSP) (2020), 700-704. |
| [15] |
P. Alexis, B. Paolo, J. D. Fekete, C. Plaisant, and P. Valdivia, “Integrating Prior Knowledge in Mixed-Initiative Social Network Clus-tering,” IEEE Transactions on Visualization and Computer Graphics 27, no. 2 (2021): 1775-1785, https://doi.org/.1109/tvcg.2020.3030347. |
| [16] |
J. Wen, Z. Zhang, L. Fei, and M. Wang, “Generalized Incomplete Multiview Clustering With Flexible Locality Structure Diffusion,” IEEE Transactions on Cybernetics 51, no. 1 (2020): 101-114, https://doi.org/.1109/tcyb.2020.2987164. |
| [17] |
W. N. Tong, S. Liu, and X. Z. Gao, “A Density-Peak-Based Clus-tering Algorithm of Automatically Determining the Number of Clus-ters,” Neurocomputing 458 (2021): 655-666, https://doi.org/.1016/j.neucom.2020.03.125. |
| [18] |
Z. Y. Zhang, Q. S. Zhu, F. Zhu, et al., “Density Decay Graph-Based Density Peak Clustering,” Knowledge-Based Systems 224 (2021): 107075, https://doi.org/.1016/j.knosys.2021.107075. |
| [19] |
Y. M. Tang, Z. F. Pan, W. Pedrycz, F. Ren, and X. Song, “Viewpoint-Based Kernel Fuzzy Clustering With Weight Information Granules,” IEEE Transactions on Emerging Topics in Computational Intelligence 7, no. 2 (2023): 342-356, https://doi.org/.1109/tetci.2022.3201620. |
| [20] |
X. M. Tao, W. J. Guo, C. Ren, et al., “Density Peak Clustering Using Global and Local Consistency Adjustable Manifold Distance,” Infor-mation Sciences 577 (2021): 769-804, https://doi.org/.1016/j.ins.2021.08.036. |
| [21] |
J. Zhao, G. Wang, J. S. Pan, T. Fan, and I. Lee, “Density Peaks Clustering Algorithm Based on Fuzzy and Weighted Shared Neighbor for Uneven Density Datasets,” Pattern Recognition 139 (2023): 109406, https://doi.org/.1016/j.patcog.2023.109406. |
| [22] |
Y. Y. Niu, D. T. Kong, L. Liu, R. Wen, and J. Xiao, “Overlapping Community Detection With Adaptive Density Peaks Clustering and Iterative Partition Strategy,” Expert Systems With Applications 213 (2023): 119213, https://doi.org/.1016/j.eswa.2022.119213. |
| [23] |
M. Herrmann, D. Kazempour, F. Scheipl, and P. Kröger, “Enhancing Cluster Analysis Via Topological Manifold Learning,” Data Mining and Knowledge Discovery 38, no. 3 (2024): 840-887, https://doi.org/.1007/s10618-023-00980-2. |
| [24] |
J. Y. Xie, X. L. Liu, and M. Wang, “SFKNN-DPC: Standard Devia-tion Weighted Distance Based Density Peak Clustering Algorithm,” Information Sciences 635 (2024): 119788, https://doi.org/.1016/j.ins.2023.119788. |
| [25] |
X. N. Yuan, H. Yu, J. Liang, and B. Xu, “A Novel Density Peaks Clustering Algorithm Based on K-Nearest Neighbors With Adaptive Merging Strategy,” International Journal of Machine Learning and Cy-bernetics 12, no. 10 (2021): 2825-2841, https://doi.org/.1007/s13042-021-01369-7. |
| [26] |
Y. Li, L. Y. Sun, and Y. Tang, “DPC-DNG: Graph-Based Label Propagation of K-Nearest Higher-Density Neighbors for Density Peaks Clustering,” Applied Soft Computing 161 (2024): 111773, https://doi.org/.1016/j.asoc.2024.111773. |
| [27] |
L. Sun, X. Y. Qin, W. Ding, J. Xu, and S. Zhang, “Density Peaks Clustering Based on K-Nearest Neighbors and Self-Recommendation,” International Journal of Machine Learning and Cybernetics 12, no. 7 (2021): 913-1938, https://doi.org/.1007/s13042-021-01284-x. |
| [28] |
S. F. Ding, C. Li, X. Xu, et al., “A Sampling-Based Density Peaks Clustering Algorithm for Large-Scale Data,” Pattern Recognition 136 (2023): 109238, https://doi.org/.1016/j.patcog.2022.109238. |
| [29] |
F. Ros, R. Riad, and S. Guillaume, “PDBI: A Partitioning Davies-Bouldin Index for Clustering Evaluation,” Neurocomputing 528 (2023): 178-199, https://doi.org/.1016/j.neucom.2023.01.043. |
| [30] |
C. Li, S. F. Ding, X. Xu, and H. Hou, “Fast Density Peaks Clustering Algorithm Based on Improved Mutual K-Nearest-Neighbor and Sub-Cluster Merging,” Information Sciences 647 (2023): 119470, https://doi.org/.1016/j.ins.2023.119470. |
| [31] |
N. Roussopoulos, S. Kelley, and F. Vincent, “Nearest Neighbor Queries,” in Proceedings of the 1995 ACM SIGMOD International Con-ference on Management of Data (1995), 71-79. |
| [32] |
M. R. Brito, E. L. Chavez, A. Quiroz, and J. Yukich, “Connectivity of the Mutual K-Nearest-Neighbor Graph in Clustering and Outlier Detection,” Statistic and Probability Letters 35, no. 1 (1997): 33-42, https://doi.org/.1016/s0167-7152(96)00213-1. |
| [33] |
M. E. Houle, H. Kriegel, P. Kröger, E. Schubert, and A. Zimek, “Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?,” in Proceedings of 22nd International Conference on Scientific and Statistical Database Management (2010), 482-500. |
| [34] |
J. Hou, A. H. Zhang, and N. Qi, “Density Peak Clustering Based on Relative Density Relationship,” Pattern Recognition 108 (2020): 107554, https://doi.org/.1016/j.patcog.2020.107554. |
| [35] |
J. Y. Xie, H. C. Gao, W. Xie, X. Liu, and P. W. Grant, “Robust Clustering by Detecting Density Peaks and Assigning Points Based on Fuzzy Weighted K-Nearest Neighbors,” Information Sciences 354 (2016): 19-40, https://doi.org/.1016/j.ins.2016.03.011. |
| [36] |
Q. H. Zhang, Y. Y. Dai, and G. Wang, “Density Peaks Clustering Based on Balance Density and Connectivity,” Pattern Recognition 134 (2023): 109052, https://doi.org/.1016/j.patcog.2022.109052. |
| [37] |
N. X. Vinh, J. Epps, and J. Bailey, “Information Theoretic Measures for Clusterings Comparison: Is a Correction for Chance Necessary?,” in Proceedings of the 26th Annual International Conference on Machine Learning (2009), 1073-1080. |
| [38] |
W. Chen, Y. Q. Song, H. Bai, C. J. Lin, and E. Y. Chang, “Parallel Spectral Clustering in Distributed Systems,” IEEE Transactions on Pattern Analysis and Machine Intelligence 33, no. 3 (2010): 568-586, https://doi.org/.1109/tpami.2010.88. |
| [39] |
L. Zelnik-Manor and P. Perona, “Self-Tuning Spectral Clustering,” Advances in Neural Information Processing Systems 17 (2004). |
| [40] |
A. K. Jain and M. H. Law, “Data Clustering: A User’s Dilemma,” Lecture Notes in Computer Science 3776 (2005): 1-10, https://doi.org/.1007/11590316_1. |
| [41] |
A. M. Abbas, A. A. El-Zoghabi, and A. Shoukry, “DenMune: Density Peak Based Clustering Using Mutual Nearest Neighbors,” Pattern Recognition 109 (2021): 107589, https://doi.org/.1016/j.patcog.2020.107589. |
| [42] |
L. M. Fu and E. Medico, “FLAME. A Novel Fuzzy Clustering Method for the Analysis of DNA Microarray Data,” BMC Bioinformatics 8, no. 1 (2007): 1-15, https://doi.org/.1186/1471-2105-8-3. |
| [43] |
H. Chang and D. Y. Yeung, “Robust Path-Based Spectral Clus-tering,” Pattern Recognition 41, no. 1 (2008): 191-203, https://doi.org/.1016/j.patcog.2007.04.010. |
| [44] |
P. Franti, O. Virmajoki, and V. Hautamaki, “Fast Agglomerative Clustering Using a K-Nearest Neighbor Graph,” IEEE Transactions on Pattern Analysis and Machine Intelligence 28, no. 11 (2006): 1875-1881, https://doi.org/.1109/tpami.2006.227. |
| [45] |
C. J. Veenman, M. Reinders, and E. Backer, “A Maximum Variance Cluster Algorithm,” IEEE Transactions on Pattern Analysis and Machine Intelligence 24, no. 9 (2002): 1273-1280, https://doi.org/.1109/tpami.2002.1033218. |
| [46] |
A. Gionis, H. Mannila, and P. Tsaparas, “Clustering Aggregation,” ACM Transactions on Knowledge Discovery From Data 1, no. 1 (2007): 1-30, https://doi.org/.1145/1217299.1217303. |
| [47] |
C. T. Zahn, “Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters,” IEEE Transactions on Computers 100, no. 1 (1971): 68-86, https://doi.org/.1109/t-c.1971.223083. |
| [48] |
K. Bache and M. Lichman, UCI Machine Learning Repository (2020), http://archive.ics.uci.edu/ml. |
National Natural Science Foundation of China(Grant 62276038)
Joint Fund of Chongqing Natural Science Foundation for Innovation and Development(Grant CSTB2023NSCQ-LZX0164)
Chongqing Talent Program(Grant cstc2022ycjh-bgzxm0089)
/
| 〈 |
|
〉 |