Reverse-Nearest-Neighbor-Based Clustering by Fast Search and Find of Density Peaks
Author affiliations: College of Computer and Cyber Security, Hebei Normal University; Hebei Provincial Engineering Research Center for Supply Chain Big Data Analytics and Data Security, Hebei Normal University; Hebei Provincial Key Laboratory of Network and Information Security, Hebei Normal University
Publication: Chinese Journal of Electronics (电子学报(英文))
Year, volume, issue: 2023, Vol. 32, No. 6
Pages: 1341-1354
Subject classification: 12 [Management]; 1201 [Management: Management Science and Engineering (degrees may be conferred in Management or Engineering)]
Funding: Supported by the National Natural Science Foundation of China (62076088) and the Technological Innovation Foundation of Hebei Normal University (L2020K09)
Topics: Measurement; Manifolds; Shape; Clustering algorithms; Synthetic data
Abstract: Clustering by fast search and find of density peaks (CFSFDP) is built on a novel idea, is easy to implement, and clusters efficiently. It has been widely recognized in various fields since it was proposed in Science in 2014. The CFSFDP algorithm also has limitations: its sample density metrics, defined by a cutoff distance, are not unified; its unstable assignment strategy can trigger a domino effect when the remaining samples are assigned; and it can pick wrong density peaks as cluster centers. We propose reverse-nearest-neighbor-based clustering by fast search and find of density peaks (RNN-CFSFDP) to avoid these shortcomings. We redesign and unify the sample density metric by introducing the reverse nearest neighbor. The newly defined local density metric is combined with the K-nearest neighbors of each sample to make the assignment process more robust and alleviate the domino effect. We also propose a cluster fusion algorithm, which further alleviates the domino effect and effectively avoids picking wrong density peaks as cluster centers. Experimental results on publicly available synthetic and real-world data sets show that, in most cases, the proposed algorithm is superior or at least equivalent to the comparative methods in clustering performance. The proposed algorithm performs particularly well on manifold data sets and on data sets with uneven density.
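The abstract does not give the exact RNN-CFSFDP density formula, so the following is only a minimal Python sketch of the two ingredients it names, under stated assumptions. It uses the plain reverse-nearest-neighbor count (how many samples list a point among their K nearest neighbors) as a stand-in density, and the CFSFDP-style distance to the nearest denser sample. The function names `knn_indices`, `reverse_nn_counts`, and `delta_distances` are hypothetical and do not come from the paper.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (itself excluded)."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def reverse_nn_counts(points, k):
    """Count how many samples include each point among their k nearest neighbors.

    Assumption: this raw count is used as an illustrative density;
    the paper's actual metric may be defined differently.
    """
    counts = [0] * len(points)
    for nbrs in knn_indices(points, k):
        for j in nbrs:
            counts[j] += 1
    return counts


def delta_distances(points, density):
    """CFSFDP-style delta: distance to the nearest sample of strictly higher density.

    A sample with no strictly denser sample gets its largest distance
    to any other sample, which marks it as a peak candidate.
    """
    deltas = []
    for i, p in enumerate(points):
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        denser = [d for d, j in dists if density[j] > density[i]]
        deltas.append(min(denser) if denser else max(d for d, _ in dists))
    return deltas


if __name__ == "__main__":
    # Four samples on a unit square plus one distant outlier.
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 11.0)]
    rho = reverse_nn_counts(points, k=2)
    delta = delta_distances(points, rho)
    # Samples with both high density and high delta are center candidates.
    for point, r, d in zip(points, rho, delta):
        print(point, r, round(d, 3), round(r * d, 3))
```

In this toy data the outlier is far from everything, so its delta is large, but no sample counts it as a neighbor, so its reverse-neighbor density is zero and it does not rank as a center candidate. With a cutoff-distance density, by contrast, the result depends on how the cutoff is chosen.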