
Application of Algorithm CARDBK in Document Clustering

Authors: ZHU Yehang, ZHANG Mingjie, SHI Feng

Affiliations: College of Economics and Management, Xi'an University of Posts and Telecommunications, Xi'an 710121, Shaanxi, China; Information Business Department, Puyang Technician College, Puyang 457000, Henan, China

Published in: Wuhan University Journal of Natural Sciences (武汉大学学报(自然科学英文版))

Year/Volume/Issue: 2018, Vol. 23, No. 6

Pages: 514-524


Subject classification: 08 [Engineering]; 081203 [Engineering - Computer Application Technology]; 0812 [Engineering - Computer Science and Technology (degrees may be conferred in Engineering or Science)]

Funding: Supported by the Social Science Foundation of Shaanxi Province of China (2018P03) and the Humanities and Social Sciences Research Youth Fund Project of the Ministry of Education of China (13YJCZH251)

Keywords: algorithm design and analysis; clustering; document analysis; text processing

Abstract: In the K-means clustering algorithm, each data point is uniquely assigned to one cluster, and the clustering quality depends heavily on the initial cluster centroids. Different initializations can yield different results, and local adjustment cannot rescue the clustering from a poor local optimum. Moreover, an anomalous point in a cluster can seriously distort the cluster mean, and K-means is suitable only for clusters with convex shapes. We therefore propose a novel clustering algorithm, CARDBK, where "centroid all rank distance (CARD)" means that all centroids are ranked by their distance from a given point, and "BK" are the initials of "batch K-means". In CARDBK, a point modifies not only the cluster centroid nearest to it but also the centroids of multiple adjacent clusters, and the degree of influence of a point on a cluster centroid depends on the distances between the point and the nearer cluster centroids. Experimental results show that CARDBK outperformed other algorithms on a number of data sets according to the following performance indexes: entropy, purity, F1 value, Rand index, and normalized mutual information (NMI). The algorithm also proved more stable, linearly scalable, and faster.
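The core idea in the abstract — each point updating several nearby centroids rather than only the nearest one, with influence decreasing with distance — can be sketched as follows. This is an illustrative approximation, not the paper's exact method: the function name `cardbk_sketch`, the influence weighting `1 / (1 + distance)`, and the `n_influence` parameter are assumptions for demonstration; the paper defines its own influence function.

```python
import numpy as np

def cardbk_sketch(X, k, n_influence=2, n_iters=20, init=None, seed=0):
    """Sketch of a CARDBK-style batch update: each point adjusts its
    n_influence nearest centroids, not just the single nearest one.
    The weighting (1 / (1 + distance), normalized per point) is an
    assumption for illustration only."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = (np.asarray(init, dtype=float) if init is not None
                 else X[rng.choice(len(X), k, replace=False)])
    for _ in range(n_iters):
        # all point-to-centroid distances ("centroid all rank distance")
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        # rank centroids by distance for every point; keep the nearest few
        nearest = np.argsort(d, axis=1)[:, :n_influence]
        acc = np.zeros_like(centroids)
        wsum = np.zeros(k)
        for i, ranks in enumerate(nearest):
            w = 1.0 / (1.0 + d[i, ranks])   # assumed influence weights
            w /= w.sum()
            acc[ranks] += w[:, None] * X[i]  # point pulls several centroids
            wsum[ranks] += w
        updated = wsum > 0
        centroids[updated] = acc[updated] / wsum[updated, None]
    labels = np.argmin(np.linalg.norm(X[:, None, :] - centroids[None, :, :],
                                      axis=2), axis=1)
    return centroids, labels
```

With `n_influence=1` the update degenerates to ordinary batch K-means; larger values let distant points exert a small, distance-weighted pull on secondary centroids, which is the behavior the abstract attributes to CARDBK.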
