Analysis of CLARANS Algorithm for Weather Data Based on Spark
Author affiliation: College of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China
Publication: Computers, Materials & Continua (English-language journal)
Year/Volume/Issue: 2023, Vol. 76, No. 8
Pages: 2427-2441
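Subject classification: 12 [Management]; 1201 [Management: Management Science and Engineering (degrees in Management or Engineering may be conferred)]; 081104 [Engineering: Pattern Recognition and Intelligent Systems]; 08 [Engineering]; 0835 [Engineering: Software Engineering]; 0811 [Engineering: Control Science and Engineering]; 0812 [Engineering: Computer Science and Technology (degrees in Engineering or Science may be conferred)]
Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 62101275 and 62101274)
Topics: Clustering analysis; cloud computing platform; parallel algorithm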
Abstract: With the rapid development of technology, processing the explosive growth of meteorological data on traditional standalone computing has become increasingly time-consuming, which cannot meet the demands of scientific research and ***, this paper proposes the implementation of the parallel Clustering Large Application based upon RANdomized Search (CLARANS) clustering algorithm on the Spark cloud computing platform to cluster China’s climate regions using meteorological data from 1988 to *** aim is to address the challenge of applying clustering algorithms to large *** this paper, the morphological similarity distance is adopted as the similarity measurement standard instead of Euclidean distance, which improves clustering ***, the issue of local optima caused by an improper selection of initial clustering centers is addressed by utilizing the max-distance *** to the k-means clustering algorithm already implemented in the Spark platform, the proposed algorithm has strong robustness, can reduce the interference of outliers in the dataset on clustering results, and has higher parallel performance than the frequently used serial algorithms, thus improving the efficiency of big data *** experiment compares the clustered centroid data with the annual average meteorological data of representative cities in the five typical meteorological regions that exist in China, and the results show that the clustering results are in good agreement with the meteorological data obtained from the National Meteorological Science Data *** algorithm has a positive effect on the clustering analysis of massive meteorological data and deserves attention in scientific research activities.
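
The abstract names two ingredients, a CLARANS randomized medoid search and max-distance selection of the initial centers, without giving details. The following is a minimal serial sketch of how these might fit together, not the paper's Spark implementation. Several things here are assumptions made for illustration: the function names, the parameters `numlocal` and `maxneighbor` (borrowed from the usual CLARANS description), the choice of the first point as the initial seed, and the use of plain Euclidean distance as a stand-in for the morphological similarity distance, which the abstract does not define.

```python
import math
import random


def euclidean(a, b):
    """Plain Euclidean distance (stand-in; the paper uses a morphological similarity distance)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def max_distance_init(points, k, dist):
    """Farthest-first seeding: each new medoid is the point farthest from those already chosen."""
    medoids = [0]  # assumption: seed with the first point
    while len(medoids) < k:
        candidates = [i for i in range(len(points)) if i not in medoids]
        medoids.append(
            max(candidates, key=lambda i: min(dist(points[i], points[m]) for m in medoids))
        )
    return medoids


def clarans(points, k, numlocal=2, maxneighbor=50, dist=euclidean, seed=0):
    """Return (sorted medoid indices, cost) of the best local optimum found. Requires len(points) > k."""
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, math.inf
    for restart in range(numlocal):
        # Assumption: the first search starts from max-distance seeds, later ones from random seeds.
        current = max_distance_init(points, k, dist) if restart == 0 else rng.sample(range(n), k)
        cost = total_cost(points, current, dist)
        failures = 0
        while failures < maxneighbor:
            # A neighbor differs from the current medoid set in exactly one medoid.
            neighbor = list(current)
            non_medoids = [i for i in range(n) if i not in current]
            neighbor[rng.randrange(k)] = rng.choice(non_medoids)
            neighbor_cost = total_cost(points, neighbor, dist)
            if neighbor_cost < cost:
                current, cost, failures = neighbor, neighbor_cost, 0
            else:
                failures += 1
        if cost < best_cost:
            best, best_cost = sorted(current), cost
    return best, best_cost
```

Calling `clarans(points, k)` on a list of equal-length numeric tuples returns the chosen medoid indices and their total cost. Because `dist` is a parameter, a different similarity measure can be passed in without changing the search loop. A parallel version would presumably distribute the cost evaluation across partitions, which this sketch does not attempt.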