A New Privacy-Preserving Data Publishing Algorithm Utilizing Connectivity-Based Outlier Factor and Mondrian Techniques
Author affiliations: Department of Computer Engineering, Turkish Air Force Academy, National Defence University, Yesilyurt, Istanbul, Turkey; Department of Computer Engineering, Atatürk Strategic Studies and Graduate Institute, National Defence University, Besiktas, Istanbul, Turkey
Publication: Computers, Materials & Continua
Year, volume, issue: 2023, Vol. 76, No. 8
Pages: 1515-1535
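Subject classification: 0839 [Engineering - Cyberspace Security]; 08 [Engineering]; 081201 [Engineering - Computer System Architecture]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be conferred)]
Funding: Supported by the Scientific and Technological Research Council of Turkiye under Project No. 122E670
Keywords: data anonymization; privacy-preserving data publishing; k-anonymity; generalization; Mondrian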
Abstract: Developing a privacy-preserving data publishing algorithm that prevents the disclosure of individuals' identities without sacrificing data utility remains an important goal to *** finding the trade-off between data privacy and data utility is an NP-hard problem and also a current research *** existing approaches are investigated, one of the most significant difficulties discovered is the presence of outlier data in the *** data has a negative impact on data ***, k-anonymity algorithms, which are commonly used in the literature, do not provide adequate protection against outlier *** this study, a new data anonymization algorithm is devised and tested for boosting data utility by incorporating an outlier data detection mechanism into the Mondrian *** connectivity-based outlier factor (COF) algorithm is used to detect *** is selected because of its capacity to anonymize multidimensional data while meeting the needs of real-world ***, on the other hand, is used to discover outliers in high-dimensional datasets with complicated *** proposed algorithm generates more equivalence classes than the Mondrian algorithm and provides greater data utility than previous algorithms based on *** addition, it outperforms other algorithms in the discernibility metric (DM), normalized average equivalence class size (Cavg), global certainty penalty (GCP), query error rate, classification accuracy (CA), and F-measure ***, the increase in the values of the GCP and error rate metrics demonstrates that the proposed algorithm facilitates obtaining higher data utility by grouping closer data points when compared to other algorithms.
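To make the abstract's terms concrete, the following is a minimal sketch of a Mondrian-style median partitioning for numeric quasi-identifiers, together with two of the utility metrics named above: DM, taken here as the sum of squared equivalence class sizes, and Cavg, taken as (number of records / number of classes) / k. It is not the authors' implementation. It omits the COF outlier detection step entirely, it ranks dimensions by raw rather than normalized value range, and the function names (mondrian, discernibility, c_avg) and the sample records are hypothetical.

```python
from statistics import median

def mondrian(records, k):
    """Recursively split numeric records into equivalence classes of size >= k.

    Simplified strict Mondrian: try dimensions from widest to narrowest
    raw value range and cut at the median; stop when no cut keeps both
    halves at size k or more.
    """
    dims = range(len(records[0]))
    by_span = sorted(
        dims,
        key=lambda d: max(r[d] for r in records) - min(r[d] for r in records),
        reverse=True,
    )
    for d in by_span:
        cut = median(r[d] for r in records)
        left = [r for r in records if r[d] <= cut]
        right = [r for r in records if r[d] > cut]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k) + mondrian(right, k)
    return [records]  # no allowable cut: this group is one equivalence class

def discernibility(classes):
    """DM: each record is charged the size of its class (no suppression)."""
    return sum(len(c) ** 2 for c in classes)

def c_avg(classes, k):
    """Normalized average equivalence class size; 1.0 is the ideal value."""
    total = sum(len(c) for c in classes)
    return (total / len(classes)) / k

# Hypothetical (age, weekly hours) quasi-identifier tuples.
records = [(25, 50), (27, 52), (30, 48), (45, 90), (47, 95), (50, 88)]
classes = mondrian(records, k=3)
print(len(classes), discernibility(classes), c_avg(classes, k=3))
```

Under these definitions, producing more and smaller equivalence classes lowers DM and moves Cavg toward 1, which matches the abstract's link between generating more equivalence classes and higher data utility.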