GA-iForest: An Efficient Isolated Forest Framework Based on Genetic Algorithm for Numerical Data Outlier Detection

Authors: LI Kexin, LI Jing, LIU Shuji, LI Zhao, BO Jue, LIU Biqi

Affiliations: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, P.R. China; State Grid Liaoning Electric Power Supply Co. LTD, Shenyang 110004, P.R. China

Publication: Transactions of Nanjing University of Aeronautics and Astronautics (南京航空航天大学学报, English edition)

Year, volume, issue: 2019, Vol. 36, No. 6

Pages: 1026-1038

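Subject classification: 08 [Engineering]; 0802 [Engineering: Mechanical Engineering]; 0825 [Engineering: Aeronautical and Astronautical Science and Technology]; 0704 [Science: Astronomy]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be awarded)]; 081202 [Engineering: Computer Software and Theory]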

Funding: Supported by State Grid Liaoning Electric Power Supply Co. LTD, with financial support from the project "Key Technology and Application Research of the Self-Service Grid Big Data Governance" (No. SGLNXT00YJJS1800110).

Keywords: outlier detection; isolation tree; isolated forest; genetic algorithm; feature selection

Abstract: With the development of the data age, data quality has become a major concern. Outlier detection, a field of data mining, bears directly on data quality. The isolated forest (iForest) algorithm is one of the more prominent numerical-data outlier detection algorithms of recent years. As iForest keeps generating isolation trees, the differences among the trees gradually shrink, sometimes until new trees are no different from existing ones, which wastes memory and lowers detection efficiency. Moreover, some of the constructed isolation trees cannot detect outliers at all. This paper proposes GA-iForest, an improved iForest-based method that optimizes the forest by selecting better isolation trees according to their detection accuracy and their mutual differences. This removes duplicate, similar, and poorly performing trees and improves the accuracy and stability of outlier detection. The experimental environment is built on Ubuntu and the Spark platform, and the outlier datasets provided by ODDS are used for testing. Performance is evaluated with accuracy, recall, ROC curves, AUC, and execution time. Experimental results show that the proposed method not only improves the accuracy and stability of outlier detection but also uses 20%-40% fewer isolation trees than the original iForest.
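
The abstract describes GA-iForest only at a high level: grow an ordinary isolation forest, then use a genetic algorithm to keep a subset of trees that scores well on accuracy and on tree-to-tree difference. The sketch below illustrates that idea under explicit assumptions; it is not the authors' implementation. The chromosome is a bitmask over trees, and the fitness (AUC on labelled data, plus `alpha` times the mean pairwise path-length difference, minus `beta` times the fraction of trees kept), the GA operators (binary tournament, uniform crossover, bit-flip mutation, elitism), all parameter values, the helper names (`build_itree`, `path_length`, `subset_auc`, `ga_select`), and the synthetic data are hypothetical choices made for illustration. It also runs single-machine in plain Python, whereas the paper uses Spark.

```python
import math
import random


def c(n):
    """Average path length of an unsuccessful BST search over n points (iForest normaliser)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    # Harmonic number H(n-1) approximated as ln(n-1) + Euler-Mascheroni constant.
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_itree(points, depth, limit, rng):
    """Grow one isolation tree: random feature, random split inside that feature's range."""
    if depth >= limit or len(points) <= 1:
        return ("leaf", len(points))
    f = rng.randrange(len(points[0]))
    lo, hi = min(p[f] for p in points), max(p[f] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[f] < split]
    right = [p for p in points if p[f] >= split]
    return ("node", f, split, build_itree(left, depth + 1, limit, rng),
            build_itree(right, depth + 1, limit, rng))


def path_length(x, tree, depth=0):
    """Depth at which x lands in a leaf, plus c(size) for leaves that were not fully split."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, f, split, left, right = tree
    return path_length(x, left if x[f] < split else right, depth + 1)


def subset_auc(chosen, H, labels):
    """AUC of the sub-forest `chosen`: P(outlier scores above inlier), ties count 1/2."""
    # Shorter mean path length means more anomalous, so negate it to get a score.
    s = [-sum(H[t][i] for t in chosen) / len(chosen) for i in range(len(labels))]
    pos = [v for v, y in zip(s, labels) if y == 1]
    neg = [v for v, y in zip(s, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ga_select(H, labels, pop_size=20, generations=30, alpha=0.02, beta=0.05,
              p_mut=0.05, seed=0):
    """Hypothetical GA: bitmask over trees, fitness = AUC + alpha*diversity - beta*size."""
    rng, T, n = random.Random(seed), len(H), len(labels)
    # Pairwise diversity: mean absolute difference between two trees' path lengths.
    D = [[sum(abs(H[a][i] - H[b][i]) for i in range(n)) / n for b in range(T)]
         for a in range(T)]
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            chosen = [t for t in range(T) if mask[t]]
            pairs = [(a, b) for k, a in enumerate(chosen) for b in chosen[k + 1:]]
            div = sum(D[a][b] for a, b in pairs) / len(pairs) if pairs else 0.0
            # An empty forest gets fitness -1 so it is never preferred.
            cache[key] = -1.0 if not chosen else (
                subset_auc(chosen, H, labels) + alpha * div - beta * len(chosen) / T)
        return cache[key]

    pop = [[rng.random() < 0.7 for _ in range(T)] for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask survives unchanged
        while len(children) < pop_size:
            p1 = max(rng.sample(pop, 2), key=fit)  # binary tournament selection
            p2 = max(rng.sample(pop, 2), key=fit)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [(not g) if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = children
        best = max(pop, key=fit)
    return [t for t in range(T) if best[t]]


# Toy data (hypothetical): 100 Gaussian inliers plus 10 far-away outliers.
rng = random.Random(42)
inliers = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
outliers = [(rng.choice([-1, 1]) * rng.uniform(5, 8), rng.choice([-1, 1]) * rng.uniform(5, 8))
            for _ in range(10)]
data, labels = inliers + outliers, [0] * 100 + [1] * 10
psi, n_trees = 64, 30  # subsample size and forest size, as in a plain iForest
limit = math.ceil(math.log2(psi))
trees = [build_itree(rng.sample(data, psi), 0, limit, rng) for _ in range(n_trees)]
H = [[path_length(x, tree) for x in data] for tree in trees]  # H[t][i]: path of point i in tree t
selected = ga_select(H, labels)
full_auc = subset_auc(list(range(n_trees)), H, labels)
sel_auc = subset_auc(selected, H, labels)
print(f"kept {len(selected)}/{n_trees} trees, AUC all={full_auc:.3f}, selected={sel_auc:.3f}")
```

Ranking points by negative mean path length orders them exactly as the usual iForest score 2^(-E[h(x)]/c(psi)) does, since that score is monotone decreasing in E[h(x)]. The sketch selects and evaluates on the same labelled points for brevity; a held-out validation split would give a less optimistic estimate of the pruned forest's accuracy.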
