A Two-Level Approach based on Integration of Bagging and Voting for Outlier Detection
Author affiliations: The Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Izmir, Turkey; Department of Computer Engineering, Dokuz Eylul University, Izmir, Turkey
Publication: Journal of Data and Information Science (数据与情报科学学报(英文版))
Year, volume, issue: 2020, Vol. 5, No. 2
Pages: 111-135
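Subject classification: 12 [Management]; 1201 [Management: Management Science and Engineering (degrees awardable in Management or Engineering)]; 0839 [Engineering: Cyberspace Security]; 08 [Engineering]; 081201 [Engineering: Computer System Architecture]; 0812 [Engineering: Computer Science and Technology (degrees awardable in Engineering or Science)]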
Keywords: Outlier detection; Local outlier factor; Ensemble learning; Bagging; Voting
Abstract: Purpose: The main aim of this study is to build a robust novel approach that is able to detect outliers in the datasets *** serve this purpose, a novel approach is introduced to determine the likelihood of an object to be extremely different from the general behavior of the entire ***/methodology/approach: This paper proposes a novel two-level approach based on the integration of bagging and voting techniques for anomaly detection *** proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), benefits from the Local Outlier Factor (LOF) as the base algorithm and improves its detection rate by using ensemble ***: Several experiments have been performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF *** to the results, the BV-LOF approach significantly outperformed LOF on 9 of the 10 datasets on *** limitations: In the BV-LOF approach, the base algorithm is applied to each data subset multiple times, with a different neighborhood size (k) in each case and with different ensemble sizes (T). In our study, we have chosen the k and T value ranges as [1-100]; however, these ranges can be changed according to the dataset handled and to the problem *** implications: The proposed method can be applied to datasets from different domains (***, finance, manufacturing, etc.) without requiring any prior *** the BV-LOF method includes two-level ensemble operations, it may lead to more computational time than single-level ensemble methods; however, this drawback can be overcome by parallelization and by using a proper data structure such as R*-tree or ***/value: The proposed approach (BV-LOF) investigates multiple neighborhood sizes (k), which provides findings of instances with different local densities, and in this way, it provides more likelihood of outlier detection that LOF may *** also brings many benefits such as easy impleme
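The two-level idea described in the abstract (LOF applied to data subsets, with several neighborhood sizes k combined per subset) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how subsets are drawn or how votes are combined, so the sketch assumes that bagging draws random feature subsets and that voting averages LOF scores over the k values. The names `lof_scores` and `bv_lof_sketch` are hypothetical, and the LOF here is a simplified version that ignores ties in the k-distance neighborhood.

```python
import math
import random


def lof_scores(points, k):
    """Simplified LOF: values near 1 suggest inliers, larger values outliers.

    Assumption: ties at the k-distance are ignored (exactly k neighbours).
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbours of each point, excluding itself.
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]
    # Local reachability density; the epsilon guards against zero distances.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], dist[i][j]) for j in neigh[i]) / k
        lrd.append(1.0 / (reach + 1e-12))
    return [sum(lrd[j] for j in neigh[i]) / (k * lrd[i]) for i in range(n)]


def bv_lof_sketch(points, k_values, n_bags, n_features, seed=0):
    """Two-level ensemble: feature bagging (assumed) over averaged-k LOF (assumed)."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    total = [0.0] * n
    for _ in range(n_bags):
        # Level 1 (assumed): project the data onto a random feature subset.
        feats = rng.sample(range(d), n_features)
        sub = [[p[f] for f in feats] for p in points]
        # Level 2 (assumed): combine LOF scores over several k by averaging.
        voted = [0.0] * n
        for k in k_values:
            for i, s in enumerate(lof_scores(sub, k)):
                voted[i] += s / len(k_values)
        for i in range(n):
            total[i] += voted[i] / n_bags
    return total


# Toy data: 40 Gaussian points plus one point placed far from the cluster.
data_rng = random.Random(42)
points = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]
points.append([15.0, 15.0, 15.0])

scores = bv_lof_sketch(points, k_values=(3, 5, 7), n_bags=4, n_features=2, seed=1)
most_outlying = max(range(len(scores)), key=scores.__getitem__)
print("index with the highest combined score:", most_outlying)
```

The brute-force distance matrix keeps the sketch short but costs O(n²) per LOF run; the abstract's remark about parallelization and spatial index structures addresses exactly this cost for larger datasets.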