Clustering-Aided Supervised Malware Detection with Specialized Classifiers and Early Consensus
Author affiliation: Information Security Engineering, Graduate School of Natural and Applied Sciences, Gazi University, Ankara 06560, Turkey
Publication: Computers, Materials & Continua
Year, volume, issue: 2023, Vol. 75, No. 4
Pages: 1235-1251
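Subject classification: 08 [Engineering]; 0835 [Engineering: Software Engineering]; 081202 [Engineering: Computer Software and Theory]; 0812 [Engineering: Computer Science and Technology (Engineering or Science degrees may be awarded)]
Keywords: Malware detection; ensemble learning; classification; clustering; specialized classifier; early consensus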
Abstract: One of the most common types of threats to the digital world is malicious *** is of great importance to detect and prevent existing and new malware before it damages information *** learning approaches are used effectively for this *** this study, we present a model in which supervised and unsupervised learning algorithms are used *** is used to enhance the prediction performance of the supervised *** aim of the proposed model is to make predictions in the shortest possible time with high accuracy and F1 *** the first stage of the model, the data are clustered with the k-means *** the second stage, the prediction is made with the combination of the classifier with the best prediction performance for the related *** choosing the best classifiers for the given clusters, triple combinations of ten machine learning algorithms (kernel support vector machine, k-nearest neighbor, naive Bayes, decision tree, random forest, extreme gradient boosting, categorical boosting, adaptive boosting, extra trees, and gradient boosting) are *** selected triple classifier combination is positioned in two *** prediction time of the model is improved by positioning the classifier with the slowest prediction time in the second *** is seen that clustering before classification improves prediction performance, which is presented using the Blue Hexagon Open Dataset for Malware Analysis (BODMAS), Elastic Malware Benchmark for Empowering Researchers (EMBER) 2018 and Kaggle malware detection *** model has 99.74% accuracy and 99.77% F1 score for the BODMAS dataset, 99.04% accuracy and 98.63% F1 score for the Kaggle malware detection dataset, and 96.77% accuracy and 96.77% F1 score for the EMBER 2018 *** addition, the tiered posi
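The two-stage model summarized in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration in pure Python, not the authors' implementation. It assumes that the k-means centroids are already fitted (for example, read from the `cluster_centers_` attribute of scikit-learn's `KMeans`), that each candidate is a fitted binary classifier with a profiled per-sample prediction time, and that "best prediction performance" means the highest F1 score of the triple's majority vote on each cluster's validation samples. The "early consensus" rule is also an assumption inferred from the keywords: the two faster classifiers form the first tier, and the slowest one is consulted only when they disagree. The names `TimedClassifier`, `select_triple`, `TwoTierEnsemble`, `fit_per_cluster` and the toy rules `rule_a`, `rule_b`, `sum_rule` and `never` are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass(frozen=True)
class TimedClassifier:
    """A fitted binary classifier (1 = malware, 0 = benign) with its profiled per-sample prediction time."""
    name: str
    predict: Callable[[Vector], int]
    seconds_per_sample: float


def nearest_centroid(x: Vector, centroids: Sequence[Vector]) -> int:
    """Stage 1: assign a sample to the k-means cluster whose centroid is closest (squared Euclidean)."""
    return min(range(len(centroids)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 with malware (label 1) as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def select_triple(pool: List[TimedClassifier], X: List[Vector], y: List[int]) -> Tuple[TimedClassifier, ...]:
    """Try every 3-combination, keep the one whose majority vote has the best F1 (assumed criterion),
    and return it fastest-first so the slowest member becomes the tier-2 classifier."""
    if len(pool) < 3:
        raise ValueError("need at least three candidate classifiers")
    best, best_score = None, -1.0
    for triple in combinations(pool, 3):
        votes = [int(sum(c.predict(x) for c in triple) >= 2) for x in X]
        score = f1_score(y, votes)
        if score > best_score:
            best, best_score = triple, score
    return tuple(sorted(best, key=lambda c: c.seconds_per_sample))


class TwoTierEnsemble:
    """Stage 2: route to a cluster, then run that cluster's triple with early consensus."""

    def __init__(self, centroids: Sequence[Vector], triples: Dict[int, Tuple[TimedClassifier, ...]]):
        self.centroids = centroids
        self.triples = triples   # cluster id -> (tier-1 fast, tier-1 fast, tier-2 slow)
        self.tier2_calls = 0     # how often the slow classifier had to run

    def predict(self, x: Vector) -> int:
        fast1, fast2, slow = self.triples[nearest_centroid(x, self.centroids)]
        v1, v2 = fast1.predict(x), fast2.predict(x)
        if v1 == v2:
            return v1            # early consensus: the slow classifier is skipped
        self.tier2_calls += 1
        return slow.predict(x)   # tie-breaker; equals the three-way majority vote


def fit_per_cluster(pool, centroids, X_val, y_val) -> TwoTierEnsemble:
    """Select one triple per cluster from that cluster's validation samples."""
    groups = {k: ([], []) for k in range(len(centroids))}
    for x, label in zip(X_val, y_val):
        xs, ys = groups[nearest_centroid(x, centroids)]
        xs.append(x)
        ys.append(label)
    fallback = select_triple(pool, X_val, y_val)  # assumption: empty clusters reuse a global triple
    return TwoTierEnsemble(centroids, {k: select_triple(pool, xs, ys) if xs else fallback
                                       for k, (xs, ys) in groups.items()})


# Toy usage: hypothetical threshold rules stand in for fitted models such as random forest or XGBoost.
pool = [
    TimedClassifier("rule_a", lambda x: int(x[0] > 0.5), 0.001),
    TimedClassifier("rule_b", lambda x: int(x[1] > 0.5), 0.002),
    TimedClassifier("sum_rule", lambda x: int(x[0] + x[1] > 1.0), 0.05),
    TimedClassifier("never", lambda x: 0, 0.0005),
]
X_val = [[0.0, 0.0], [1.0, 1.0], [0.9, 0.2], [0.1, 0.8]]
y_val = [0, 1, 1, 1]
model = fit_per_cluster(pool, [[0.0, 0.0], [5.0, 5.0]], X_val, y_val)
print([c.name for c in model.triples[0]])
print([model.predict(x) for x in X_val], "tier-2 calls:", model.tier2_calls)
```

Because the tier-2 classifier runs only when the two tier-1 votes differ, its vote then decides the outcome, so this scheme returns the same label as a plain three-way majority vote while skipping the slowest model whenever the fast pair agrees. In the toy run, the slow `sum_rule` is invoked for only two of the four validation samples.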