An Online Malicious Spam Email Detection System Using Resource Allocating Network with Locality Sensitive Hashing

Authors: Siti-Hajar-Aminah Ali, Seiichi Ozawa, Junji Nakazato, Tao Ban, Jumpei Shimamura

Affiliations: Graduate School of Engineering, Kobe University, Kobe, Japan; National Institute of Information and Communications Technology (NICT), Tokyo, Japan; Clwit Inc., Tokyo, Japan

Publication: Journal of Intelligent Learning Systems and Applications

Year, volume, issue: 2015, Vol. 7, No. 2

Pages: 42-57

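Subject classification: 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (degrees in engineering or science may be conferred)]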

Keywords: Malicious Spam Email Detection System; Incremental Learning; Resource Allocating Network; Locality Sensitive Hashing

Abstract: In this paper, we propose a new online system that quickly detects malicious spam emails and, through daily updates, adapts to changes in email contents and in the Uniform Resource Locator (URL) links that lead to malicious websites. We introduce an autonomous function that lets a server generate its own training examples: double-bounce emails are collected automatically, and their class labels are assigned by SPIKE, crawler-type software that analyzes website maliciousness. Because spammers generally use botnets to spread large numbers of malicious emails within a short time, such distributed spam emails often have the same or similar contents, so it is not necessary to learn all of them. To adapt quickly to new malicious campaigns, only new types of spam emails should be selected for learning, which can be realized by introducing an active learning scheme into the classifier model. For this purpose, we adopt Resource Allocating Network with Locality Sensitive Hashing (RAN-LSH) as a classifier model with a data selection function. In RAN-LSH, spam emails that are the same as or similar to those already learned are quickly looked up in a hash table built with Locality Sensitive Hashing (LSH), and matched emails located in "well-learned" regions are discarded without being used as training data. To analyze email contents, we adopt the Bag of Words (BoW) approach and generate feature vectors whose attributes are transformed using normalized term frequency-inverse document frequency (TF-IDF). To evaluate the proposed system, we use a data set of double-bounce spam emails collected at the National Institute of Information and Communications Technology (NICT) in Japan from March 1, 2013 to May 10, 2013. The results confirm that the proposed system is capable of detecting malicious spam emails with a high detection rate.
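
The data selection idea in the abstract (BoW features with normalized TF-IDF, then an LSH table lookup that drops emails resembling ones already seen) can be illustrated with a minimal sketch. This is not the authors' RAN-LSH: it contains no Resource Allocating Network classifier, and the hash family (random-hyperplane sign bits), the smoothed IDF formula, the cosine threshold, and every name below (`tfidf_vectors`, `NearDuplicateFilter`, `n_bits`, `threshold`) are assumptions chosen for illustration only.

```python
import math
import random
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Turn tokenized emails into L2-normalized TF-IDF vectors (term -> weight)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Smoothed IDF (an assumption, not taken from the paper) keeps weights positive.
        raw = {t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
               for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors that are already L2-normalized."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


class NearDuplicateFilter:
    """Hypothetical LSH-backed filter that flags emails similar to earlier ones."""

    def __init__(self, n_bits=8, threshold=0.9, seed=0):
        self.n_bits = n_bits
        self.threshold = threshold
        self.seed = seed
        self.table = defaultdict(list)  # signature -> vectors kept in that bucket

    def _plane_weight(self, bit, term):
        # Deterministic pseudo-random hyperplane coordinate for (bit, term).
        return random.Random(f"{self.seed}:{bit}:{term}").gauss(0.0, 1.0)

    def signature(self, vec):
        """One sign bit per random hyperplane; similar vectors tend to collide."""
        return tuple(
            sum(w * self._plane_weight(bit, t) for t, w in vec.items()) >= 0.0
            for bit in range(self.n_bits)
        )

    def is_redundant(self, vec):
        """Return True if a similar vector is already stored; otherwise store it."""
        bucket = self.table[self.signature(vec)]
        if any(cosine(vec, seen) >= self.threshold for seen in bucket):
            return True
        bucket.append(vec)
        return False


emails = [
    ["click", "here", "free", "prize"],
    ["click", "here", "free", "prize"],      # same campaign, same content
    ["meeting", "agenda", "attached", "monday"],
]
vectors = tfidf_vectors(emails)
selector = NearDuplicateFilter()
to_learn = [e for e, v in zip(emails, vectors) if not selector.is_redundant(v)]
```

In this sketch only the first and third emails end up in `to_learn`; the repeated campaign email hashes to the same bucket as its twin and is skipped. A real system would pass the selected emails to the classifier for incremental training, and would need a rule for deciding when a bucket counts as well learned, which the abstract does not specify.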
