MICkNN: Multi-Instance Covering kNN Algorithm
Author affiliation: Department of Computer Science and Technology and Key Lab of Intelligent Computing and Signal Processing, Anhui University
Publication: Tsinghua Science and Technology (English edition of the Journal of Tsinghua University (Science and Technology))
Year, volume, issue: 2013, Vol. 18, No. 4
Pages: 360-368
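Subject classification: 12 [Management]; 1201 [Management - Management Science and Engineering (degrees may be awarded in Management or Engineering)]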
Funding: the National Natural Science Foundation of China (Nos. 61073117 and 61175046); the Provincial Natural Science Research Program of Higher Education Institutions of Anhui Province (No. KJ2013A016); the Academic Innovative Research Projects of Anhui University Graduate Students (No. 10117700183); the 211 Project of Anhui University
Keywords: mining ambiguous data; multi-instance classification; constructive covering algorithm; kNN algorithm
Abstract: Mining from ambiguous data is an important task in data mining. This paper addresses one such task, known as the multi-instance problem. In the multi-instance problem, each pattern is a labeled bag that consists of a number of unlabeled instances. A bag is negative if all of its instances are negative, and positive if it contains at least one positive instance. Because the instances in a positive bag are not labeled, each positive bag is ambiguous. The goal is to classify unseen bags. Existing multi-instance algorithms mainly try to find the true positive instances in positive bags, convert the multi-instance problem into a supervised learning problem, and then derive the labels of test bags from the predicted labels of their instances. In this paper, we approach multi-instance data from a different angle: we exclude the false positive instances in positive bags and predict the label of an unknown bag as a whole. We propose an algorithm called Multi-Instance Covering kNN (MICkNN) for mining multi-instance data. First, the constructive covering algorithm is used to restructure the original multi-instance data. Then, the kNN algorithm is applied to identify the false positive instances. In the test stage, an unseen bag is labeled directly according to its similarity to the sphere neighbors obtained in the previous two steps. Experimental results demonstrate that the proposed algorithm is competitive with most state-of-the-art multi-instance methods in both classification accuracy and running time.
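The abstract describes the MICkNN pipeline only at a high level, so the following Python sketch (3.8+, for math.dist) is offered as an illustration under stated assumptions, not as the authors' implementation. It shows the multi-instance labeling rule stated above; a generic greedy constructive covering step (radius halfway between the nearest other-class point and the farthest same-class point inside that distance, one common formulation that may differ from the paper's variant); a hypothetical kNN majority-vote rule that relabels likely false positives; and a hypothetical test rule that labels a bag from its instances' nearest spheres. The function names (bag_label, filter_false_positives, constructive_cover, classify_bag), Euclidean distance, k = 3, and the toy data are all assumptions.

```python
import math
from collections import Counter


def bag_label(instance_labels):
    """Multi-instance assumption: a bag is positive (1) if at least one of its
    instances is positive, and negative (0) if all of them are negative."""
    return int(any(instance_labels))


def filter_false_positives(instances, inst_labels, k=3):
    """Hypothetical kNN rule (assumption): each instance of a positive bag starts
    with label 1; relabel it 0 (false positive) if most of its k nearest
    neighbors carry label 0. Neighbor votes use the original labels."""
    new_labels = list(inst_labels)
    for i, x in enumerate(instances):
        if inst_labels[i] != 1:
            continue
        # (distance, label) pairs for all other instances, nearest first
        neigh = sorted((math.dist(x, instances[j]), inst_labels[j])
                       for j in range(len(instances)) if j != i)[:k]
        votes = Counter(label for _, label in neigh)
        if votes[0] > votes[1]:
            new_labels[i] = 0
    return new_labels


def constructive_cover(points, labels):
    """Greedy covering sketch: pick an uncovered point as a sphere center, choose a
    radius that excludes every other-class point, and mark the same-class points
    inside the sphere as covered. Returns a list of (center, radius, label)."""
    uncovered = set(range(len(points)))
    spheres = []
    while uncovered:
        i = min(uncovered)                      # deterministic choice of next center
        c, y = points[i], labels[i]
        # d1: distance to the nearest point of a different class
        d1 = min((math.dist(c, p) for p, l in zip(points, labels) if l != y),
                 default=math.inf)
        # d2: farthest same-class point that is still strictly closer than d1
        d2 = max((math.dist(c, p) for p, l in zip(points, labels)
                  if l == y and math.dist(c, p) < d1), default=0.0)
        r = d2 if math.isinf(d1) else (d1 + d2) / 2
        spheres.append((c, r, y))
        uncovered -= {j for j in uncovered
                      if labels[j] == y and math.dist(c, points[j]) <= r}
    return spheres


def classify_bag(bag, spheres):
    """Hypothetical test rule (assumption): each instance takes the label of the
    sphere whose surface is nearest (negative value = inside it); the bag is
    then labeled with the multi-instance assumption."""
    nearest = [min(spheres, key=lambda s: math.dist(x, s[0]) - s[1])[2] for x in bag]
    return bag_label(nearest)


# Toy training data (made up): one positive bag, one negative bag, 2-D instances.
pos_bag = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0)]   # (5, 5) sits among the negatives
neg_bag = [(5.0, 5.2), (5.2, 5.0)]
instances = pos_bag + neg_bag
inst_labels = [1] * len(pos_bag) + [0] * len(neg_bag)  # inherited bag labels

cleaned = filter_false_positives(instances, inst_labels, k=3)
spheres = constructive_cover(instances, cleaned)
print(cleaned)                                          # [1, 1, 0, 0, 0]
print(classify_bag([(0.1, 0.1), (9.0, 9.0)], spheres))  # 1
print(classify_bag([(5.1, 5.1)], spheres))              # 0
```

In the toy run, the instance (5, 5) inherits a positive label from its bag but is surrounded by negative instances, so the kNN step relabels it before covering; the cleaned data then yields one positive and one negative sphere. Choosing each new center deterministically keeps the output reproducible; a randomized choice would also be a valid covering.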