Feature selection on probabilistic symbolic objects
Author affiliation: Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia
Publication: Frontiers of Computer Science (中国计算机科学前沿, English edition)
Year, volume, issue: 2014, Vol. 8, No. 6
Pages: 933-947
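Subject classification: 0810 [Engineering - Information and Communication Engineering]; 12 [Management]; 02 [Economics]; 0202 [Economics - Applied Economics]; 020208 [Economics - Statistics]; 1201 [Management - Management Science and Engineering (degrees may be awarded in Management or Engineering)]; 0808 [Engineering - Electrical Engineering]; 07 [Science]; 0714 [Science - Statistics (degrees may be awarded in Science or Economics)]; 070103 [Science - Probability Theory and Mathematical Statistics]; 0701 [Science - Mathematics]; 0812 [Engineering - Computer Science and Technology (degrees may be awarded in Engineering or Science)]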
Funding: King Saud University, the College of Computer and Information Sciences
Subject terms: symbolic data analysis; feature selection; probabilistic symbolic object; discrimination criteria; data and knowledge visualization
Abstract: In data analysis tasks, we are often confronted with very high-dimensional data. Depending on the purpose of the data analysis study, feature selection finds and selects the relevant subset of the original features. Many feature selection algorithms have been proposed in classical data analysis, but very few in symbolic data analysis (SDA), which extends classical data analysis by using rich objects instead of simple matrices. Compared with the data used in classical data analysis, a symbolic object can describe not only individuals but also, most of the time, a cluster of individuals. In this paper we present an unsupervised feature selection algorithm on probabilistic symbolic objects (PSOs), with the purpose of discrimination. A PSO is a symbolic object that describes a cluster of individuals by modal variables, with a relative frequency associated with each value. This paper presents new dissimilarity measures between PSOs, which are used as feature selection criteria, and explains how to reduce the complexity of the algorithm by using the discrimination matrix.
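The abstract does not give the paper's dissimilarity measures or algorithm, so the following is only an illustrative sketch of the ideas it names. It assumes a PSO can be stored as one relative-frequency distribution per modal variable, uses total variation distance as a stand-in dissimilarity (not the paper's measure), and treats the discrimination matrix as a map from each pair of PSOs to the variables that separate it. All function names, the threshold, and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_pso(individuals, variables):
    """Summarise a cluster as one relative-frequency distribution per variable."""
    pso = {}
    for var in variables:
        counts = Counter(ind[var] for ind in individuals)
        total = sum(counts.values())
        pso[var] = {value: n / total for value, n in counts.items()}
    return pso


def tv_distance(p, q):
    """Total variation distance between two discrete distributions.

    Stand-in dissimilarity chosen for illustration; not the paper's measure.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def discrimination_matrix(psos, variables, threshold):
    """Map each pair of PSOs to the set of variables that separate it."""
    matrix = {}
    for a, b in combinations(sorted(psos), 2):
        matrix[(a, b)] = {
            var for var in variables
            if tv_distance(psos[a][var], psos[b][var]) >= threshold
        }
    return matrix


def greedy_select(matrix, variables):
    """Greedily pick variables until every separable pair is covered."""
    uncovered = {pair for pair, feats in matrix.items() if feats}
    selected = []
    while uncovered:
        # Choose the variable that separates the most still-uncovered pairs.
        best = max(variables, key=lambda v: sum(v in matrix[p] for p in uncovered))
        selected.append(best)
        uncovered = {p for p in uncovered if best not in matrix[p]}
    return selected


# Hypothetical toy data: three clusters described by two modal variables.
variables = ["color", "size"]
clusters = {
    "A": [{"color": "red", "size": "S"}, {"color": "red", "size": "M"}],
    "B": [{"color": "blue", "size": "S"}, {"color": "blue", "size": "M"}],
    "C": [{"color": "red", "size": "L"}, {"color": "blue", "size": "L"}],
}
psos = {name: build_pso(members, variables) for name, members in clusters.items()}
matrix = discrimination_matrix(psos, variables, threshold=0.6)
selected = greedy_select(matrix, variables)
print(selected)
```

Computing the pairwise dissimilarities once and reusing the resulting matrix during selection is one way to read the abstract's remark about reducing complexity; the paper itself may organise this differently.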