Soft Computing (2020) 24:4675–4691 https://doi.org/10.1007/s00500-019-04228-4
A fuzzy similarity-based rough set approach for attribute selection in set-valued information systems
Shivani Singh1 • Shivam Shreevastava2 • Tanmoy Som2 • Gaurav Somani2
Published online: 23 July 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019
Abstract
Databases obtained from different search engines, market data, patients' symptoms and behaviours, etc., are common examples of set-valued data, in which a set of values is associated with a single entity. In the real-world data deluge, irrelevant attributes reduce both the speed and the predictive accuracy of experts, owing to high dimensionality and insignificant information, respectively. Attribute selection is the task of selecting those attributes that are ideally both necessary and sufficient to describe the target knowledge. Rough set-based approaches can handle the uncertainty present in real-valued information systems after a discretization process. In this paper, we introduce a novel approach for attribute selection in set-valued information systems based on tolerance rough set theory. A fuzzy tolerance relation between two objects is defined using a similarity threshold. We find reducts using the degree of dependency method, selecting the best subsets of attributes in order to extract more knowledge from the information system. To validate the proposed method, we establish results analogous to those of rough set theory. Moreover, we present a greedy algorithm, along with some illustrative examples, to demonstrate our approach without checking each pair of attributes in set-valued decision systems. Examples of calculating the reduct of an incomplete information system using the proposed approach are also given. The proposed approach is compared with fuzzy rough-assisted attribute selection on a real benchmark dataset, as well as with three existing attribute selection approaches on six real benchmark datasets, to show the superiority of the proposed work.
Keywords Set-valued data • Rough set • Fuzzy tolerance relation • Degree of dependency • Attribute selection
Introduction