Heuristic algorithms for finding distribution reducts in probabilistic rough set model
Abstract
Attribute reduction is one of the most important topics in rough set theory. Heuristic attribute reduction algorithms have been presented to solve the attribute reduction problem. It is generally known that fitness functions play a key role in developing heuristic attribute reduction algorithms, and the monotonicity of fitness functions can guarantee the validity of such algorithms. In the probabilistic rough set model, distribution reducts ensure that the decision rules derived from the reducts are compatible with those derived from the original decision table. However, there are few studies on developing heuristic attribute reduction algorithms for finding distribution reducts. This is partly because no monotonic fitness functions have been available for designing heuristic attribute reduction algorithms in the probabilistic rough set model. The main objective of this paper is to develop heuristic attribute reduction algorithms for finding distribution reducts in the probabilistic rough set model. First, two monotonic fitness functions are constructed, from which equivalent definitions of distribution reducts are obtained. Second, two modified monotonic fitness functions are proposed to evaluate the significance of attributes more effectively. On this basis, two heuristic attribute reduction algorithms for finding distribution reducts are developed based on the addition-deletion method and the deletion method. In particular, the monotonicity of the fitness functions guarantees the rationality of the proposed heuristic attribute reduction algorithms. Results of experimental analysis are included to quantify the effectiveness of the proposed fitness functions and distribution reducts.
Keywords:
Attribute reduction; heuristic attribute reduction algorithm; the significance of attribute; probabilistic rough set model; distribution reduct.
1 Introduction
Rough set theory, introduced by Pawlak Pawlak1982RS (), is a valid mathematical theory for dealing with imprecise, vague and uncertain information. It has become an area of active research in many fields, such as machine learning, data mining, knowledge discovery and intelligent data analysis Dai2012UMIDTIA (); Herawan2010RSASCA (); Herbert2009CCRSM (); Lin2013UVPCRSGA (); Pedrycz2013GCADIS (); Jensen2007RTEA (). In rough set theory, attribute reduction is a popular task that selects the essential attributes which preserve or improve a certain classification property of the entire set of available attributes. Hence, an attribute reduct can be defined as a minimal attribute set that preserves or improves a specific classification criterion of a given information system. Attribute reduction often helps to reduce computational cost, save storage space, improve learning performance and prevent overfitting Thangavela2009DRRSTR (). Studies on attribute reduction in rough set theory can be divided into two main categories: the first concentrates on the definition of attribute reducts, and the second focuses on attribute reduction algorithms.
For the definition of attribute reducts, the main question is which properties of a given information system should be kept unchanged or improved Jia2015GARRST (); Min2011TCSAR (); Min2012ARDERTC (). For example, Pawlak Pawlak1991RSTARD () defined a relative reduct that keeps the quality of classification, or the classification positive region, unchanged. Miao et al. Miao1999HARK () constructed mutual-information-based reducts to extract relevant attribute sets that preserve the mutual information of a given decision table. Kryszkiewicz Kryszkiewicz2001CSATKRIS (); Kryszkiewicz2007CGDMDRVFDIS () investigated and compared the relationships among possible reducts, approximate reducts, reducts, decision reducts and generalized decision reducts. Zhang et al. Zhang2003AKRIS () introduced the maximum distribution reduct, which preserves all maximum decision rules in a decision table. Deng et al. Deng2011FSDSBCKG () proposed the notion of conditional knowledge granularity to reflect the relationship between the conditional attributes and the decision attribute, and defined an attribute reduct based on it. Jiang et al. Jiang2015RDEBFSA () proposed a new model of relative decision entropy by combining roughness with the degree of dependency, and used it as the reduction criterion.
For attribute reduction algorithms, the focus is on designing effective reduct construction methods for finding a specific type of reduct Shu2015IAARDIDSRST (); Teng2016EARVD (); Zheng2014EHARARS (). According to the search method used, research on attribute reduction algorithms can be divided into three categories. The first class of methods is based on the discernibility matrix. Miao et al. Miao2009RRCIDTPRSM () discussed the structures of discernibility matrices for three different reducts: the region preservation reduct, the decision preservation reduct and the relationship preservation reduct. Yao and Zhao Yao2009DMSCAR () introduced a reduct construction method based on discernibility matrix simplification. The second class of methods is based on heuristics. Qian et al. Qian2010PAAARRST () proposed the concept of positive approximation for accelerating heuristic attribute reduction algorithms. Parthalain et al. Parthalain2009EBRTRSFS (); Parthalain2010DMAERSBRAR () proposed a distance-measure-based attribute reduction algorithm that considers the proximity of objects in the boundary region to those in the lower approximation. The third class of methods is based on stochastic optimization. Chen et al. Chen2010RSAFSBACO () proposed a novel attribute reduction algorithm based on ant colony optimization for finding a reduct that keeps the mutual information unchanged. Ye et al. Ye2013NBFERSMARP () introduced a novel fitness function for designing attribute reduction algorithms based on particle swarm optimization and genetic algorithms.
In particular, the studies mentioned above mainly focus on attribute reduction in the classical rough set model. Compared with the classical rough set model, the probabilistic rough set model allows certain acceptable levels of error by using threshold parameters Dou2015DTRSMS (); Liu2012PMCDTRS (); Ma2012PRSOTURE (); Wang2015GVPFRSGFR (). Hence, the probabilistic rough set model can effectively deal with data sets that contain noisy and uncertain data. By setting different threshold parameters, one can derive many existing probabilistic rough set models, such as the 0.5 probabilistic rough set model Pawlak1988RSPVDA (), the decision-theoretic rough set model Yao1990ADTRS (), the variable precision rough set model Ziarko1993VPRS (), the Bayesian rough set model Slezak2005IBRS () and the game-theoretic rough set model Herbert2011GTRS (). Although attribute reduction in the probabilistic rough set model has recently gained considerable attention, these works generally focus on the definition of attribute reducts Inuiguch2005SAARVPRSM (); Jia2013MCARDTRSM (); Li2011NMARDTRS (); Mi2004AKRVPRS (); Wang2009RRFVPRS (); Yao2008ARDTRS (); Zhang2014RBQHARTCDTRSM (); Zhao2011NARDTRS (); Ziarko1993VPRS (). As for attribute reduction algorithms, research efforts mainly concentrate on discernibility-matrix-based methods Inuiguch2005SAARVPRSM (); Mi2004AKRVPRS (); Yang2014NAARVPRSM (); Zhao2011NARDTRS () and stochastic-optimization-based methods Jia2014OORDTRSM (); Liu2014SCBAFRVPRS (); Chebrolu2011ARDTRSMUGA (). Few attempts have been made to study heuristic attribute reduction algorithms in the probabilistic rough set model. This is partly because attribute reduction in the probabilistic rough set model becomes more complex once threshold parameters are introduced.
This paper concentrates on constructing heuristic attribute reduction algorithms for finding the lower and upper distribution reducts in the probabilistic rough set model, which were defined by Mi et al. in Mi2004AKRVPRS (). Generally speaking, the most common heuristic attribute reduction algorithms are the addition-deletion method and the deletion method Yao2008RCA (). The addition-deletion method starts from an empty set or the core, and then adds attributes one by one according to the significance of attributes until a stopping criterion is reached. In contrast, the deletion method starts with the set of all attributes and then deletes attributes one by one according to the significance of attributes. In fact, heuristic attribute reduction algorithms mainly involve two aspects: the stopping criterion and the significance measures of attributes. The stopping criterion is implemented by checking the jointly sufficient condition in the definition of attribute reduct, and the significance measures of attributes are used to rank the attributes. As is well known, fitness functions play a nontrivial role in designing stopping criteria and constructing significance measures of attributes. Furthermore, the monotonicity of fitness functions is very important for guaranteeing the validity of heuristic attribute reduction algorithms. However, very little work has considered the monotonicity of fitness functions in attribute reduction in the probabilistic rough set model. In this paper, we first construct two monotonic fitness functions in the probabilistic rough set model. Equivalent definitions of the lower and upper distribution reducts are obtained based on the constructed monotonic fitness functions, and we then use these functions to design the stopping criterion of heuristic attribute reduction algorithms. The monotonicity of the fitness functions guarantees the validity of the resulting heuristic attribute reduction algorithms.
In addition, we propose two modified monotonic fitness functions to evaluate the significance of attributes more effectively. On this basis, we develop two heuristic attribute reduction algorithms for finding the lower and upper distribution reducts based on the addition-deletion method and the deletion method. Finally, experimental analyses are presented to validate the effectiveness of the proposed fitness functions and distribution reducts.
The remaining sections of this paper are organized as follows. Section 2 briefly reviews basic notions related to the definition of distribution reducts and heuristic attribute reduction algorithms. Section 3 develops two heuristic attribute reduction algorithms for finding the lower and upper distribution reducts in the probabilistic rough set model by constructing monotonic fitness functions. Section 4 reports experiments that illustrate the effectiveness of the constructed fitness functions and distribution reducts. Section 5 concludes and offers suggestions for future research.
2 Preliminary knowledge
In this section, we recall basic notions related to attribute reduction in rough set theory Liu2012MCDTRS (); Yao1990ADTRS ().
2.1 The definition of distribution reducts
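An information system is a four-tuple $S=(U, AT, V, f)$, where $U$ is a finite nonempty set of objects called the universe, $AT$ is a nonempty finite set of attributes, $V=\bigcup_{a\in AT}V_a$, where $V_a$ is a nonempty set of values of attribute $a$, called the domain of $a$, and $f: U\times AT\to V$ is a mapping that maps an object in $U$ to exactly one value in $V_a$ such that $f(x,a)\in V_a$ for all $a\in AT$ and $x\in U$. For brevity, $S=(U, AT, V, f)$ can be written as $S=(U, AT)$.
For any subset of attributes $B\subseteq AT$, an indiscernibility relation on $U$ is defined as:
$$IND(B)=\{(x,y)\in U\times U \mid f(x,a)=f(y,a),\ \forall a\in B\}.$$
It can be easily shown that $IND(B)$ is an equivalence relation on $U$. For $B\subseteq AT$, the equivalence relation $IND(B)$ partitions $U$ into equivalence classes denoted by $U/IND(B)=\{[x]_B \mid x\in U\}$; for simplicity, $U/IND(B)$ will be replaced by $U/B$, where $[x]_B$ is the equivalence class determined by $x$ with respect to $B$, i.e., $[x]_B=\{y\in U \mid (x,y)\in IND(B)\}$.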
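The partition $U/B$ can be computed directly from a table by grouping the objects that take identical values on every attribute of $B$. The following is a minimal Python sketch under stated assumptions: the table is encoded as a list of dicts keyed by attribute name, objects are identified by their row index, and the toy data and helper names (`partition`, `eq_class`) are hypothetical.

```python
from collections import defaultdict

def partition(table, attrs):
    """Compute U/B: group row indices whose values agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def eq_class(table, attrs, x):
    """Return [x]_B, the equivalence class of object x under IND(B)."""
    key = tuple(table[x][a] for a in attrs)
    return frozenset(i for i, row in enumerate(table)
                     if tuple(row[a] for a in attrs) == key)

# Hypothetical toy information system with attributes "a" and "b".
U = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}, {"a": 1, "b": 0}]
print([sorted(block) for block in partition(U, ["a"])])       # [[0, 1, 3], [2]]
print([sorted(block) for block in partition(U, ["a", "b"])])  # [[0, 3], [1], [2]]
```

With $B=\emptyset$ every object shares the empty value tuple, so `partition(U, [])` returns the single block $U$, which matches $IND(\emptyset)=U\times U$.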
To describe a concept, rough set theory introduces a pair of approximations, the lower approximation and the upper approximation, as follows.
Definition 1.
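Given an information system $S=(U, AT)$, $B\subseteq AT$ and $X\subseteq U$, the lower approximation and upper approximation of $X$ with respect to $B$ are defined as:
$$\underline{apr}_B(X)=\{x\in U \mid [x]_B\subseteq X\},\qquad \overline{apr}_B(X)=\{x\in U \mid [x]_B\cap X\neq\emptyset\}.$$
The lower approximation of $X$ is the set of objects that certainly belong to $X$, and the upper approximation of $X$ is the set of objects that possibly belong to $X$.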
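A minimal Python sketch of Definition 1, assuming the same hypothetical list-of-dicts encoding (objects are row indices and $X$ is a set of indices): the lower approximation collects the equivalence classes contained in $X$, and the upper approximation collects the classes that meet $X$.

```python
from collections import defaultdict

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def lower_approx(table, attrs, X):
    """Union of the classes [x]_B contained in X: objects that certainly belong to X."""
    return set().union(*(b for b in partition(table, attrs) if b <= X))

def upper_approx(table, attrs, X):
    """Union of the classes [x]_B that meet X: objects that possibly belong to X."""
    return set().union(*(b for b in partition(table, attrs) if b & X))

U = [{"a": 1}, {"a": 1}, {"a": 0}, {"a": 1}]  # hypothetical; U/{a} = {{0, 1, 3}, {2}}
X = {0, 1, 2}
print(lower_approx(U, ["a"], X))  # {2}
print(upper_approx(U, ["a"], X))  # {0, 1, 2, 3}
```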
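Given an information system $S=(U, AT)$ and $P, Q\subseteq AT$, one can define a partial order relation $\preceq$ between the induced partitions as follows Dai2012AUMIIS ():
$$P\preceq Q \iff \forall P_i\in U/P,\ \exists Q_j\in U/Q \text{ such that } P_i\subseteq Q_j.$$
If $P\preceq Q$, $Q$ is said to be coarser than $P$ (or $P$ is finer than $Q$). If $P\preceq Q$ and $U/P\neq U/Q$, $Q$ is said to be strictly coarser than $P$ (or $P$ is strictly finer than $Q$), denoted by $P\prec Q$. In fact, $P\prec Q$ holds if and only if for every $P_i\in U/P$ there exists $Q_j\in U/Q$ such that $P_i\subseteq Q_j$, and there exists $P_{i_0}\in U/P$ such that $P_{i_0}\subset Q_{j_0}$ for some $Q_{j_0}\in U/Q$.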
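The relation $\preceq$ can be checked block by block. The sketch below uses hypothetical helper names and the same list-of-dicts encoding as before. It also illustrates the standard fact that adding attributes can only refine a partition: for $B\subseteq B'$, $U/B'\preceq U/B$.

```python
from collections import defaultdict

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def finer_or_equal(P, Q):
    """P ⪯ Q: every block of partition P lies inside some block of partition Q."""
    return all(any(p <= q for q in Q) for p in P)

def strictly_finer(P, Q):
    """P ≺ Q: P ⪯ Q and the two partitions are different."""
    return finer_or_equal(P, Q) and set(P) != set(Q)

U = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}, {"a": 1, "b": 0}]
P, Q = partition(U, ["a", "b"]), partition(U, ["a"])
print(finer_or_equal(P, Q), strictly_finer(P, Q), finer_or_equal(Q, P))  # True True False
```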
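A decision table is a four-tuple $DT=(U, C\cup D, V, f)$, where $C$ is the condition attribute set, $D$ is the decision attribute set, $C\cap D=\emptyset$, and $V=\bigcup_{a\in C\cup D}V_a$ is the union of the attribute domains. $DT=(U, C\cup D, V, f)$ can be written more simply as $DT=(U, C\cup D)$.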
Definition 2.
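Liang2004IEREKGRST () Given a decision table $DT=(U, C\cup D)$ and $B\subseteq C$, let $U/D=\{D_1, D_2, \ldots, D_r\}$ be a classification of the universe $U$. The positive region of $U/D$ with respect to $B$ is defined as follows:
$$POS_B(D)=\bigcup_{j=1}^{r}\underline{apr}_B(D_j).$$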
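Under the same hypothetical encoding, with a single decision attribute whose name is passed as `d`, Definition 2 amounts to keeping every condition class that lies entirely inside one decision class. A minimal sketch:

```python
from collections import defaultdict

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def positive_region(table, B, d):
    """POS_B(D): union of the B-lower approximations of all decision classes D_j."""
    blocks = partition(table, B)
    pos = set()
    for D_j in partition(table, [d]):
        for block in blocks:
            if block <= D_j:  # the whole class falls in D_j, so it is in the lower approximation
                pos |= block
    return pos

# Hypothetical decision table: condition attributes "a", "b"; decision "d".
DT = [
    {"a": 0, "b": 0, "d": "yes"},
    {"a": 0, "b": 1, "d": "no"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
]
print(sorted(positive_region(DT, ["a", "b"], "d")))  # [0, 1]
print(sorted(positive_region(DT, ["a"], "d")))       # []
```

In this toy table, objects 2 and 3 agree on both `a` and `b` but have different decisions, so they never enter the positive region.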
The Pawlak rough set model does not tolerate any classification error. The probabilistic rough set model, a main extension of the Pawlak rough set model, tolerates certain levels of error.
Definition 3.
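Given an information system $S=(U, AT)$, for any $X\subseteq U$, $B\subseteq AT$ and $0\le\beta<\alpha\le 1$, the probabilistic lower approximation and probabilistic upper approximation of $X$ with respect to $B$ are defined as follows:
$$\underline{apr}_B^{(\alpha,\beta)}(X)=\{x\in U \mid P(X\mid[x]_B)\ge\alpha\},\qquad \overline{apr}_B^{(\alpha,\beta)}(X)=\{x\in U \mid P(X\mid[x]_B)>\beta\},$$
where $P(X\mid[x]_B)=\dfrac{|X\cap[x]_B|}{|[x]_B|}$.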
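A minimal sketch of Definition 3, assuming thresholds with $0\le\beta<\alpha\le 1$ and the same hypothetical list-of-dicts encoding. `Fraction` keeps $P(X\mid[x]_B)$ exact, so boundary comparisons such as $P=\alpha$ are not affected by floating-point rounding. Setting $\alpha=1$ and $\beta=0$ recovers the Pawlak approximations of Definition 1.

```python
from collections import defaultdict
from fractions import Fraction

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def prob_approx(table, attrs, X, alpha, beta):
    """Return the (alpha, beta) lower and upper approximations of X w.r.t. attrs.

    Assumes 0 <= beta < alpha <= 1 and P(X | [x]_B) = |X ∩ [x]_B| / |[x]_B|.
    """
    lower, upper = set(), set()
    for block in partition(table, attrs):
        p = Fraction(len(block & X), len(block))  # exact conditional probability
        if p >= alpha:
            lower |= block
        if p > beta:
            upper |= block
    return lower, upper

# Hypothetical data: one equivalence class of 4 objects, 3 of which belong to X.
U = [{"a": 0} for _ in range(4)]
X = {0, 1, 2}
print(prob_approx(U, ["a"], X, Fraction(7, 10), Fraction(3, 10)))  # ({0, 1, 2, 3}, {0, 1, 2, 3})
print(prob_approx(U, ["a"], X, 1, 0))                              # (set(), {0, 1, 2, 3})
```

The first call shows the tolerance to error: object 3 is outside $X$, yet it enters the probabilistic lower approximation because $P(X\mid[x]_B)=3/4\ge 0.7$.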
Definition 4.
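Liang2004IEREKGRST () Given a decision table $DT=(U, C\cup D)$, for any $B\subseteq C$ and $0\le\beta<\alpha\le 1$, let $U/D=\{D_1, D_2, \ldots, D_r\}$ be a classification of the universe $U$. The probabilistic positive region of $U/D$ with respect to $B$ is defined as follows:
$$POS_B^{(\alpha,\beta)}(D)=\bigcup_{j=1}^{r}\underline{apr}_B^{(\alpha,\beta)}(D_j).$$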
Attribute reduction is one of the most important topics in rough set theory. An attribute reduct is defined as a subset of attributes that are jointly sufficient and individually necessary for preserving or improving a particular property of the given information system Yao2008ARDTRS (). A general definition of an attribute reduct is given as follows.
Definition 5.
Zhao2007GDAR () Given an information system $S=(U, AT)$, consider a certain property $\mathcal{P}$ of $S$ that can be represented by an evaluation function $e$. An attribute set $R\subseteq AT$ is called a reduct of $S$ if it satisfies the following two conditions:

Jointly sufficient condition: $e(R)\succeq e(AT)$,

Individually necessary condition: for any $a\in R$, $\neg\big(e(R-\{a\})\succeq e(AT)\big)$.
An evaluation or fitness function, $e: 2^{AT}\to(L,\succeq)$, maps an attribute set to an element of a poset $L$ equipped with the partial order relation $\succeq$, i.e., $\succeq$ is reflexive, antisymmetric and transitive. For a certain property $\mathcal{P}$, various fitness functions can be used to evaluate the degree to which an attribute set satisfies the property; in general, the fitness function is not unique. The jointly sufficient condition guarantees that the evaluation of the reduct $R$ with respect to $e$ is the same as or superior to that of $AT$, and in many cases $e(R)=e(AT)$. The individually necessary condition guarantees that the reduct is minimal, namely, there is no redundant or superfluous attribute in the reduct $R$.
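Definition 5 can be expressed generically by passing in the fitness function $e$ and the order $\succeq$. In the sketch below, the choice $e(B)=|POS_B(D)|$ compared with `>=` is only an assumed example of a monotonic fitness function, and the toy table is hypothetical. As in Definition 5, only the single-attribute deletions $R-\{a\}$ are tested.

```python
from collections import defaultdict
from operator import ge

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def pos_size(table, B, d):
    """Assumed fitness: e(B) = |POS_B(D)|, the size of the Pawlak positive region."""
    classes = partition(table, [d])
    return sum(len(b) for b in partition(table, sorted(B))
               if any(b <= D_j for D_j in classes))

def is_reduct(R, AT, e, geq):
    """Definition 5 with fitness e and partial order geq(u, v) meaning u ⪰ v."""
    jointly_sufficient = geq(e(R), e(AT))
    individually_necessary = all(not geq(e(R - {a}), e(AT)) for a in R)
    return jointly_sufficient and individually_necessary

# Hypothetical table: d = a XOR b, and c carries the same information as a XOR b.
DT = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
AT = frozenset({"a", "b", "c"})

def e(B):
    return pos_size(DT, B, "d")

for R in ({"c"}, {"a", "b"}, {"a", "c"}):
    print(sorted(R), is_reduct(frozenset(R), AT, e, ge))
# ['c'] True, then ['a', 'b'] True, then ['a', 'c'] False
```

`{a, c}` is jointly sufficient but not a reduct, because dropping `a` still leaves the full positive region.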
Different reducts can be defined according to different properties of an information system. Many such properties are known, such as a description of an object relation Pawlak1982RS (), partitions of an information system Zhao2007GDAR () and a classification of a set of concepts Pawlak1991RSTARD (). Of these, the classification of a set of concepts is the most common in rough set theory.
In the classical rough set model, the classification of a set of concepts can be evaluated by the positive region of the classification (Definition 2). In the probabilistic rough set model, it can be evaluated by the probabilistic positive region of the classification (Definition 4). However, the decision rules derived from a reduct that preserves the probabilistic positive region may conflict with those derived from the original decision table. This happens because the probabilistic positive region is not monotonic with respect to set inclusion of attribute sets Mi2004AKRVPRS ().
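The non-monotonicity mentioned above can be seen on a very small hypothetical table. In the sketch below, the probabilistic positive region of Definition 4 is computed with an assumed threshold $\alpha=0.7$ ($\beta$ does not enter the lower approximation). Adding attribute `b` to `{a}` shrinks the region from four objects to two.

```python
from collections import defaultdict
from fractions import Fraction

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def prob_pos(table, B, d, alpha):
    """POS_B^(alpha,beta)(D): union over D_j of the classes with P(D_j | [x]_B) >= alpha."""
    pos = set()
    for D_j in partition(table, [d]):
        for block in partition(table, B):
            if Fraction(len(block & D_j), len(block)) >= alpha:
                pos |= block
    return pos

# Hypothetical table: "a" is constant, "b" splits U into two halves.
DT = [
    {"a": 0, "b": 0, "d": 1},
    {"a": 0, "b": 0, "d": 1},
    {"a": 0, "b": 1, "d": 1},
    {"a": 0, "b": 1, "d": 0},
]
alpha = Fraction(7, 10)  # hypothetical threshold
print(sorted(prob_pos(DT, ["a"], "d", alpha)))       # [0, 1, 2, 3]
print(sorted(prob_pos(DT, ["a", "b"], "d", alpha)))  # [0, 1]
```

With $B=\{a\}$, object 3 (decision 0) lies in the positive region only because its whole class is assigned decision 1. This is the kind of rule conflict described above.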
To derive conflict-free decision rules, Mi et al. Mi2004AKRVPRS () presented the concepts of distribution reducts based on the variable precision rough set model, a typical probabilistic rough set model.
Definition 6.
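Given a decision table $DT=(U, C\cup D)$, for any $B\subseteq C$ and $0\le\beta<\alpha\le 1$, let $U/D=\{D_1, D_2, \ldots, D_r\}$ be a classification of the universe $U$. The lower and upper approximation distribution functions with respect to $B$ are defined as:
$$L_B^{(\alpha,\beta)}=\big(\underline{apr}_B^{(\alpha,\beta)}(D_1), \ldots, \underline{apr}_B^{(\alpha,\beta)}(D_r)\big),\qquad H_B^{(\alpha,\beta)}=\big(\overline{apr}_B^{(\alpha,\beta)}(D_1), \ldots, \overline{apr}_B^{(\alpha,\beta)}(D_r)\big).$$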
The lower or upper approximation distribution functions can be seen as fitness functions for evaluating the classification of a set of concepts in the probabilistic rough set model. Using these functions, the lower and upper approximation distribution reducts in the probabilistic rough set model are defined as follows.
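A minimal sketch of Definition 6 under the same hypothetical encoding. The distribution functions are returned as tuples of frozensets, with the decision classes ordered by decision value so that, for example, $L_B^{(\alpha,\beta)}$ and $L_C^{(\alpha,\beta)}$ can be compared with `==`. The thresholds $\alpha=0.6$, $\beta=0.4$ and the toy table are assumptions for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def distribution_functions(table, B, d, alpha, beta):
    """Return (L_B, H_B): (alpha, beta) lower / upper approximations of D_1, ..., D_r."""
    # Order decision classes by decision value so results for different B are comparable.
    classes = sorted(partition(table, [d]), key=lambda D_j: table[min(D_j)][d])
    blocks = partition(table, B)
    lower, upper = [], []
    for D_j in classes:
        probs = [(blk, Fraction(len(blk & D_j), len(blk))) for blk in blocks]
        lower.append(frozenset().union(*(blk for blk, p in probs if p >= alpha)))
        upper.append(frozenset().union(*(blk for blk, p in probs if p > beta)))
    return tuple(lower), tuple(upper)

# Hypothetical table: d = a XOR b, and c carries the same information.
DT = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
alpha, beta = Fraction(3, 5), Fraction(2, 5)  # hypothetical thresholds
L_C, H_C = distribution_functions(DT, ["a", "b", "c"], "d", alpha, beta)
L_c, H_c = distribution_functions(DT, ["c"], "d", alpha, beta)
L_a, H_a = distribution_functions(DT, ["a"], "d", alpha, beta)
print(L_c == L_C, H_c == H_C)  # True True
print(L_a == L_C, H_a == H_C)  # False False
```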
Definition 7.
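Mi2004AKRVPRS () Given a decision table $DT=(U, C\cup D)$, for any $B\subseteq C$ and $0\le\beta<\alpha\le 1$, we have:

If $L_B^{(\alpha,\beta)}=L_C^{(\alpha,\beta)}$, $B$ is referred to as an $(\alpha,\beta)$ lower distribution consistent set of $DT$; if $L_B^{(\alpha,\beta)}=L_C^{(\alpha,\beta)}$ and $L_{B'}^{(\alpha,\beta)}\neq L_C^{(\alpha,\beta)}$ for all $B'\subset B$, then $B$ is referred to as an $(\alpha,\beta)$ lower distribution reduct of $DT$.

If $H_B^{(\alpha,\beta)}=H_C^{(\alpha,\beta)}$, $B$ is referred to as an $(\alpha,\beta)$ upper distribution consistent set of $DT$; if $H_B^{(\alpha,\beta)}=H_C^{(\alpha,\beta)}$ and $H_{B'}^{(\alpha,\beta)}\neq H_C^{(\alpha,\beta)}$ for all $B'\subset B$, then $B$ is referred to as an $(\alpha,\beta)$ upper distribution reduct of $DT$.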
An $(\alpha,\beta)$ lower (upper) distribution reduct is a minimal subset of the condition attribute set $C$ that preserves the $(\alpha,\beta)$ lower (upper) approximations of all decision classes. For simplicity, the lower and upper distribution reducts are collectively called distribution reducts in the rest of this paper.
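Definition 7 can be checked by brute force. Because the distribution functions are not monotonic, the sketch below tests every proper subset of $B$, which costs $2^{|B|}-1$ evaluations. This is exactly the cost that a monotonic fitness function avoids. Helper names, thresholds and data are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import combinations

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def dist(table, B, d, alpha, beta, upper=False):
    """L_B (upper=False) or H_B (upper=True) as a tuple over decision classes sorted by value."""
    classes = sorted(partition(table, [d]), key=lambda D_j: table[min(D_j)][d])
    blocks = partition(table, list(B))
    result = []
    for D_j in classes:
        kept = []
        for blk in blocks:
            p = Fraction(len(blk & D_j), len(blk))
            if (p > beta) if upper else (p >= alpha):
                kept.append(blk)
        result.append(frozenset().union(*kept))
    return tuple(result)

def is_distribution_reduct(table, B, C, d, alpha, beta, upper=False):
    """Definition 7: B preserves the distribution function of C and no proper subset does."""
    target = dist(table, C, d, alpha, beta, upper)
    if dist(table, B, d, alpha, beta, upper) != target:
        return False  # B is not even a distribution consistent set
    # No monotonicity to rely on, so every proper subset B' of B is examined.
    return all(dist(table, sub, d, alpha, beta, upper) != target
               for k in range(len(B)) for sub in combinations(sorted(B), k))

DT = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
C, alpha, beta = ["a", "b", "c"], Fraction(3, 5), Fraction(2, 5)
for B in (["c"], ["a", "b"], ["a", "c"]):
    print(B, is_distribution_reduct(DT, B, C, "d", alpha, beta))
# ['c'] True, then ['a', 'b'] True, then ['a', 'c'] False
```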
The monotonicity of the fitness function used in the definition of an attribute reduct, with respect to set inclusion of attribute sets, deserves careful attention when designing heuristic attribute reduction algorithms. If the fitness function is monotonic with respect to set inclusion of attribute sets, the individually necessary condition only needs to consider the subsets $R-\{a\}$ for all $a\in R$ to guarantee that a reduct $R$ is minimal. If the fitness function is not monotonic, the individually necessary condition must consider all proper subsets of a reduct to make sure that it is minimal Jia2013MCARDTRSM (); Zhao2011NARDTRS ().
In Definition 7, the individually necessary conditions must consider all proper subsets of the reduct because the lower and upper approximation distribution functions are not monotonic with respect to set inclusion of attribute sets. This complicates the algorithm design.
2.2 Typical heuristic attribute reduction algorithms
Yao et al. Yao2008RCA () have summarized three groups of heuristic attribute reduction algorithms, based on the addition-deletion method, the deletion method and the addition method, respectively. The algorithms based on the addition-deletion method and on the deletion method are the two most widely used heuristic attribute reduction algorithms in the rough set community. Hence, in this subsection we mainly discuss heuristic attribute reduction algorithms based on these two methods.
The algorithm based on the addition-deletion method and the algorithm based on the deletion method are displayed in Algorithm 1 and Algorithm 2, respectively.
Algorithm 1 starts with an empty set or the core, and then adds attributes to the subset of selected attributes until the candidate reduct satisfies the jointly sufficient condition in the definition of attribute reduct. Each selected attribute maximizes the increment of the fitness value of the current attribute subset. Because the addition process may add superfluous attributes, a deletion process is needed afterwards to remove the superfluous attributes from the candidate reduct one by one.
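The control flow of Algorithm 1 can be illustrated with a minimal Python sketch of an addition-deletion search. It is not the paper's algorithm. As assumptions, it uses the Pawlak positive-region size $e(B)=|POS_B(D)|$ as a monotonic fitness function, breaks ties by attribute order, and checks deletions in reverse order of addition. The table and helper names are hypothetical.

```python
from collections import defaultdict

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def pos_size(table, B, d):
    """Assumed monotonic fitness: e(B) = |POS_B(D)|."""
    classes = partition(table, [d])
    return sum(len(b) for b in partition(table, B) if any(b <= D_j for D_j in classes))

def addition_deletion(table, C, d, core=()):
    """Addition-deletion sketch: greedy addition, then removal of superfluous attributes."""
    def e(B):
        return pos_size(table, B, d)

    target = e(C)
    R = list(core)
    # Addition: pick the attribute with the largest fitness increment e(R + [a]) - e(R).
    while e(R) < target:
        R.append(max((a for a in C if a not in R), key=lambda a: e(R + [a])))
    # Deletion: an attribute is superfluous if R without it still reaches e(C).
    for a in reversed(list(R)):
        if a not in core and e([x for x in R if x != a]) == target:
            R.remove(a)
    return R

# Hypothetical table: d = a XOR b, and c carries the same information.
DT = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
print(addition_deletion(DT, ["a", "b", "c"], "d"))  # ['c']
```

Monotonicity is what makes the `while` loop safe here: $e(R)$ never exceeds $e(C)$ and reaches it at the latest when $R=C$.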
Algorithm 2 takes the entire set of condition attributes as the candidate reduct, then selects attributes for deletion one by one according to their fitness values. If the subset of remaining attributes still satisfies the jointly sufficient condition in the definition of attribute reduct after the selected attribute is deleted, then that attribute is superfluous and can be deleted. A reduct is obtained once every attribute has been checked exactly once.
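A matching sketch of the deletion method of Algorithm 2, again assuming the monotonic fitness $e(B)=|POS_B(D)|$ and a hypothetical table. The ranking rule used here tries first the attributes whose removal from $C$ costs the least fitness. This is one plausible reading of "according to the fitness values", not the paper's specific significance measure.

```python
from collections import defaultdict

def partition(table, attrs):
    """U/B as a list of frozensets of row indices."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [frozenset(block) for block in blocks.values()]

def pos_size(table, B, d):
    """Assumed monotonic fitness: e(B) = |POS_B(D)|."""
    classes = partition(table, [d])
    return sum(len(b) for b in partition(table, B) if any(b <= D_j for D_j in classes))

def deletion(table, C, d):
    """Deletion-method sketch: start from C and drop superfluous attributes one by one."""
    def e(B):
        return pos_size(table, B, d)

    target = e(C)
    # Rank attributes: those whose removal from C loses the least fitness come first.
    order = sorted(C, key=lambda a: target - e([x for x in C if x != a]))
    R = list(C)
    for a in order:  # every attribute is checked exactly once
        rest = [x for x in R if x != a]
        if e(rest) == target:  # R - {a} is still jointly sufficient, so a is superfluous
            R = rest
    return R

DT = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 1, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
print(deletion(DT, ["a", "b", "c"], "d"))  # ['c']
```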
Algorithm 1 and Algorithm 2 mainly consist of two key steps: checking the jointly sufficient condition and evaluating the significance of attributes. Checking the jointly sufficient condition ensures that the candidate reduct meets the jointly sufficient condition in the definition of attribute reduct. Evaluating the significance of attributes sorts the attributes and provides heuristic information for searching for a reduct; this step can be implemented by designing effective fitness functions. Different fitness functions may produce different orders of attributes, which may lead to different reducts.
It is worth pointing out that the monotonicity of the fitness function in Definition 5 with respect to set inclusion of attribute sets is very important for the completeness of Algorithm 1 and Algorithm 2 Ma2014DRDPRDTRS (). If the fitness function is monotonic with respect to set inclusion of attribute sets, Algorithm 1 and Algorithm 2 are guaranteed to obtain a reduct, namely, they are complete. If the fitness function is not monotonic, Algorithm 1 and Algorithm 2 may obtain a super reduct that includes redundant attributes, namely, they are incomplete. Moreover, the fitness functions for evaluating the significance of attributes should also be monotonic with respect to set inclusion of attribute sets and provide enough precision to sort the attributes effectively Jiang2015RDEBFSA (); Li2015CIMACDT ().
3 Heuristic algorithms to find distribution reducts in probabilistic rough set model
Checking the jointly sufficient condition and evaluating the significance of attributes are two key steps in heuristic attribute reduction algorithms based on the addition-deletion strategy and the deletion strategy. Furthermore, the monotonicity of the fitness functions used in these two steps is very important for the validity of the heuristic attribute reduction algorithms. To obtain distribution reducts with heuristic attribute reduction algorithms, in this section we first construct two monotonic fitness functions. Then we give equivalent definitions of distribution reducts based on the constructed monotonic fitness functions. After that, we propose two significance measures of attributes obtained by multiplying measures of the granularity of partitions by the constructed monotonic fitness functions. Moreover, the core and a core computation algorithm for distribution reducts are presented. On this basis, two heuristic attribute reduction algorithms for finding distribution reducts are developed based on the addition-deletion method and the deletion method. Finally, an illustrative example walks through the proposed heuristic attribute reduction algorithms step by step.
3.1 The equivalent definition of distribution reducts with monotonic fitness functions
For a certain property in the definition of an attribute reduct, different fitness functions can be used as its indicator. The monotonicity of the fitness functions is very important for guaranteeing the completeness of heuristic attribute reduction algorithms. To develop complete heuristic attribute reduction algorithms for obtaining distribution reducts, we construct two monotonic fitness functions in this subsection. Equivalent definitions of distribution reducts are then given based on the constructed monotonic fitness functions.
Now, let us first give two monotonic fitness functions as follows.
Definition 8.
Given a decision table , for any , and is a classification of the universe , we denote
If , then define and . Moreover, and are denoted by and , respectively, when no confusion arises.
It is easy to see from the above that and are defined by using the lower approximations of all elements in with respect to and the upper approximations of all elements in with respect to , respectively. represents the change of certain knowledge with respect to after removing attributes from the decision table, and represents the change of relevant knowledge with respect to after removing attributes from the decision table. Therefore, represents the change of certain knowledge with respect to , and represents the change of relevant knowledge with respect to . Hence, we can use and to compute the lower and upper distribution reducts, respectively.
Theorem 1.
Given a decision table , for any and , we have

,

.
Proof.
Suppose . In terms of the definition of , we have . Let and .
On the one hand, for , one can obtain , hence , then we obtain that , thus .
On the other hand, for , we have . Since , , we have . It follows that .
As a result, we have
Thus,
Hence,
This completes the proof.
The proof of is similar to that of .
By Theorem 1 we immediately get the following corollary.
Corollary 2.
Given a decision table , for any and , we have

,

.
Theorem 1 and Corollary 2 show that the fitness function increases and the fitness function decreases as the equivalence classes become smaller through finer partitioning. This means that adding a new attribute to an existing subset of condition attributes does not decrease or increase , and that deleting an attribute from an existing subset of condition attributes does not increase or decrease . This property is very important for constructing heuristic attribute reduction algorithms.
In the following, Theorem 1 is illustrated through an example.
Example 1.
Given the decision table shown in Table 1, where , and . Suppose that and , , where and . As we can see, , which means .
1  1  1  1  0  0  0  
0  1  1  0  1  0  0  
0  1  1  1  0  0  0  
0  1  1  1  0  0  1  
0  0  0  1  0  1  0  
0  0  1  1  1  1  1  
0  0  1  1  0  1  1  
1  1  0  0  1  1  0  
1  1  0  0  1  1  1  
1  1  0  0  0  0  1  
1  1  0  0  0  0  1 
By calculation, we obtain
Hence, we have
It can be easily calculated that
Obviously, .
According to Definition 8, we have
Thus, and . It is clear that increases and decreases as becomes finer.
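The sketch below encodes Table 1 and computes $U/C$, $U/D$ and the $(\alpha,\beta)$ approximations of the decision classes. The specific attribute subsets and thresholds of Example 1 cannot be recovered from the text here, so the following are assumptions: the first six columns are treated as condition attributes $c_1,\ldots,c_6$, the last column as the decision attribute $d$, and $\alpha=0.6$, $\beta=0.4$ are hypothetical thresholds. Objects are labelled 1 to 11 in row order. The sketch does not reproduce the fitness values of Example 1.

```python
from collections import defaultdict
from fractions import Fraction

# Table 1, assuming (hypothetically) that the first six columns are condition
# attributes c1..c6 and the last column is the decision attribute d.
ROWS = [
    (1, 1, 1, 1, 0, 0, 0),
    (0, 1, 1, 0, 1, 0, 0),
    (0, 1, 1, 1, 0, 0, 0),
    (0, 1, 1, 1, 0, 0, 1),
    (0, 0, 0, 1, 0, 1, 0),
    (0, 0, 1, 1, 1, 1, 1),
    (0, 0, 1, 1, 0, 1, 1),
    (1, 1, 0, 0, 1, 1, 0),
    (1, 1, 0, 0, 1, 1, 1),
    (1, 1, 0, 0, 0, 0, 1),
    (1, 1, 0, 0, 0, 0, 1),
]
C = list(range(6))  # column indices of c1..c6
D = 6               # column index of d

def partition(attrs):
    """U/B with objects labelled 1..11 (x1..x11), each block as a sorted list."""
    blocks = defaultdict(set)
    for i, row in enumerate(ROWS, start=1):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return [sorted(b) for b in blocks.values()]

def approx(attrs, X, alpha, beta):
    """(alpha, beta) lower and upper approximations of X w.r.t. attrs, as sorted lists."""
    lower, upper = set(), set()
    for b in partition(attrs):
        p = Fraction(len(set(b) & X), len(b))
        if p >= alpha:
            lower |= set(b)
        if p > beta:
            upper |= set(b)
    return sorted(lower), sorted(upper)

print(partition(C))    # [[1], [2], [3, 4], [5], [6], [7], [8, 9], [10, 11]]
print(partition([D]))  # [[1, 2, 3, 5, 8], [4, 6, 7, 9, 10, 11]]
alpha, beta = Fraction(3, 5), Fraction(2, 5)  # hypothetical thresholds
for X in map(set, partition([D])):
    print(approx(C, X, alpha, beta))
# ([1, 2, 5], [1, 2, 3, 4, 5, 8, 9]) then ([6, 7, 10, 11], [3, 4, 6, 7, 8, 9, 10, 11])
```

Under this reading of the table, the inconsistent pairs $\{x_3,x_4\}$ and $\{x_8,x_9\}$ fall into the upper but not the lower approximations of both decision classes.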
Now, let us give equivalent characterizations of each kind of distribution consistent set by means of the fitness functions and .
Theorem 3.
Given a decision table , for any and , we have

is an lower distribution consistent set of iff ,

is an upper distribution consistent set of iff .
Proof.
Since , it is easy to verify that forms a partition of . Let .
Since is an lower distribution consistent set, we have , therefore for , we obtain that .
Hence,
For , we have .
Suppose there exists such that .
As a result,
This contradicts the condition . Hence, for , we have .
If , there are two cases: and .