# Reliable Uncertain Evidence Modeling in Bayesian Networks by Credal Networks

Sabina Marchetti
Sapienza University of Rome
Rome (Italy)
sabina.marchetti@uniroma1.it
Alessandro Antonucci
IDSIA
Lugano (Switzerland)
alessandro@idsia.ch
August 20, 2019
###### Abstract

A reliable modeling of uncertain evidence in Bayesian networks based on a set-valued quantification is proposed. Both soft and virtual evidences are considered. We show that evidence propagation in this setup can be reduced to standard updating in an augmented credal network, equivalent to a set of consistent Bayesian networks. A characterization of the computational complexity for this task is derived together with an efficient exact procedure for a subclass of instances. In the case of multiple uncertain evidences over the same variable, the proposed procedure can provide a set-valued version of the geometric approach to opinion pooling.

## 1 Introduction

Knowledge-based systems are used in AI to model relations among the variables of interest for a particular task, and provide automatic decision support by inference algorithms. This can be achieved by joint probability mass functions. When a subset of variables is observed, belief updating is a typical inference task that propagates such (fully reliable) evidence. Whenever the observational process is unable to clearly report a single state for the observed variable, we refer to uncertain evidence. This might take the form of a virtual instance, described by the relative likelihoods for the possible observation of every state of a considered variable [25]. Also, soft evidence [30] denotes any observational process returning a probabilistic assessment, whose propagation induces a revision of the original model [21]. Bayesian networks are often used to specify joint probability mass functions implementing knowledge-based systems [22]. Full, or hard [30], observation of a node corresponds to its instantiation in the network, followed by belief updating. Given virtual evidence on some variable, the observational process can be modeled à la Pearl in Bayesian networks: an auxiliary binary child of the variable is introduced, whose conditional mass functions are proportional to the likelihoods [25]. Instantiation of the auxiliary node yields propagation of virtual evidence, and standard inference algorithms for Bayesian networks can be used [22]. Something similar can be done with soft evidence, but the quantification of the auxiliary node should be based on additional inferences in the original network [9].

In the above classical setup, sharp probabilistic estimates are assumed for the parameters modeling an uncertain observation. We propose instead a generalized set-valued quantification, with interval-valued likelihoods for virtual evidence and sets of marginal mass functions for soft evidence. This offers a more robust modeling of observational processes leading to uncertain evidence. To this purpose, we extend the transformations defined for the standard case to the set-valued case. The original Bayesian network is converted into a credal network [12], equivalent to a set of Bayesian networks consistent with the set-valued specification. We characterize the computational complexity of the credal modeling of uncertain evidence in Bayesian networks, and propose an efficient inference scheme for a special class of instances. Finally, the discussion is specialized to opinion pooling, and our techniques are used to generalize geometric functionals to support set-valued opinions.

### 1.1 Related Work

Model revision based on uncertain evidence is a classical topic in AI. Entropy-based techniques for the absorption of uncertain evidence were proposed in the Bayesian networks literature [30, 26], as well as for the pooling of convex sets of probability mass functions [1]. Yet, this approach was proved to fail standard postulates for revision operators in generalized settings [20]. Uncertain evidence absorption has been also considered in the framework of generalized knowledge representation and reasoning [17]. The discussion was specialized to evidence theory [32, 23], although revision based on uncertain instances with graphical models becomes more problematic and does not give a direct extension of the Bayesian networks formalism [28]. Finally, credal networks have been considered in the model revision framework [13]. Yet, these authors consider the effect of a sharp quantification of the observation in a previously specified credal network, while we consider the opposite situation of a Bayesian network for which credal uncertain evidence is provided.

## 2 Background

### 2.1 Bayesian and Credal Networks

Let $X$ be any discrete variable. Notation $x$ and $\Omega_X$ is used, respectively, for a generic value and for the finite set of possible values of $X$. If $X$ is binary, we set $\Omega_X := \{x, \neg x\}$. We denote as $P(X)$ a probability mass function (PMF) and as $K(X)$ a credal set (CS), defined as a set of PMFs over $X$. We remove inner points from CSs, i.e. those which can be obtained as convex combinations of other points, and assume the CS finite after this operation. The CS whose convex hull includes all PMFs over $X$ is called vacuous.

Given another variable $Y$, define a collection of conditional PMFs as $P(X|Y) := \{P(X|y)\}_{y \in \Omega_Y}$. $P(X|Y)$ is called conditional probability table (CPT). Similarly, a credal CPT (CCPT) is defined as $K(X|Y) := \{K(X|y)\}_{y \in \Omega_Y}$. An extensive CPT (ECPT) is a finite collection of CPTs. A CCPT can be converted into an equivalent ECPT by considering all the possible combinations from the elements of the CSs.

Given a joint variable $\mathbf{X} := (X_0, X_1, \ldots, X_n)$, a Bayesian network (BN) [25] serves as a compact way to specify a PMF over $\mathbf{X}$. A BN is represented by a directed acyclic graph $\mathcal{G}$, whose nodes are in one-to-one correspondence with the variables in $\mathbf{X}$, and a collection of CPTs $\{P(X_i|\Pi_i)\}_{i=0}^{n}$, where $\Pi_i$ is the joint variable of the parents of $X_i$ according to $\mathcal{G}$. Under the Markov condition, i.e. each variable is conditionally independent of its non-descendants non-parents given its parents, the joint PMF factorizes as $P(\mathbf{x}) = \prod_{i=0}^{n} P(x_i|\pi_i)$, where the values of $x_i$ and $\pi_i$ are those consistent with $\mathbf{x}$, for each $i = 0, 1, \ldots, n$.

A credal network (CN) [12] is a BN whose CPTs are replaced by CCPTs (or ECPTs). A CN specifies a joint CS $K(\mathbf{X})$, obtained by considering all the joint PMFs induced by the BNs with CPTs in the corresponding CCPTs (or ECPTs).

The typical inference task in BNs is updating, defined as the computation of the posterior probabilities for a variable of interest given hard evidence about some other variables. Without loss of generality, let the variable of interest and the observation be, respectively, $X_0$ and $X_n = x_n$. Standard belief updating corresponds to:

$$P(x_0|x_n) = \frac{\sum_{x_1,\ldots,x_{n-1}} \prod_{i=0}^{n} P(x_i|\pi_i)}{\sum_{x_0,x_1,\ldots,x_{n-1}} \prod_{i=0}^{n} P(x_i|\pi_i)}. \tag{1}$$

Updating is NP-hard in general BNs [11], although efficient computations can be performed in polytrees [25] by message propagation routines [22].
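As an illustration of Eq. (1), the following minimal sketch performs updating by brute-force enumeration on a hypothetical three-node chain with binary variables. The network, its CPT values, and the function names are assumptions made for this example only; enumeration is exponential in the number of variables and is not the message propagation scheme mentioned above.

```python
from itertools import product

# Hypothetical chain X0 -> X1 -> X2, all variables binary (states 0 and 1).
p_x0 = [0.7, 0.3]                          # P(X0)
p_x1_given_x0 = [[0.9, 0.1], [0.2, 0.8]]   # P(X1 | X0), rows indexed by x0
p_x2_given_x1 = [[0.6, 0.4], [0.1, 0.9]]   # P(X2 | X1), rows indexed by x1


def joint(x0, x1, x2):
    """Joint mass as the product of the local CPT entries (Markov condition)."""
    return p_x0[x0] * p_x1_given_x0[x0][x1] * p_x2_given_x1[x1][x2]


def update(x0, x2):
    """P(x0 | x2) as in Eq. (1): sum out X1, then normalise over X0."""
    num = sum(joint(x0, x1, x2) for x1 in (0, 1))
    den = sum(joint(a, x1, x2) for a, x1 in product((0, 1), repeat=2))
    return num / den


# Posterior PMF of X0 given the hard evidence X2 = 1.
posterior = [update(x0, 1) for x0 in (0, 1)]
print(posterior)
```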

CN updating is similarly intended as the computation of lower and upper bounds of the updated probability in Eq. (1) with respect to $K(\mathbf{X})$. Notation $\underline{P}$ ($\overline{P}$) is used to denote lower (upper) bounds. CN updating extends BN updating and it is therefore NP-hard [14]. Contrary to the standard setting, inference in generic polytrees is still NP-hard [24], with the notable exception of those networks whose variables are all binary [18].

### 2.2 Virtual and Soft Evidence

Eq. (1) gives the updated beliefs about queried variable $X_0$. The underlying assumption is that $X_n$ has been the subject of a fully reliable observational process, and its actual value is known to be $x_n$. This is not always realistic. Evidence might result from a process which is unreliable and only the likelihoods for the possible values of the observed variable may be assessed (e.g., the precision and the false discovery rate for a positive medical test). Virtual evidence (VE) [25] applies to such type of observation. Notation $\lambda_{X_n} := \{\lambda_{x_n}\}_{x_n \in \Omega_{X_n}}$ identifies a VE, $\lambda_{x_n}$ being the likelihood of the observation provided $X_n = x_n$. Given VE, the analogue of Eq. (1) is:

$$P_{\lambda_{X_n}}(x_0) := \frac{\sum_{x_n} \lambda_{x_n} P(x_0,x_n)}{\sum_{x_n} \lambda_{x_n} P(x_n)}, \tag{2}$$

where the probabilities in the right-hand side are obtained by marginalization of the joint PMF of the BN. Eq. (2) can be equivalently obtained by augmenting the BN with auxiliary binary node $D_{X_n}$ as a child of $X_n$. By specifying $P(d_{X_n}|x_n) := \lambda_{x_n}$ for each $x_n \in \Omega_{X_n}$, it is easy to check that $P(x_0|d_{X_n}) = P_{\lambda_{X_n}}(x_0)$, i.e. Eq. (2) can be reduced to a standard updating in an augmented BN.
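The equivalence between Eq. (2) and updating in the augmented BN can be checked numerically. The sketch below assumes a hypothetical joint PMF over a binary queried variable and a ternary observed variable, with illustrative likelihoods; all names and numbers are assumptions, not values from the paper.

```python
# Hypothetical joint P(X0, Xn): rows index x0 (2 states), columns index xn (3 states).
joint = [[0.20, 0.10, 0.10],
         [0.05, 0.25, 0.30]]
lam = [0.8, 0.4, 0.1]  # assumed likelihoods, one per state of Xn


def ve_update(joint, lam):
    """Eq. (2): likelihood-weighted marginalisation of Xn, then normalisation."""
    weighted = [sum(l * p for l, p in zip(lam, row)) for row in joint]
    total = sum(weighted)
    return [w / total for w in weighted]


def augmented_update(joint, lam):
    """Same query in the augmented BN: build P(x0, xn, D) and condition on D = d."""
    full = {}
    for x0, row in enumerate(joint):
        for xn, p in enumerate(row):
            full[(x0, xn, True)] = p * lam[xn]           # P(d | xn) = lambda_xn
            full[(x0, xn, False)] = p * (1.0 - lam[xn])  # complementary state of D
    p_d = sum(v for (_, _, d), v in full.items() if d)
    return [sum(v for (a, _, d), v in full.items() if d and a == x0) / p_d
            for x0 in range(len(joint))]


direct = ve_update(joint, lam)
via_child = augmented_update(joint, lam)
# Likelihoods are defined up to a positive constant: rescaling leaves Eq. (2) unchanged.
rescaled = ve_update(joint, [0.5 * l for l in lam])
print(direct, via_child, rescaled)
```

The three printed vectors agree up to floating-point error, which also illustrates why only the ratios between likelihoods matter.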

The notion of soft evidence (SE) refers to a different situation, in which the observational process returns an elicitation $P'(X_n)$ for the marginal PMF of $X_n$. See [5] for a detailed discussion on the possible situations producing SE. If this is the case, $P'(X_n)$ is assumed to replace the original beliefs about $X_n$ by Jeffrey's updating [21], i.e.

$$P'_{X_n}(x_0) := \sum_{x_n} P(x_0|x_n) \cdot P'(x_n). \tag{3}$$

Eq. (3) for SE reduces to Eq. (1) whenever $P'(X_n)$ assigns all the probability mass to a single value in $\Omega_{X_n}$. The same happens for VE in Eq. (2), when all the likelihoods are zero apart from the one corresponding to the observed value. Although SE and VE refer to epistemologically different informational settings, the following result provides means for a unified approach to their modeling.

###### Proposition 1 ([9]).

Absorption of a SE as in Eq. (3) is equivalent to Eq. (2) with a VE specified as:

$$\lambda_{x_n} \propto \frac{P'(x_n)}{P(x_n)}, \tag{4}$$

for each $x_n \in \Omega_{X_n}$. (Footnote 1: VE is defined as a collection of likelihoods, which in turn are defined up to a multiplicative positive constant. This clearly follows from Eq. (2). The relation in Eq. (4) is proportionality and not equality just to make all the likelihoods smaller than or equal to one.)

Vice versa, absorption of a VE as in Eq. (2) is equivalent to Eq. (3) with a SE specified as:

$$P'(x_n) := \frac{\lambda_{x_n} P(x_n)}{\sum_{x_n} \lambda_{x_n} P(x_n)}, \tag{5}$$

for each $x_n \in \Omega_{X_n}$.
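The two conversions in Proposition 1 can be sketched as follows. The joint PMF and the soft evidence below are hypothetical, and the handling of zero-probability states (likelihood set to zero) is one arbitrary choice among the admissible ones.

```python
# Hypothetical joint P(X0, Xn): rows index x0, columns index xn.
joint = [[0.20, 0.10, 0.10],
         [0.05, 0.25, 0.30]]
p_new = [0.5, 0.3, 0.2]  # assumed soft evidence P'(Xn)


def marginal_xn(joint):
    """Original marginal P(Xn), obtained by summing out X0."""
    return [sum(row[k] for row in joint) for k in range(len(joint[0]))]


def jeffrey_update(joint, p_new):
    """Eq. (3): sum over xn of P(x0 | xn) * P'(xn)."""
    p_old = marginal_xn(joint)
    return [sum(row[k] / p_old[k] * p_new[k]
                for k in range(len(p_old)) if p_old[k] > 0)
            for row in joint]


def se_to_ve(p_old, p_new):
    """Eq. (4): likelihoods proportional to P'(xn) / P(xn), rescaled to at most one."""
    raw = [q / p if p > 0 else 0.0 for p, q in zip(p_old, p_new)]
    top = max(raw)
    return [r / top for r in raw]


def ve_to_se(p_old, lam):
    """Eq. (5): the soft evidence induced by a virtual evidence."""
    z = sum(l * p for l, p in zip(lam, p_old))
    return [l * p / z for l, p in zip(lam, p_old)]


def ve_update(joint, lam):
    """Eq. (2): likelihood-weighted marginalisation followed by normalisation."""
    weighted = [sum(l * p for l, p in zip(lam, row)) for row in joint]
    total = sum(weighted)
    return [w / total for w in weighted]


p_old = marginal_xn(joint)
lam = se_to_ve(p_old, p_new)
by_jeffrey = jeffrey_update(joint, p_new)
by_virtual = ve_update(joint, lam)
round_trip = ve_to_se(p_old, lam)
print(by_jeffrey, by_virtual, round_trip)
```

On this example the Jeffrey update and the VE update coincide, and converting the VE back by Eq. (5) returns the original soft evidence.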

In the above setup for SE, states that are impossible in the original BN cannot be revised, i.e. if $P(x_n) = 0$ for some $x_n \in \Omega_{X_n}$, then also $P'(x_n) = 0$ and any value can be set for $\lambda_{x_n}$. Vice versa, according to Eq. (5), a zero likelihood in a VE renders impossible the corresponding state of the SE. Thus, at least one non-zero likelihood should be specified in a VE. All these issues are shown in the following example.

###### Example 1.

Let denote the actual color of a traffic light with . Assume (green) more probable than (red), and (yellow) impossible. Thus, for instance, . We eventually revise by a SE , which keeps yellow impossible and assigns the same probability to the two other states, i.e. . Because of Eq. (4), this can be equivalently achieved by a VE . Vice versa, because of Eq. (5), a VE induces an updated . Such PMF coincides with in a two-node BN, with child of , CPT with and marginal PMF as in the original specification.

## 3 Credal Uncertain Evidence

### 3.1 Credal Virtual Evidence

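We propose credal VE (CVE) as a robust extension of sharp virtual observations. Notation $\Lambda_{X_n}$ is used here for the intervals $\{[\underline{\lambda}_{x_n}, \overline{\lambda}_{x_n}]\}_{x_n \in \Omega_{X_n}}$. CVE updating is defined as the computation of the bounds of Eq. (2) with respect to all VEs consistent with the interval constraints in $\Lambda_{X_n}$. Notation $\underline{P}_{\Lambda_{X_n}}(x_0)$ and $\overline{P}_{\Lambda_{X_n}}(x_0)$ is used to denote these bounds. CVE absorption in BNs is done as follows.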

###### Transformation 1.

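Given a BN over $\mathbf{X}$ and a CVE $\Lambda_{X_n}$, add a binary child $D_{X_n}$ of $X_n$ and quantify its CCPT with constraints $\underline{\lambda}_{x_n} \leq P(d_{X_n}|x_n) \leq \overline{\lambda}_{x_n}$. (Footnote 2: For a binary variable, an interval constraint on the probability of one of its states defines a CS with two elements, one for each bound of the interval.) A CN with a single credal node results.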

By Tr. 1, CVE updating in a BN is reduced to CN updating.

###### Theorem 1.

Given a CVE in a BN, consider the CN returned by Tr. 1. Then:

$$\underline{P}(x_0|d_{X_n}) = \underline{P}_{\Lambda_{X_n}}(x_0), \tag{6}$$

and analogously for the upper bounds.

Standard VE can be used to model partially reliable sensors or tests, whose quantification is based on sensitivity and specificity data. Since these data are not always promptly or easily available (e.g., a pregnancy test whose failure can be only decided later), a CVE with interval likelihoods can be quantified by the imprecise Dirichlet model [6] as in the following example. (Footnote 3: Given $N$ observations of $X$, if $n(x)$ of them report $x$, the lower bound of $P(x)$ according to the imprecise Dirichlet model is $n(x)/(N+s)$, and the upper bound $(n(x)+s)/(N+s)$, with $s$ effective prior sample size.)

###### Example 2.

The reference standard for diagnosis of anterior cruciate ligament sprains is arthroscopy. In a trial, 40 patients coming in with acute knee pain are examined using the Declan test [10]. Every patient also has an arthroscopy procedure for a definitive diagnosis. Results are TP=17 (Declan positive, arthroscopy positive), FP=3 (Declan positive, arthroscopy negative), FN=6 (Declan negative, arthroscopy positive) and TN=14 (Declan negative, arthroscopy negative). Patients visiting a clinic have prior sprain probability . Given a positive Declan, the imprecise Dirichlet model (see Footnote 3) with corresponds to CVE , , , . The bounds of the updated sprain probability with respect to the above constraints are , . A VE with frequentist estimates would have produced instead .
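The computation behind this kind of example can be sketched in a few lines. The counts are those of the trial above, while the prior sprain probability (0.2) and the effective prior sample size (s = 1) are hypothetical placeholders, so the printed numbers are not those of the paper. Bounds are obtained by enumerating the endpoints of the likelihood intervals, which suffices because the posterior is monotone in each likelihood taken separately.

```python
from itertools import product

TP, FP, FN, TN = 17, 3, 6, 14   # counts reported in the trial
S = 1.0                          # assumed effective prior sample size
PRIOR = [0.2, 0.8]               # assumed P(sprain), P(no sprain)


def idm_interval(count, total, s):
    """Imprecise Dirichlet model bounds for a relative frequency."""
    return count / (total + s), (count + s) / (total + s)


def cve_bounds(prior, intervals, target):
    """Bounds of P(target | d) over all likelihood vectors at interval endpoints."""
    values = []
    for lam in product(*intervals):
        den = sum(l * p for l, p in zip(lam, prior))
        if den > 0:
            values.append(lam[target] * prior[target] / den)
    return min(values), max(values)


# Interval likelihoods of a positive test given sprain and given no sprain.
intervals = [idm_interval(TP, TP + FN, S), idm_interval(FP, FP + TN, S)]
lower, upper = cve_bounds(PRIOR, intervals, target=0)

# Sharp VE with frequentist estimates, for comparison.
lam_freq = [TP / (TP + FN), FP / (FP + TN)]
point = lam_freq[0] * PRIOR[0] / sum(l * p for l, p in zip(lam_freq, PRIOR))
print(lower, point, upper)
```

Under these assumptions the sharp estimate falls inside the interval returned by the credal model, as expected.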

### 3.2 Credal Soft Evidence

Analogous to CVE, credal soft evidence (CSE) on $X_n$ can be specified by any CS $K'(X_n)$. Accordingly, CSE updating computes the bounds spanned by the updating of all SEs based on PMFs consistent with the CS, i.e.

$$\underline{P}'_{X_n}(x_0) := \min_{P'(X_n) \in K'(X_n)} \sum_{x_n} P(x_0|x_n) \cdot P'(x_n), \tag{7}$$

and analogously for the upper bound $\overline{P}'_{X_n}(x_0)$.

The shadow of a CS $K(X)$ is the CS obtained from all the PMFs $\hat{P}(X)$ such that, for each $x \in \Omega_X$:

$$\min_{P(X) \in K(X)} P(x) \leq \hat{P}(x) \leq \max_{P(X) \in K(X)} P(x). \tag{8}$$

A CS coinciding with its shadow is called shady. It is a trivial exercise to check that CSs over binary variables are shady. (Footnote 4: Following [8], a shadow is just the set of probability intervals induced by a generic CS.)
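A small sketch may help to see why a CS can differ from its shadow. The vertices below are hypothetical; the function names are illustrative only. The code computes the probability intervals of Eq. (8) and tests whether a PMF satisfies them.

```python
def probability_intervals(vertices):
    """Per-state lower and upper probabilities over the elements of a CS, as in Eq. (8)."""
    n_states = len(vertices[0])
    return [(min(v[i] for v in vertices), max(v[i] for v in vertices))
            for i in range(n_states)]


def in_shadow(pmf, vertices, tol=1e-12):
    """True if the PMF satisfies every interval constraint induced by the CS."""
    bounds = probability_intervals(vertices)
    return all(lo - tol <= p <= hi + tol for p, (lo, hi) in zip(pmf, bounds))


# Hypothetical CS over a ternary variable, given by three extreme points.
vertices = [[0.5, 0.3, 0.2],
            [0.2, 0.5, 0.3],
            [0.3, 0.2, 0.5]]
bounds = probability_intervals(vertices)

# This PMF respects all the intervals, yet it is not a convex combination of the
# three vertices (solving for the weights gives a negative coefficient), so this
# CS is strictly contained in its shadow and hence is not shady.
candidate = [0.5, 0.2, 0.3]
print(bounds, in_shadow(candidate, vertices))
```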

The following result extends Pr. 1 to the imprecise framework.

###### Theorem 2.

Absorption of a CSE with shady $K'(X_n)$ is equivalent to that of CVE $\Lambda_{X_n}$ such that:

$$\underline{\lambda}_{x_n} \propto \frac{\underline{P}'(x_n)}{P(x_n)}, \tag{9}$$

where $\underline{P}'(x_n) := \min_{P'(X_n) \in K'(X_n)} P'(x_n)$ and analogously for the upper bound. Vice versa, absorption of a CVE $\Lambda_{X_n}$ is equivalent to that of a CSE $K'(X_n)$ such that:

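$$\underline{P}'(x_n) = \frac{P(x_n)\,\underline{\lambda}_{x_n}}{P(x_n)\,\underline{\lambda}_{x_n} + \sum_{x'_n \neq x_n} P(x'_n)\,\overline{\lambda}_{x'_n}}, \tag{10}$$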

and analogously with a swap between lower and upper likelihoods for the upper bound.
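Eq. (10) can be checked against a brute-force computation in which Eq. (5) is applied to every combination of interval endpoints. The prior and the interval likelihoods below are hypothetical values chosen for illustration.

```python
from itertools import product


def bound_from_cve(prior, own, others):
    """Eq. (10): 'own' holds the bound used for the target state, 'others' the rest."""
    result = []
    for i, p in enumerate(prior):
        rest = sum(prior[k] * others[k] for k in range(len(prior)) if k != i)
        result.append(p * own[i] / (p * own[i] + rest))
    return result


def brute_force_bounds(prior, lower, upper):
    """Apply Eq. (5) at every combination of endpoints and track min and max."""
    lo = [1.0] * len(prior)
    hi = [0.0] * len(prior)
    for lam in product(*zip(lower, upper)):
        z = sum(l * p for l, p in zip(lam, prior))
        for i, (l, p) in enumerate(zip(lam, prior)):
            lo[i] = min(lo[i], l * p / z)
            hi[i] = max(hi[i], l * p / z)
    return lo, hi


prior = [0.5, 0.3, 0.2]          # assumed marginal P(Xn)
lam_lower = [0.2, 0.4, 0.1]      # assumed lower likelihoods
lam_upper = [0.6, 0.9, 0.3]      # assumed upper likelihoods

closed_lower = bound_from_cve(prior, lam_lower, lam_upper)
closed_upper = bound_from_cve(prior, lam_upper, lam_lower)  # swap for the upper bound
brute_lower, brute_upper = brute_force_bounds(prior, lam_lower, lam_upper)
print(closed_lower, closed_upper)
```

The closed form avoids the enumeration, whose cost grows as two to the power of the number of states.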

By Th. 1 and  2, CSE updating in a BN is reduced to standard updating in a CN. This represents a generalization to the credal case of Pr. 1. For CSEs with non-shady CSs, the procedure is slightly more involved, as detailed by the following result.

###### Proposition 2.

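Given a CSE $K'(X_n)$ in a BN, add a binary child $D_{X_n}$ of $X_n$ quantified by an ECPT with one CPT $P_i(D_{X_n}|X_n)$ for each element $P'_i(X_n)$ of $K'(X_n)$, such that $P_i(d_{X_n}|x_n) \propto P'_i(x_n)/P(x_n)$ for each $x_n \in \Omega_{X_n}$. Then: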

$$\underline{P}'_{X_n}(x_0) = \underline{P}(x_0|d_{X_n}). \tag{11}$$

To clarify these results, consider the following example.

###### Example 3.

Consider the same setup as in Ex. 1. Let us revise the original PMF by a CSE based on the shady CS , with and . Th. 2 can be used to convert such CSE into a CVE . Vice versa, the beliefs induced by CVE are , , , and , . These bounds may be equivalently obtained in a two-node CN with child of and CCPT such that , , and . Alternatively, following Pr. 2, absorption of can be achieved by an ECPT with two CPTs.

We point out that conservative updating (CU), a credal updating rule for reliable treatment of missing non-MAR data [15], falls as a special case in our formalism. CU is defined as:

$$\underline{P}'_{X_n}(x_0) = \min_{x_n \in \Omega_{X_n}} P(x_0|x_n), \tag{12}$$

and represents the most conservative approach to belief revision. A vacuous CCPT is specified, with intervals $[0,1]$ for each value, either i) by Tr. 1, given a CVE whose likelihoods take any value between zero and one (Footnote 5: As VE likelihoods are defined up to a positive multiplicative constant, we can set any positive upper bound $\overline{\lambda}_{x_n}$ provided that $\underline{\lambda}_{x_n} = 0$.), or ii) by straightforward application of Th. 2, if a vacuous CSE is provided. The resulting ECPT with $|\Omega_{X_n}|$ CPTs (Footnote 6: The induced ECPT contains all combinations of zeros and ones in the CPTs. Yet, only those having a single one in the row associated to $d_{X_n}$ remain after the convex hull.) corresponds to the CU implementation in [3]. Also, Eq. (7) reduces to Eq. (12), given a vacuous CSE. We can similarly proceed in the case of incomplete observations, i.e. some values of $X_n$ are recognized as impossible, but no information can be provided about the other ones. If this is the case, we just replace $\Omega_{X_n}$ with the subset of its values not recognized as impossible.
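The reduction of Eq. (7) to Eq. (12) for a vacuous CSE can be sketched as follows. The conditional table is hypothetical, and the vacuous CS is represented by its extreme points, i.e. the degenerate PMFs assigning all the mass to one state.

```python
# Hypothetical conditional table P(X0 | xn): one row per state of Xn.
cond = [[0.9, 0.1],
        [0.4, 0.6],
        [0.2, 0.8]]


def conservative_update(cond, x0, possible=None):
    """Eq. (12) and its upper counterpart, optionally restricted to the possible states."""
    states = range(len(cond)) if possible is None else possible
    values = [cond[xn][x0] for xn in states]
    return min(values), max(values)


def cse_bounds(cond, x0, vertices):
    """Eq. (7) and its upper counterpart, optimising over the extreme points of the CS."""
    values = [sum(cond[xn][x0] * p[xn] for xn in range(len(cond))) for p in vertices]
    return min(values), max(values)


# Extreme points of the vacuous CS over a ternary variable.
vacuous = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

cu = conservative_update(cond, x0=0)
via_cse = cse_bounds(cond, x0=0, vertices=vacuous)
# Incomplete observation: the last state is recognised as impossible.
incomplete = conservative_update(cond, x0=0, possible=[0, 1])
print(cu, via_cse, incomplete)
```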

## 4 Credal Probability Kinematics

Given two joint PMFs $P(\mathbf{X})$ and $P'(\mathbf{X})$, we say that the latter comes from the first by probability kinematics (PK) on the (coarse) partition of $\Omega_{\mathbf{X}}$ induced by $X_n$ if and only if $P'(\mathbf{x}|x_n) = P(\mathbf{x}|x_n)$ for each $\mathbf{x} \in \Omega_{\mathbf{X}}$ and $x_n \in \Omega_{X_n}$ [16, 9]. (Footnote 7: Full consistency of $P'(\mathbf{X})$ with the evidence inducing the revision process is not explicitly required. A more stringent characterization of PK was proposed, among others, by [31].) This is the underlying assumption in Eq. (3). If $P'(\mathbf{X})$ is replaced by a CS, PK is generalized as follows.

###### Definition 1.

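Let $P(\mathbf{X})$ and $K'(\mathbf{X})$ be, respectively, a joint PMF and a joint CS. We say that $K'(\mathbf{X})$ comes from $P(\mathbf{X})$ by credal probability kinematics (CPK) on the partition of $\Omega_{\mathbf{X}}$ induced by $X_n$ if and only if it holds $P'(\mathbf{x}|x_n) = P(\mathbf{x}|x_n)$, for each $\mathbf{x} \in \Omega_{\mathbf{X}}$, $x_n \in \Omega_{X_n}$ and $P'(\mathbf{X}) \in K'(\mathbf{X})$.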

That is, any revision process based on (generalized) PK guarantees invariance of the relevance of $x_n$, for each $x_n \in \Omega_{X_n}$, to any other possible event in the model, say $\mathbf{x}$. The following consistency result holds for CSEs.

###### Theorem 3.

Given a BN over $\mathbf{X}$ and a shady CSE $K'(X_n)$, convert the CSE into a CVE as in Th. 2 and transform the BN into a CN by Tr. 1. Let $K'(\mathbf{X})$ be the joint CS associated to the CN. Then, $K'(\mathbf{X})$ comes from $P(\mathbf{X})$ by CPK on the partition induced by $X_n$. Moreover, $K'(X_n)$ coincides with the marginal CS in the CN.

## 5 Multiple Evidences

So far, we only considered the updating of a single CVE or CSE. We call uncertain credal updating (UCU) of a BN the general task of computing updated/revised beliefs in a BN with an arbitrary number of CSEs, CVEs, and hard evidences as well. Here, UCU is intended as iterated application of the procedures outlined above. See for instance [17], for a categorization of iterated belief revision problems and their assumptions. When coping with multiple VEs in a BN, it is sufficient to add the necessary auxiliary children to the observed variables and quantify the CPTs as described. We similarly proceed with multiple CVEs.

The procedure becomes less straightforward when coping with multiple SEs or CSEs, since quantification of each auxiliary child by Eq. (4) requires a preliminary inference step. As a consequence, iterated revision might not be invariant with respect to the revision process scheme [31].

Additionally, with CSEs, absorption of the first CSE transforms the BN into a CN, and successive absorption of other CSEs requires further extension of the procedure in Th. 2. We leave such an extension as future work, and here we just consider simultaneous absorption of all evidences. If this is the case, multiple CSEs can be converted into CVEs and the inferences required for the quantification of the auxiliary children are performed in the original BN.

### 5.1 Algorithmic Issues

ApproxLP [2] is an algorithm for general CN updating based on linear programming. It provides an inner approximation of the updated intervals with the same complexity as a BN inference on the same graph. Roughly, CN updating is reduced by ApproxLP to a sequence of linear programming tasks. Each is obtained by iteratively fixing all the local models to single elements of the corresponding CSs, while leaving a single variable free. It follows that the algorithm efficiently produces exact inferences whenever a CN has all local CSs made of a single element apart from one. This is the case of belief updating with a single CVE/CSE.

### 5.2 Complexity Issues

Since standard BN updating of polytrees can be performed efficiently, the same happens with VEs and/or SEs, as Tr. 1 does not affect the topology (nor the treewidth) of the original network. Similarly, with multiply connected models, BN updating is exponential in the treewidth, and the same happens with models augmented by VEs and/or SEs.

As already noticed, with CNs, binary polytrees can be updated efficiently, while updating ternary polytrees is already NP-hard. An important question is therefore whether or not a similar situation holds for UCU in BNs. The (positive) answer is provided by the two following results.

###### Proposition 3.

UCU of polytree-shaped binary BNs can be solved in polynomial time.

The proof of this proposition is trivial and simply follows from the fact that the auxiliary nodes required to model CVE and/or CSE are binary (remember that CSs over binary variables are always shady). The CN solving the UCU is therefore a binary polytree that can be updated by the exact algorithm proposed in [18].

###### Theorem 4.

UCU of non-binary polytree-shaped BNs is NP-hard.

The proof of this theorem is based on a reduction to the analogous result for CNs [24]. This already concerns models whose variables have no more than three states and treewidth equal to two. In these cases, approximate inferences can be efficiently computed by ApproxLP.

## 6 Credal Opinion Pooling

Consider the generalized case of $m$ overlapping probabilistic instances on $X_n$. For each $j = 1, \ldots, m$, let $P'_j(X_n)$ denote the SE reported by the $j$-th source. Straightforward introduction of $m$ auxiliary nodes as outlined above would suffer confirmational dynamics, analogous to the well-known issue with posterior probability estimates in the naive Bayes classifier [27]. This might likely yield inconsistent revised beliefs, i.e. $P'(X_n)$ falls outside the convex hull of $\{P'_j(X_n)\}_{j=1}^{m}$.

A most conservative approach to prevent such inconsistency adopts the convex hull of all the opinions [29]. In our formalism, this is just the CS . Yet, consider any small , and assume , , and for each . Despite the consensus of all remaining sources on sharp value , the conservative approach above would yield . To what extent should this be preferred to the confirmational case is an open question.

A compromise solution might be offered by the geometric pooling operator (or LogOp) [4]. Given a collection of positive weights $\{\alpha_j\}_{j=1}^{m}$, with $\sum_{j=1}^{m} \alpha_j = 1$, the LogOp functional produces the PMF $\tilde{P}'(X_n)$ such that:

$$\tilde{P}'(x_n) \propto \prod_{j=1}^{m} P'_j(x_n)^{\alpha_j}, \tag{13}$$

for each $x_n \in \Omega_{X_n}$. $\tilde{P}'(X_n)$ belongs to the convex hull of $\{P'_j(X_n)\}_{j=1}^{m}$ for any specification of the weights [1]. The overlapping SEs associated to the PMF in Eq. (13) can be equivalently modeled by a collection of VEs defined as follows.

###### Transformation 2.

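Consider a BN over $\mathbf{X}$ and a collection of $m$ SEs on $X_n$, $\{P'_j(X_n)\}_{j=1}^{m}$. For each $j = 1, \ldots, m$, augment the BN with binary child $D^{(j)}_{X_n}$ of $X_n$ whose CPT is such that $P(d^{(j)}_{X_n}|x_n) \propto \left[ P'_j(x_n)/P(x_n) \right]^{\alpha_j}$, with the weights $\alpha_j$ as in Eq. (13).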

The transformation is used for the following result.

###### Proposition 4.

Consider the same inputs as in Tr. 2. Then:

$$\tilde{P}'_{X_n}(x_0) = P(x_0|d^{(1)}_{X_n}, \ldots, d^{(m)}_{X_n}), \tag{14}$$

where the probability on the left-hand side is obtained by the direct revision induced by $\tilde{P}'(X_n)$, while the probability on the right-hand side of Eq. (14) has been computed in the BN returned by Tr. 2.

The proof follows from the conditional independence of the auxiliary nodes given . Also, note how our proposal simultaneously performs pooling and absorption of overlapping SEs.
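Eq. (14) can be verified numerically on a toy model. The sketch below assumes that each auxiliary CPT of Tr. 2 is proportional to the ratio between the opinion and the prior, raised to the corresponding weight, and that the weights sum to one; this is our reading of the transformation, and all numbers are hypothetical.

```python
from math import prod

prior = [0.5, 0.3, 0.2]                     # assumed P(Xn)
cond = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]  # assumed P(X0 | xn), one row per xn
opinions = [[0.6, 0.3, 0.1],                # assumed soft evidences P'_j(Xn)
            [0.2, 0.5, 0.3]]
alphas = [0.7, 0.3]                         # assumed weights, summing to one


def logop(opinions, alphas):
    """Eq. (13): normalised weighted geometric mean of the opinions."""
    raw = [prod(p[k] ** a for p, a in zip(opinions, alphas))
           for k in range(len(opinions[0]))]
    z = sum(raw)
    return [r / z for r in raw]


def jeffrey(cond, p_new):
    """Eq. (3) applied to a single (pooled) soft evidence."""
    return [sum(cond[k][x0] * p_new[k] for k in range(len(p_new)))
            for x0 in range(len(cond[0]))]


def update_with_auxiliary_children(prior, cond, opinions, alphas):
    """Posterior of X0 given all auxiliary children observed, one child per source."""
    cpts = []
    for p, a in zip(opinions, alphas):
        raw = [(p[k] / prior[k]) ** a for k in range(len(prior))]
        top = max(raw)
        cpts.append([r / top for r in raw])   # rescale so that entries are at most one
    # Children are conditionally independent given Xn, so their likelihoods multiply.
    weights = [prior[k] * prod(c[k] for c in cpts) for k in range(len(prior))]
    z = sum(weights)
    return [sum(cond[k][x0] * weights[k] for k in range(len(prior))) / z
            for x0 in range(len(cond[0]))]


pooled_then_revised = jeffrey(cond, logop(opinions, alphas))
revised_in_bn = update_with_auxiliary_children(prior, cond, opinions, alphas)
print(pooled_then_revised, revised_in_bn)
```

The agreement relies on the weights summing to one, so that the prior cancels out in the product of the auxiliary likelihoods.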

Suppose $m$ sources provide generalized CSEs about $X_n$, say $\{K'_j(X_n)\}_{j=1}^{m}$. Let $\tilde{K}'(X_n)$ denote the CS induced by LogOp as in Eq. (13), for each $P'_j(X_n) \in K'_j(X_n)$, $j = 1, \ldots, m$ [1]. We generalize Tr. 2 as follows:

###### Transformation 3.

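Consider a BN over $\mathbf{X}$ and the collection of CSEs $\{K'_j(X_n)\}_{j=1}^{m}$. For each $j = 1, \ldots, m$, augment the BN with binary child $D^{(j)}_{X_n}$ of $X_n$, whose CCPT is such that $\underline{P}(d^{(j)}_{X_n}|x_n) \propto \left[ \underline{P}'_j(x_n)/P(x_n) \right]^{\alpha_j}$ and $\overline{P}(d^{(j)}_{X_n}|x_n) \propto \left[ \overline{P}'_j(x_n)/P(x_n) \right]^{\alpha_j}$.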

This transformation returns a CN. A result analogous to Pr. 4 can be derived.

###### Theorem 5.

Consider the same inputs as in Tr. 3. Then:

$$\underline{\tilde{P}}'_{X_n}(x_0) = \underline{P}(x_0|d^{(1)}_{X_n}, \ldots, d^{(m)}_{X_n}), \tag{15}$$

where the lower probability on the left-hand side has been computed by absorption of the single CSE $\tilde{K}'(X_n)$ and the probability on the right-hand side has been computed in the CN returned by Tr. 3. The same relation also holds for the corresponding upper probabilities.

## 7 Conclusions

Credal, or set-valued, modeling of uncertain evidence has been proposed within the framework of Bayesian networks. Such a procedure generalizes standard updating. More importantly, our proposal allows us to reduce the task of absorption of uncertain evidence to standard updating in credal networks. Complexity results, specific inference schemes, and generalized pooling procedures have also been derived.

As future work, we intend to evaluate the proposed technique with knowledge-based decision-support systems based on Bayesian networks to model unreliable observational processes. Moreover, the proposed procedure should be extended to the framework of credal networks, thus reconciling the orthogonal viewpoints considered in this paper and in [13], and tackling the case of non-simultaneous updating.

## Appendix A Proofs

###### Proof of Th. 1.

The proof follows from the analogous result with BNs. For any BN consistent with the CN returned by Tr. 1, we have:

$$P(x_0|d_{X_n}) = \frac{P(x_0,d_{X_n})}{P(d_{X_n})} = \frac{\sum_{x_n} P(x_0|x_n)\,P(d_{X_n}|x_n)\,P(x_n)}{\sum_{x_n} P(d_{X_n}|x_n)\,P(x_n)}.$$

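As $P(d_{X_n}|x_n)$ reaches its minimum at $\underline{\lambda}_{x_n}$, for every $x_n \in \Omega_{X_n}$, the minimization of the last term coincides with that of Eq. (2) and gives $\underline{P}_{\Lambda_{X_n}}(x_0)$, the other elements being constant. Analogous reasoning yields $\overline{P}_{\Lambda_{X_n}}(x_0)$.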

For the proof of Th. 2, we need to introduce the following transformation and lemma.

###### Transformation 4.

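Consider a CSE $K'(X_n)$ in a BN. Let $\{P'_i(X_n)\}_{i=1}^{n_v}$ denote the elements of the CS $K'(X_n)$. (Footnote 8: Remember that in our definition of CS we remove the inner points of the convex hull.) In the BN, compute the marginal PMF $P(X_n)$ with standard algorithms. Augment the BN with a binary node $D_{X_n}$, whose only parent is $X_n$. Quantify the local model for $D_{X_n}$ as an ECPT $K(D_{X_n}|X_n)$, specified as a set of CPTs $\{P_i(D_{X_n}|X_n)\}_{i=1}^{n_v}$. $P_i(d_{X_n}|x_n)$ is defined as:

$$P_i(d_{X_n}|x_n) \propto \frac{P'_i(x_n)}{P(x_n)}, \tag{16}$$

for each $i = 1, \ldots, n_v$ and $x_n \in \Omega_{X_n}$. The same prescriptions provided after Pr. 1 for the case of zero-probability events should be followed here.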

###### Lemma 1.

Given a CSE $K'(X_n)$ in a BN, consider the CN returned by Tr. 4. Then:

$$\underline{P}(x_0|d_{X_n}) = \underline{P}'_{X_n}(x_0), \tag{17}$$

and analogously for the upper bound.

###### Proof.

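$D_{X_n}$ is the only credal node in the CN. Thus:

$$\underline{P}(x_0|d_{X_n}) = \min_{P(d_{X_n}|X_n) \in K(d_{X_n}|X_n)} \frac{P(x_0,d_{X_n})}{P(d_{X_n})}. \tag{18}$$

Let us rewrite Eq. (18) by: (i) explicitly enumerating the CPTs in the ECPT $K(D_{X_n}|X_n)$, (ii) making explicit the marginalization of $X_n$, (iii) exploiting the fact that, by the Markov condition, we have conditional independence between $D_{X_n}$ and $X_0$ given $X_n$. The result is:

$$\min_{i=1,\ldots,n_v} \frac{\sum_{x_n} P(x_0|x_n) \cdot P_i(d_{X_n}|x_n) \cdot P(x_n)}{\sum_{x_n} P_i(d_{X_n}|x_n) \cdot P(x_n)}. \tag{19}$$

Thus, because of Eq. (16):

$$\min_{i=1,\ldots,n_v} \frac{\sum_{x_n} P(x_0|x_n) \cdot P'_i(x_n)}{\sum_{x_n} P'_i(x_n)}. \tag{20}$$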

As the denominator in Eq. (20) is one we obtain Eq. (7). This proves the lemma.

We can now prove the second theorem.

###### Proof of Th. 2.

Let us first prove the second part of the theorem. As a consequence of Pr. 1, each VE consistent with the CVE can be converted into a SE defined as in Eq. (5). The CS implementing the CSE equivalent to the CVE is therefore:

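$$K'(X_n) := \left\{ P'(X_n) \,\middle|\, P'(x_n) = \frac{P(x_n)\lambda_{x_n}}{\sum_{x_n} P(x_n)\lambda_{x_n}},\ \underline{\lambda}_{x_n} \leq \lambda_{x_n} \leq \overline{\lambda}_{x_n},\ \forall x_n \right\}. \tag{21}$$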

The computation of $\underline{P}'(x_n)$ is therefore a linearly constrained linear fractional task. If $\lambda_{x_n} P(x_n) > 0$, we can rewrite the objective function as:

$$P'(x_n) = \left[ 1 + \frac{\sum_{x'_n \neq x_n} \lambda_{x'_n} P(x'_n)}{\lambda_{x_n} P(x_n)} \right]^{-1}. \tag{22}$$

As the right-hand side of Eq. (22) is a monotone decreasing function of the ratio inside the brackets, minimizing the objective function is equivalent to maximizing:

$$\frac{\sum_{x'_n \neq x_n} \lambda_{x'_n} P(x'_n)}{\lambda_{x_n} P(x_n)}, \tag{23}$$

and vice versa for the maximization. As each $\lambda_{x_n}$ can vary in its interval independently of the others, the maximum of the function in Eq. (23) is obtained by maximizing the numerator and minimizing the denominator, i.e., for $\lambda_{x'_n} = \overline{\lambda}_{x'_n}$ and $\lambda_{x_n} = \underline{\lambda}_{x_n}$. This proves Eq. (10), which remains valid also for $\lambda_{x_n} P(x_n) = 0$.

To prove the first part of the theorem, because of Lm. 1, we only need to prove that the CN returned by Tr. 4 and the CN returned by Tr. 1 for the CVE specified in Eq. (9) provide the same $\underline{P}(x_0|d_{X_n})$. This lower posterior probability in the second CN rewrites as:

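$$\underline{P}(x_0|d_{X_n}) = \min_{\underline{\lambda}_{x_n} \leq \lambda_{x_n} \leq \overline{\lambda}_{x_n}} \frac{\sum_{x_n} P(x_0|x_n)\,\lambda_{x_n} P(x_n)}{\sum_{x_n} \lambda_{x_n} P(x_n)}. \tag{24}$$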

Again, this is a linearly constrained linear fractional task, which can be reduced to a linear task by [7]. In the linear task, the minimum is achieved when the $\lambda_{x_n}$ corresponding to the maximum coefficient of the numerator of the objective function takes the minimum value $\underline{\lambda}_{x_n}$. But as, by Eq. (9), $\underline{\lambda}_{x_n} \propto \underline{P}'(x_n)/P(x_n)$, we can equivalently obtain this value with the ECPT in the first CN. This proves the first part of the theorem.

###### Proof of Th. 3.

The result follows from the analogous one for PK. Thus, let us first assume $K'(X_n)$ composed of a single PMF $P'(X_n)$. This means that the CSE degenerates into a standard SE. Let $\lambda_{X_n}$ denote the corresponding VE and consider the augmented BN obtained by adding the auxiliary binary child $D_{X_n}$. Also, let $\mathbf{x}$ be any configuration of the joint variable $\mathbf{X}$. By the Markov condition:

$$P'(\mathbf{x}|x_n) = P(\mathbf{x}|x_n,d_{X_n}) = \frac{P(\mathbf{x}|x_n)\,P(x_n)\,P(d_{X_n}|x_n)}{P(x_n)\,P(d_{X_n}|x_n)} = P(\mathbf{x}|x_n).$$

For a CSE including more than a PMF, we just repeat the same above considerations separately for each and obtain the proof of the statement. Also, by Th. (2) it holds , for all configurations consistent with , for all .

###### Proof of Th. 4.

To prove the theorem we show that the non-binary polytree-shaped CN used by [24, Th. 1] to prove the NP-hardness of non-binary credal polytrees can be used to model UCU in a non-binary polytree-shaped BN. To do that for an arbitrary , consider the BN over with the topology in Fig. 1, Nodes are associated to binary variables, the others to ternary variables. A uniform marginal PMF is specified for , while the CPTs for the other ternary variables are as indicated in Table 2 of the proof we refer to (the numerical values being irrelevant for the present proof). For the binary variables we also specify a uniform prior.

We then specify a vacuous CSE for each binary variable. These CSEs can be absorbed by replacing the uniform PMFs with vacuous CSs. The resulting model is exactly the CN used to reduce CN updating to the PARTITION problem [19] and hence proves the thesis.

###### Proof of Th. 5.

For any BN consistent with the CN resulting from Tr. 3 it holds , by Prop. 4. By definition, see Eq. (13), we have:

$$\min_{P_j \in K_j,\ K_j \in K'} \mathrm{cLogOp}_{\alpha} K'(x_n) = k \prod_{j=1}^{m} \underline{P}'_j(x_n)^{\alpha_j}, \tag{25}$$

with being the normalization constant and , for every and for all .

It follows:

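$$\min_{P(x_n) \in \mathrm{cLogOp}_{\alpha} K'(x_n)} \tilde{P}(x_0) = k \sum_{x_n} P(x_0|x_n) \prod_{j=1}^{m} \underline{P}'_j(x_n)^{\alpha_j} = \underline{P}(x_0|d^{(1)}_{X_n}, \ldots, d^{(m)}_{X_n}),$$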

where the second term comes by Eq. (2) and Eq. (25). This gives the proof of the theorem.

## References

• [1] Martin Adamčík, The information geometry of Bregman divergences and some applications in multi-expert reasoning, Entropy 16 (2014), no. 12, 6338–6381.
• [2] A. Antonucci, C.P. de Campos, M. Zaffalon, and D. Huber, Approximate credal network updating by linear programming with applications to decision making, International Journal of Approximate Reasoning 58 (2014), 25–38.
• [3] A. Antonucci and M. Zaffalon, Decision-theoretic specification of credal networks: a unified language for uncertain modeling with sets of Bayesian networks, International Journal of Approximate Reasoning 49 (2008), no. 2, 345–361.
• [4] Michael Bacharach, Group decisions in the face of differences of opinion, Management Science 22 (1975), no. 2, 182–191.
• [5] Ali Ben Mrad, Véronique Delcroix, Sylvain Piechowiak, Philip Leicester, and Mohamed Abid, An explication of uncertain evidence in Bayesian networks: likelihood evidence and probabilistic evidence, Applied Intelligence 43 (2015), no. 4, 802–824.
• [6] Jean-Marc Bernard, An introduction to the imprecise Dirichlet model for multinomial data, International Journal of Approximate Reasoning 39 (2005), no. 2-3, 123–150.
• [7] Stephen Boyd and Lieven Vandenberghe, Convex optimization, Cambridge university press, 2004.
• [8] L. Campos, J. Huete, and S. Moral, Probability intervals: a tool for uncertain reasoning, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 2 (1994), no. 2, 167–196.
• [9] Hei Chan and Adnan Darwiche, On the revision of probabilistic beliefs using uncertain evidence, Artificial Intelligence 163 (2005), no. 1, 67–90.
• [10] J. Cleland, Orthopaedic clinical examination: An evidence-based approach for physical therapists, Saunders, 2005.
• [11] G. F. Cooper, The computational complexity of probabilistic inference using Bayesian belief networks, Artificial Intelligence 42 (1990), 393–405.
• [12] F. G. Cozman, Credal networks, Artificial Intelligence 120 (2000), 199–233.
• [13] J.C.F. da Rocha, A.M. Guimaraes, and C.P. de Campos, Dealing with soft evidence in credal networks, Proceedings of Conferencia Latino-Americana de Informatica, 2008.
• [14] C. P. de Campos and F. G. Cozman, The inferential complexity of Bayesian and credal networks, Proceedings of IJCAI ’05 (Edinburgh), 2005, pp. 1313–1318.
• [15] G. De Cooman and M. Zaffalon, Updating beliefs with incomplete observations, Artificial Intelligence 159 (2004), no. 1-2, 75–125.
• [16] Persi Diaconis and Sandy L. Zabell, Updating subjective probability, Journal of the American Statistical Association 77 (1982), no. 380, 822–830.
• [17] Didier Dubois, Three scenarios for the revision of epistemic states, Journal of Logic and Computation 18 (2008), no. 5, 721–738.
• [18] Enrico Fagiuoli and Marco Zaffalon, 2U: an exact interval propagation algorithm for polytrees with binary variables, Artificial Intelligence 106 (1998), 77–107.
• [19] Michael R. Garey and David S. Johnson, Computers and intractability: a guide to the theory of NP-completeness, W. H. Freeman & Co., 1979.
• [20] Adam J. Grove and J. Y. Halpern, Probability update: conditioning vs. cross-entropy, Proceedings of the Thirteenth conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1997, pp. 208–214.
• [21] Richard C Jeffrey, Ethics and the logic of decision, The Journal of Philosophy 62 (1965), no. 19, 528–539.
• [22] D. Koller and N. Friedman, Probabilistic graphical models: principles and techniques, MIT Press, 2009.
• [23] Jianbing Ma, Weiru Liu, Didier Dubois, and Henri Prade, Bridging Jeffrey’s rule, AGM revision and Dempster conditioning in the theory of evidence, International Journal on Artificial Intelligence Tools 20 (2011), no. 04, 691–720.
• [24] D.D. Mauá, C.P. de Campos, A. Benavoli, and A. Antonucci, Probabilistic inference in credal networks: new complexity results, Journal of Artificial Intelligence Research 50 (2014), 603–637.
• [25] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann, San Mateo, California, 1988.
• [26] Yun Peng, Shenyong Zhang, and Rong Pan, Bayesian network reasoning with uncertain evidences, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 18 (2010), no. 05, 539–564.
• [27] Irina Rish, An empirical study of the naive Bayes classifier, IJCAI 2001 workshop on empirical methods in Artificial Intelligence, vol. 3, IBM New York, 2001, pp. 41–46.
• [28] Christophe Simon, Philippe Weber, and Eric Levrat, Bayesian networks and evidence theory to model complex systems reliability, Journal of Computers 2 (2007), no. 1, 33–43.
• [29] Rush T Stewart and Ignacio Ojea Quintana, Probabilistic opinion pooling with imprecise probabilities, Journal of Philosophical Logic (2017), 1–29.
• [30] Marco Valtorta, Young-Gyun Kim, and Jiří Vomlel, Soft evidential update for probabilistic multiagent systems, International Journal of Approximate Reasoning 29 (2002), no. 1, 71–106.
• [31] Carl G. Wagner, Probability kinematics and commutativity, Philosophy of Science 69 (2002), no. 2, 266–278.
• [32] Chunlai Zhou, Mingyue Wang, and Biao Qin, Belief-kinematics Jeffrey’s rules in the theory of evidence, Proceedings of UAI 2014, 2014, pp. 917–926.