
# Testing Information Causality for General Quantum Communication Protocols

## Abstract

Information causality was proposed as a physical principle that puts an upper bound on the accessible information gain in a physical bi-partite communication scheme. Intuitively, the information gain cannot be larger than the amount of classical communication; otherwise causality would be violated. Moreover, it was shown that this bound is consistent with the Tsirelson bound for binary quantum systems. In this paper, we test information causality for more general (non-binary) quantum communication schemes. In order to apply the semi-definite programming method to find the maximal information gain, we only consider schemes in which the information gain is monotonically related to Bell-type functions, i.e., the generalizations of the CHSH function of the Bell inequality in binary schemes. We determine these Bell-type functions by using the signal decay theorem. Our results support the proposal of information causality. We also find the maximal information gain by a numerical brute-force method for the most general 2-level and 2-setting quantum communication schemes. Our results show that the boundary for the information causality bound does not agree with that for the Tsirelson bound.


## I Introduction

The advantages of quantum information have been well exploited in improving the efficiency and reliability of computation and communication over the past decades. However, even with the help of the seemingly non-local quantum correlation resources, trivial communication complexity still cannot be reached. The communication complexity can be understood as a bound on the accessible information gain between sender and receiver. Recently, this bound on the information gain was formulated as a physical principle, called information causality. It states that the information gain in a physical bi-partite communication scheme cannot exceed the amount of classical communication. Intuitively, this is a reasonable physical constraint: otherwise, one could predict what a distant party tries to hide and thereby violate causality. For some particular communication schemes with physical resources shared between sender and receiver, it was shown (4); (5) that the bound from information causality is equivalent to the Tsirelson bound (14) for binary quantum systems.

By treating information causality as a physical principle, one can disqualify some of the no-signaling theories (6) from being physical theories if they yield results violating information causality. In this way, it may help to single out quantum mechanics as a physical theory by testing information causality for all possible quantum communication schemes. For example, some efforts along this line were made in (8).

However, most of the tests of information causality have been performed only for binary communication schemes. It is then interesting to test information causality for more general communication schemes. In this paper we perform the tests for d-level quantum systems, with more general communication protocols and more general physical resources shared between sender and receiver. Our results agree with the bound set by information causality. In the rest of the Introduction, we briefly review the concept of information causality to motivate this work and also outline the strategy of our approach.

Information causality can be presented through the following task of random access code (RAC): Alice has a database of $k$ elements, denoted by the vector $\vec a=(a_0,a_1,\dots,a_{k-1})$. Each element is a d-level digit (dit) and is known only to Alice. A second, distant party, Bob, is given a random variable $b\in\{0,1,\dots,k-1\}$. The value of $b$ is used to instruct Bob in guessing the dit $a_b$ optimally after receiving a dit $\alpha$ sent by Alice. In this context, information causality can be formulated as follows:

$$I=\sum_{i=0}^{k-1} I(a_i;\beta\,|\,b=i)\le \log_2 d, \tag{1}$$

where $I(a_i;\beta\,|\,b=i)$ is Shannon's mutual information between $a_i$ and Bob's guessing dit $\beta$ under the condition $b=i$. Then $I$ is the information gain of the communication scheme, which is bounded by the amount of classical communication encoded in the dit $\alpha$.

The above information gain is determined by three parts of the communication scheme: (1) the exact RAC protocol, (2) the communication channel and (3) the input marginal probabilities. This is shown in Fig. 1. The purpose of the RAC encoding is for Alice to encode her data $\vec a$ into $\vec x$ and for Bob to encode his $b$ into $\vec y$. The details will be given in section II.

The second part of our communication scheme is a given channel specified by the pre-shared correlation between Alice and Bob, the so-called no-signaling box (NS-box). The aforementioned encoded data $\vec x$ and $\vec y$ are the inputs of the NS-box, which then yields the corresponding outputs $A_{\vec x}$ and $B_{\vec y}$, respectively. Bob will then combine $B_{\vec y}$ with the classical information $\alpha$ sent from Alice to guess $a_b$. Most importantly, the NS-box is characterized by the conditional joint probabilities $\Pr(A_{\vec x},B_{\vec y}|\vec x,\vec y)$, which should satisfy the following no-signaling condition (6)

$$\sum_{B_{\vec y}}\Pr(A_{\vec x},B_{\vec y}|\vec x,\vec y)=\Pr(A_{\vec x}|\vec x)\quad\text{and}\quad\sum_{A_{\vec x}}\Pr(A_{\vec x},B_{\vec y}|\vec x,\vec y)=\Pr(B_{\vec y}|\vec y),\qquad\forall\,\vec x,\vec y. \tag{2}$$

This implies that superluminal signaling is impossible.
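As a concrete illustration (not part of the protocol itself), condition (2) can be checked numerically for any candidate box. The Python sketch below, with helper names of our own choosing, verifies it for the familiar binary PR box, whose outputs satisfy $a\oplus b=x\cdot y$:

```python
def pr_box(a, b, x, y):
    """Binary PR box: outputs satisfy a XOR b = x AND y, each valid
    output pair occurring with probability 1/2."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

def is_no_signaling(p, d=2, tol=1e-12):
    """Check condition (2): each party's marginal must be independent
    of the other party's input."""
    for x in range(d):
        for a in range(d):
            margs = [sum(p(a, b, x, y) for b in range(d)) for y in range(d)]
            if max(margs) - min(margs) > tol:
                return False
    for y in range(d):
        for b in range(d):
            margs = [sum(p(a, b, x, y) for a in range(d)) for x in range(d)]
            if max(margs) - min(margs) > tol:
                return False
    return True

print(is_no_signaling(pr_box))  # True: the PR box is no-signaling
```

The same check applies verbatim to any conditional joint probability table, quantum or super-quantum.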

Now comes the third part of our communication scheme: the input marginal probabilities. They are usually assumed to be uniform and not treated as variables. However, when evaluating the information gain in (1), we need the conditional probabilities $\Pr(\beta|a_i,b=i)$, which are related to both the joint probabilities of the NS-box and the input marginal probabilities. In this work, we will consider more general communication schemes with variable and non-uniform input marginal probabilities and evaluate the corresponding information gain.

Naively, one would like to find the maximal information gain of our communication schemes by maximizing the information gain over the joint probabilities of the NS-box and the input marginal probabilities, where the joint probabilities should be realized by quantum correlations. However, we will show that this maximization problem is not a convex problem, so it cannot be solved by standard numerical recipes.

To bypass this no-go situation, we proceed in two ways. The first is to consider an alternative convex optimization problem whose objective function and the information gain are monotonically related under some special assumptions. It turns out that this alternative convex optimization problem is to find the maximal quantum violation of a Bell-type inequality, which can be thought of as finding the generalized Tsirelson bound. We will call the corresponding inequality for the generalized Tsirelson bound the Tsirelson-type inequality, or simply the Tsirelson inequality. Correspondingly, the objective function is the LHS of the Bell-type inequality, which we will call the Bell-type function, or simply the Bell function.

For the binary 2-setting communication schemes, the Bell-type function is the famous CHSH function. However, for the general schemes one should find the appropriate Bell-type functions. In this paper, we generalize the construction method developed in (5) to obtain such Bell-type functions. This method is based on the signal decay theorem proposed in (11); (12). We further show that these Bell-type functions are monotonically related to the information gain for the communication schemes with unbiased (i.e., symmetric and isotropic) conditional probabilities and i.i.d. inputs with uniform marginals. Therefore, for such schemes we can optimize the information gain by applying the semi-definite programming (SDP) method (19); (20) to obtain the maximum of the Bell-type function for the quantum communication schemes, i.e., the Tsirelson bound.

On the other hand, if we consider more general communication schemes than the aforementioned ones, so that the above monotonic relation between the information gain and the objective function fails, then we use the second way: maximizing the information gain over the joint probabilities and the input marginal probabilities by brute force numerically, without relying on convex optimization. As limited by the power of our computation facilities, we only consider the binary 2-setting communication schemes. Our results show that the bound required by information causality is not saturated by the scheme saturating the Tsirelson bound. Instead, it is saturated by the case saturating the CHSH inequality.

The paper is organized as follows. In the next section we define our communication schemes in detail and then derive the Bell-type functions for the schemes with unbiased conditional probabilities and i.i.d. inputs with uniform marginals. In section III, we show that maximizing the information gain over the joint probabilities and the input marginal probabilities is not a convex optimization problem. We also prove that the Bell-type functions and the information gain are monotonically related under some assumptions. In section IV, we briefly review the semidefinite programming (SDP) proposed in (19); (20), and then apply it to solve the convex optimization problem and find the generalized Tsirelson bound. We use the result to evaluate the corresponding information gain and compare it with the bound required by information causality. In section V, we use the numerical brute-force method to maximize the information gain for general binary 2-setting schemes. Finally, we conclude in section VI with some discussions. Several technical details are given in the Appendices.

## II The generalized Bell-type functions from the signal decay theorem

In the Introduction, we briefly described our communication scheme. Here we describe the details of the encoding/decoding in the RAC protocol: Alice encodes her data $\vec a$ as $\vec x$ with components $x_i=a_i-a_0$ (mod $d$), and Bob encodes his input $b$ as $\vec y$, all of whose components vanish for $b=0$, while for $b=i\neq 0$ only the component paired with $x_i$ equals 1. The dit-strings $\vec x$ and $\vec y$ are the inputs of the NS-box, and the corresponding outputs are $A_{\vec x}$ and $B_{\vec y}$, respectively. More specifically, the dit sent by Alice is $\alpha=a_0-A_{\vec x}$ (mod $d$), and the pre-shared correlation is defined by the conditional probabilities between the inputs and outputs of the NS-box. Accordingly, Bob's optimal guessing dit can be chosen as $\beta=\alpha+B_{\vec y}$ (mod $d$). This is because $\beta=a_0+\vec x\cdot\vec y=a_b$ holds as long as $B_{\vec y}-A_{\vec x}=\vec x\cdot\vec y$ (mod $d$). In this case, Bob guesses perfectly. Take $k=3$ as an example for illustration: Bob's optimal guess dit is

$$\beta=\vec x\cdot\vec y+a_0=(a_1-a_0,\,a_2-a_0)\cdot(y_0,y_1)+a_0. \tag{3}$$

If Bob's input is $b=0$, then $\vec y=(0,0)$ and $\beta=a_0$; if $b=1$, then $\vec y=(1,0)$ and $\beta=a_1$; and if $b=2$, then $\vec y=(0,1)$ and $\beta=a_2$. In each case Bob can guess $a_b$ perfectly.

Using the above RAC protocol, Alice and Bob have $d^{k-1}$ and $k$ measurement settings, respectively. Each of the measurement settings gives $d$ kinds of outputs. However, the noise of the NS-box affects the success probability, so Bob cannot always guess correctly. If the NS-box is a quantum-mechanical one, then the conditional probabilities should be constrained by the Tsirelson-type inequalities, and so are the joint probabilities. Then the question is: how? For $d=2$ and $k=2$, the quantum constraint comes from the well-known Tsirelson inequality; that is, the maximal quantum violation of the CHSH inequality is $2\sqrt 2$. Note that each correlator of the CHSH function can be expressed in terms of joint probabilities as $E_{xy}=2\Pr(B_y-A_x=xy\,|\,x,y)-1$. Therefore, this is the constraint for the joint probabilities to be consistent with quantum mechanics.
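To make the binary case concrete, the CHSH value can be computed directly from joint probabilities via the correlators $E(x,y)=\sum_{a,b}(-1)^{a\oplus b}\Pr(a,b|x,y)$. A minimal Python sketch (function names are ours) evaluates it for the PR box, which attains the no-signaling maximum of 4, above the Tsirelson bound $2\sqrt 2$:

```python
import math

def pr_box(a, b, x, y):
    """Binary PR box: a XOR b = x AND y, with probability 1/2 each."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

def chsh(p):
    """CHSH value S = E(0,0) + E(0,1) + E(1,0) - E(1,1), with correlator
    E(x,y) = sum_{a,b} (-1)^(a XOR b) p(a,b|x,y)."""
    S = 0.0
    for x in (0, 1):
        for y in (0, 1):
            E = sum((-1) ** (a ^ b) * p(a, b, x, y)
                    for a in (0, 1) for b in (0, 1))
            S += -E if x * y else E
    return S

print(chsh(pr_box))       # 4.0: the no-signaling maximum
print(2 * math.sqrt(2))   # ~2.828: the Tsirelson bound
```

Quantum correlations can reach at most $2\sqrt 2$; the classical (local hidden variable) bound is 2.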

However, there are no known Tsirelson-type inequalities for the cases with $d>2$. Despite that, in (5) we found a systematic way to construct Tsirelson-type inequalities for $d=2$ and general $k$ from the signal decay theorem (11); (12). We will generalize this method to the $d>2$ cases to yield suitable Bell-type functions. To proceed, we first recapitulate the derivation for the $d=2$ cases.

Signal decay theory quantifies the loss of mutual information when processing data through a noisy channel. Consider a cascade of two communication channels, $X\to Y\to Z$; then intuitively we have

$$I(X;Z)\le I(X;Y). \tag{4}$$

Moreover, if the second channel ($Y\to Z$) is a binary symmetric one, i.e.,

$$\Pr(Z|Y)=\begin{pmatrix}\frac{1}{2}(1+\xi)&\frac{1}{2}(1-\xi)\\[3pt]\frac{1}{2}(1-\xi)&\frac{1}{2}(1+\xi)\end{pmatrix},$$

then the signal decay theorem says

$$\frac{I(X;Z)}{I(X;Y)}\le \xi^2. \tag{5}$$

This theorem has been proven to yield a tight bound in (11); (12). Note that the equality holds only when the distributions of $Y$ conditioned on the different values of $X$ are almost indistinguishable. For more details, please see Appendix A.
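The bound (5) is easy to probe numerically. The following sketch (helper names are ours; it assumes the signal decay bound exactly as stated in (5)) cascades random binary channels $X\to Y$ with a binary symmetric channel $Y\to Z$ of parameter $\xi$ and checks $I(X;Z)\le\xi^2\,I(X;Y)$:

```python
import math
import random

def mutual_info(px, ch):
    """I(X;Out) in bits; ch[i][j] = Pr(out = j | in = i)."""
    n_in, n_out = len(px), len(ch[0])
    pout = [sum(px[i] * ch[i][j] for i in range(n_in)) for j in range(n_out)]
    total = 0.0
    for i in range(n_in):
        for j in range(n_out):
            p = px[i] * ch[i][j]
            if p > 0:
                total += p * math.log2(p / (px[i] * pout[j]))
    return total

random.seed(0)
xi = 0.6
# second channel Y -> Z: binary symmetric with Pr(Z = Y) = (1 + xi)/2
bsc = [[(1 + xi) / 2, (1 - xi) / 2],
       [(1 - xi) / 2, (1 + xi) / 2]]
for _ in range(200):
    px = [random.random() for _ in range(2)]
    s = sum(px)
    px = [p / s for p in px]                      # random input distribution
    r0, r1 = random.random(), random.random()
    ch_xy = [[r0, 1 - r0], [r1, 1 - r1]]          # random first channel X -> Y
    ch_xz = [[sum(ch_xy[i][m] * bsc[m][j] for m in range(2))
              for j in range(2)] for i in range(2)]  # cascade X -> Z
    assert mutual_info(px, ch_xz) <= xi ** 2 * mutual_info(px, ch_xy) + 1e-9
print("signal decay bound (5) holds on random channels")
```

No violation is found on random instances, consistent with the tightness statement above.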

In (5), we set $X=a_i$, $Y=\alpha$ and $Z=\beta$. By construction, the bit $\alpha$ is encoded as $\alpha=a_0\oplus A_{\vec x}$, so that $\beta=\alpha\oplus B_{\vec y}$. Using the tight bound of (5), we can get

$$I(a_i;\beta\,|\,b=i)\le \xi_i^2. \tag{6}$$

For our RAC protocol, the index of the noise parameter $\xi$ is the vector $\vec y$; we write $\xi_i$ for the $\vec y$ that encodes $b=i$. It is then easy to see that $\xi_{\vec y}$ is related to both the input marginal probabilities and the joint probabilities of the NS-box by

$$\frac{1+\xi_{\vec y}}{2}=\sum_{\{\vec x\}}\Pr(\vec x)\,\Pr(B_{\vec y}-A_{\vec x}=\vec x\cdot\vec y\,|\,\vec x,\vec y). \tag{7}$$

Assuming that Alice's database is i.i.d., we can then sum over all the mutual information between $a_i$ and $\beta$ to arrive at

$$\sum_i I(a_i;\beta\,|\,b=i)\le\sum_i \xi_i^2. \tag{8}$$

Though the expression on the RHS is quadratic, we can linearize it by the Cauchy-Schwarz inequality, i.e., $(\sum_i\xi_i)^2\le k\sum_i\xi_i^2$. For the $k=2$ case with uniform input marginal probabilities, it is easy to show that $\xi_0+\xi_1\le\sqrt 2$ (or $\xi_0^2+\xi_1^2\le 1$) is nothing but the conventional Tsirelson inequality. Moreover, in (5) we used the SDP algorithm of (18) for the higher-$k$ cases and showed that the corresponding Tsirelson-type inequality is

$$\sum_i\xi_i\le\sqrt{k}. \tag{9}$$

This is equivalent to saying $\sum_i\xi_i^2\le 1$. Together with the signal decay theorem (6), this implies that the maximal information gain in our RAC protocol with pre-shared quantum resources is consistent with information causality (1).
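The linearization step can be sanity-checked numerically: any noise vector obeying the quadratic constraint $\sum_i\xi_i^2\le 1$ automatically obeys (9), by Cauchy-Schwarz. A small sketch:

```python
import math
import random

random.seed(1)
k = 3
for _ in range(1000):
    # random non-negative noise parameters, rescaled to sum xi_i^2 = 1,
    # i.e. saturating the quadratic constraint
    xi = [random.random() for _ in range(k)]
    norm = math.sqrt(sum(x * x for x in xi))
    xi = [x / norm for x in xi]
    # Cauchy-Schwarz: (sum xi_i)^2 <= k * sum xi_i^2, hence (9)
    assert sum(xi) <= math.sqrt(k) + 1e-9
print("ok")
```

Equality in (9) is reached at the isotropic point $\xi_i=1/\sqrt k$ for all $i$.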

We now generalize the above construction to the $d>2$ cases. First, we start with the $d=3$ case by considering a cascade of two channels in which the second is a 3-input, 3-output symmetric channel. Again, we want to find the upper bound of the ratio $I(X;Z)/I(X;Y)$. In Appendix A we show that the ratio reaches its upper bound whenever the three conditional probability distributions of $Y$ given the different values of $X$ are almost indistinguishable. Moreover, it can also be shown that the upper bound of the ratio is again given by (5) for the symmetric channel between $Y$ and $Z$ specified by

$$\Pr(Z|Y)=\begin{pmatrix}\frac{2\xi+1}{3}&\frac{1-\xi}{3}&\frac{1-\xi}{3}\\[3pt]\frac{1-\xi}{3}&\frac{2\xi+1}{3}&\frac{1-\xi}{3}\\[3pt]\frac{1-\xi}{3}&\frac{1-\xi}{3}&\frac{2\xi+1}{3}\end{pmatrix}. \tag{10}$$

One can generalize the above to the higher-$d$ cases for the symmetric channel between $Y$ and $Z$ specified as follows: $\Pr(Z=Y)=\frac{(d-1)\xi+1}{d}$ and $\Pr(Z=Y+j)=\frac{1-\xi}{d}$ for $j\neq 0$ (mod $d$). Again we arrive at (5). Based on the signal decay theorem with $X=a_i$, $Y=\alpha$ and $Z=\beta$, and assuming that Alice's input probabilities are i.i.d., we can sum over the mutual information between each $a_i$ and $\beta$ and obtain

$$\sum_{i=0}^{k-1} I(\beta;a_i\,|\,b=i)\le\sum_{i=0}^{k-1}\xi_i^2\,\log_2 d. \tag{11}$$

In our RAC protocol, the noise parameter $\xi_{\vec y}$ (or $\xi_i$) can be expressed as

$$\xi_{\vec y}=\frac{d\sum_{\vec x}\Pr(\vec x)\Pr(B_{\vec y}-A_{\vec x}=\vec x\cdot\vec y\,|\,\vec x,\vec y)-1}{d-1}. \tag{12}$$

As for the $d=2$ case, we assume the upper bound of (11) is capped by information causality, yielding the quadratic constraint $\sum_i\xi_i^2\le 1$ on the noise parameters. Again, using the Cauchy-Schwarz inequality to linearize the quadratic constraint, we find $\sum_i\xi_i\le\sqrt k$. In particular, if the input marginal probabilities are uniform, this inequality yields a constraint on the joint probabilities of the NS-box. Using (12), the LHS of this inequality can be thought of as a Bell-type function, and our task is to check whether the RHS matches the Tsirelson bound or not.
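For uniform inputs, the noise parameter of the $d$-ary symmetric channel (10) reduces to $\xi=(d\,\Pr(\text{correct})-1)/(d-1)$, mirroring (12). A short sketch (helper name is ours) recovers $\xi$ from the $d=3$ channel matrix:

```python
def xi_from_channel(P):
    """Noise parameter of a d-ary symmetric channel, in the spirit of (12):
    xi = (d * Pr(correct) - 1) / (d - 1)."""
    d = len(P)
    p_correct = sum(P[i][i] for i in range(d)) / d   # uniform input assumed
    return (d * p_correct - 1) / (d - 1)

xi = 0.4
# the d = 3 symmetric channel (10): diagonal (2 xi + 1)/3, off-diagonal (1 - xi)/3
P3 = [[(2 * xi + 1) / 3 if i == j else (1 - xi) / 3 for j in range(3)]
      for i in range(3)]
print(xi_from_channel(P3))  # recovers xi = 0.4 (up to rounding)
```

Note that $\xi=0$ gives the completely random channel and $\xi=1$ the noiseless one.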

It is then natural to ask: if the joint probabilities of an NS-box achieve the Tsirelson bound, does the same NS-box used in our RAC protocol also saturate the information causality bound? We address this question next.

## III Convexity and information gain

### 1 Feasibility for maximizing information gain by convex optimization

In order to test information causality for more general communication schemes, we have to maximize the information gain over the conditional probabilities $\Pr(\beta|a_i,b=i)$, which are determined by the joint probabilities of the NS-box and the input marginal probabilities. One way to achieve this is to formulate the problem as a convex optimization program, so that we may exploit numerical recipes such as (21) to carry out the task.

Minimizing a convex function subject to equality or inequality constraints is called convex optimization. The objective function can be linear or non-linear; for example, SDP is a kind of convex optimization with a linear objective function. In either case, a minimization (maximization) problem requires the objective function to be convex (concave). Thus, if we define the information gain as the objective function for maximization in the context of information causality, we have to check whether it is concave.

A concave function $f$ should satisfy the following condition:

$$f(\lambda \vec x_1+(1-\lambda)\vec x_2)\ge\lambda f(\vec x_1)+(1-\lambda)f(\vec x_2), \tag{13}$$

where $\vec x_1$ and $\vec x_2$ are $n$-dimensional real vectors, and $0\le\lambda\le 1$.

The mutual information between input $X$ and output $Z$ can be written as

$$I(X;Z)=H(Z)-H(Z|X)=H(Z)-\sum_i\Pr(X=i)\,H(Z|X=i), \tag{14}$$

where $H$ is the entropy function. We will study the concavity of $I(X;Z)$ by varying the marginal probabilities $\Pr(X)$ and the channel probabilities $\Pr(Z|X)$.

The following theorem is mentioned in (22): if we fix the channel probabilities $\Pr(Z|X)$ in (14), then $I(X;Z)$ is a concave function of $\Pr(X)$. This is the usual way of obtaining the channel capacity, i.e., maximizing the information gain over the input marginal probabilities for a fixed channel.

However, in the context of information causality, the conditional probabilities $\Pr(\beta|a_i,b=i)$ (playing the role of $\Pr(Z|X)$) are related to both the joint probabilities of the NS-box and the input marginal probabilities. This means that the latter two become correlated if we fix the former. This does not fit our setup, in which we aim to maximize the information gain by varying both the joint probabilities of the NS-box and the input marginal probabilities. For example, in the $d=2$, $k=2$ case, $\Pr(\beta|a_i,b=i)$ is given by

$$\Pr(\beta|a_i,b=i)=\begin{pmatrix}\alpha_i&1-\alpha_i\\1-\lambda_i&\lambda_i\end{pmatrix},$$

where

$$\alpha_0:=\Pr(\beta=0|a_0=0,b=0)=\sum_{\ell=0}^{1}\Pr(B_y-A_x=0|x=\ell,y=0)\Pr(a_1=\ell), \tag{15}$$
$$\lambda_0:=\Pr(\beta=1|a_0=1,b=0)=\sum_{\ell=0}^{1}\Pr(B_y-A_x=0|x=\ell,y=0)\Pr(a_1=1-\ell), \tag{16}$$
$$\alpha_1:=\Pr(\beta=0|a_1=0,b=1)=\sum_{\ell=0}^{1}\Pr(B_y-A_x=\ell|x=\ell,y=1)\Pr(a_0=\ell), \tag{17}$$
$$\lambda_1:=\Pr(\beta=1|a_1=1,b=1)=\sum_{\ell=0}^{1}\Pr(B_y-A_x=\ell|x=\ell,y=1)\Pr(a_0=1-\ell). \tag{18}$$

From the above, we see that $\Pr(\beta|a_i,b=i)$ cannot be held fixed while varying the joint probabilities and the input marginal probabilities independently. Similarly, for higher-$d$ and higher-$k$ protocols, we will also have constraints among these three sets of probabilities. Thus, maximizing the information gain for information causality is different from the usual problem of finding the channel capacity.

To maximize the information gain over the input marginal probabilities and the joint probabilities realizable by quantum mechanics, we should check whether this is a convex (or concave) optimization problem. If yes, then we can adopt a numerical recipe such as (21) to carry out the task. Otherwise, we can either impose more constraints on the problem or just do it by brute force. It is known (23) that one can check whether maximizing a function $f$ over the variables $y_i$ is a concave problem by examining its Hessian matrix

$$H(f)=\begin{pmatrix}\frac{\partial^2 f}{\partial y_1^2}&\frac{\partial^2 f}{\partial y_1\partial y_2}&\cdots&\frac{\partial^2 f}{\partial y_1\partial y_n}\\[3pt]\frac{\partial^2 f}{\partial y_2\partial y_1}&\frac{\partial^2 f}{\partial y_2^2}&\cdots&\frac{\partial^2 f}{\partial y_2\partial y_n}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial^2 f}{\partial y_n\partial y_1}&\frac{\partial^2 f}{\partial y_n\partial y_2}&\cdots&\frac{\partial^2 f}{\partial y_n^2}\end{pmatrix}. \tag{19}$$

For the maximization to be a concave problem, the Hessian matrix should be negative semidefinite; that is, all odd-order principal minors of $H(f)$ should be non-positive and all even-order ones non-negative. Note that each first-order principal minor of $H(f)$ is just a second derivative of $f$, i.e., $\partial^2 f/\partial y_i^2$. So the problem cannot be concave if $\partial^2 f/\partial y_i^2>0$ for some $y_i$.
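In practice, negative semidefiniteness is checked more conveniently through eigenvalues than through principal minors; the two tests are equivalent for symmetric matrices. A small sketch (helper name is ours):

```python
import numpy as np

def is_negative_semidefinite(H, tol=1e-9):
    """A symmetric Hessian is negative semidefinite iff all its
    eigenvalues are <= 0; this is equivalent to the alternating
    principal-minor test described in the text."""
    H = np.asarray(H, dtype=float)
    # symmetrize to guard against floating-point asymmetry
    return bool(np.all(np.linalg.eigvalsh((H + H.T) / 2) <= tol))

print(is_negative_semidefinite([[-2, 1], [1, -2]]))  # True: concave quadratic
print(is_negative_semidefinite([[1, 0], [0, -1]]))   # False: indefinite
```

A single positive diagonal entry, as in (23) below with uniform marginals, already rules out negative semidefiniteness.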

With the above criterion, we can now show that the problem of maximizing $I$ over the joint probabilities and the input marginal probabilities cannot be a concave problem. To do this, we rewrite the information gain defined in (1) as follows:

$$I=\sum_{i=0}^{k-1}\sum_{n=0}^{d-1}\sum_{j=0}^{d-1}\Pr(\beta=n,a_i=j|b=i)\,\log_2\frac{\Pr(\beta=n,a_i=j|b=i)}{\Pr(\beta=n|b=i)\Pr(a_i=j)}. \tag{20}$$

Furthermore, one can express the above in terms of the joint probabilities of the NS-box and the input marginal probabilities $\Pr(a_i)$ by the following relations:

$$\Pr(\beta=n,a_i=j|b=i)=\sum_{\{a_{k\neq i}\}}\Pr(B_{\vec y}-A_{\vec x}=n-a_0|\vec x,\vec y)\,\Pr(a_i=j)\prod_{k\neq i}\Pr(a_k), \tag{21}$$
$$\Pr(\beta=n|b=i)=\sum_{j=0}^{d-1}\Pr(\beta=n,a_i=j|b=i), \tag{22}$$

where $\vec x$ and $\vec y$ in the above are given by the RAC encoding, i.e., $x_i=a_i-a_0$ (mod $d$), and $\vec y$ is the encoding of $b$ in which only the component paired with $x_b$ equals 1 (all components vanish for $b=0$).

Moreover, both the joint probabilities and the input marginal probabilities are subject to the normalization conditions of total probability. Thus we need to solve these conditions so that the information gain is expressed as a function of independent probabilities only. After that, we can evaluate the corresponding Hessian matrix to examine whether the maximization of $I$ over these probabilities is a concave problem.

For illustration, we first consider the $d=2$, $k=2$ case. Using the relations (21), (22) and the normalization conditions of total probability to implement the chain rule while taking derivatives, we arrive at

$$\begin{aligned}\ln 2\cdot\frac{\partial^2 I}{\partial\Pr(B_y-A_x=0|x=0,y=0)^2}=&-\Big(\frac{1}{\Pr(\beta=0|b=0)}+\frac{1}{\Pr(\beta=1|b=0)}\Big)\big(\Pr(a_0=0)\Pr(a_1=0)-\Pr(a_0=1)\Pr(a_1=1)\big)^2\\&+\big(\Pr(a_0=0)\Pr(a_1=0)\big)^2\Big(\frac{1}{\Pr(\beta=0,a_0=0|b=0)}+\frac{1}{\Pr(\beta=1,a_0=0|b=0)}\Big)\\&+\big(\Pr(a_0=1)\Pr(a_1=1)\big)^2\Big(\frac{1}{\Pr(\beta=0,a_0=1|b=0)}+\frac{1}{\Pr(\beta=1,a_0=1|b=0)}\Big).\end{aligned} \tag{23}$$

Obviously, (23) cannot always be negative. This can be seen easily if we set the input marginal probabilities to be uniform, so that the first term on the RHS of (23) vanishes; the remaining terms are then non-negative. This indicates that maximizing $I$ over the joint probabilities and the input marginal probabilities is not a concave problem.

The check for the higher-$d$ and higher-$k$ cases can be done similarly; the details can be found in Appendix B. Again, we can set all the input marginal probabilities to be uniform, so that we have

$$d^{2k}\ln 2\cdot\frac{\partial^2 I}{\partial\Pr(B_{\vec y}-A_{\vec x}=0|\vec x=\vec 0,\vec y=\vec 0)^2}=\sum_{n=0}^{d-1}\Big(\frac{1}{\Pr(a_0=n,\beta=n|b=0)}+\frac{1}{\Pr(a_0=n,\beta=n+1-d|b=0)}\Big)>0. \tag{24}$$

### 2 Convex optimization for the unbiased conditional probabilities with i.i.d. and uniform input marginal probabilities

Recall that we would like to check whether the boundaries of information causality and the generalized Tsirelson bound agree. To achieve this, we may maximize the information gain with the joint probabilities realized by quantum mechanics. Or, we may find the generalized Tsirelson bound and then evaluate the corresponding information gain, which can be compared with the bound of information causality. These two tasks are not equivalent but complementary. However, unlike the first task, the second task is a concave problem, as known from (18); (20). The only question in this case is whether the corresponding information gain is monotonically related to the Bell-type functions. If yes, then finding the generalized Tsirelson bound is equivalent to maximizing the information gain in our communication schemes. The answer is partially yes: as we will show, this monotonic relation holds only for the unbiased conditional probabilities with i.i.d. and uniform input marginal probabilities.

The unbiased conditional probabilities are symmetric and isotropic, defined as follows. One can arrange the conditional probabilities $\Pr(\beta|a_i,b=i)$ into a matrix whose rows are labeled by $a_i$ and whose columns are labeled by $\beta$. If all the rows of the matrix are permutations of each other and all the columns are also permutations of each other, the conditional probabilities are symmetric. Moreover, if the symmetric conditional probabilities for the different settings $i$ are all the same, the conditional probabilities are isotropic.

Assuming Alice's input is i.i.d. and uniform, we have Shannon entropy $H(a_i)=\log_2 d$. As the conditional probabilities are unbiased, they are symmetric, so that $\Pr(\beta=a_i\,|\,b=i)=\frac{(d-1)\xi_i+1}{d}$ and $\Pr(\beta=a_i+j\,|\,b=i)=\frac{1-\xi_i}{d}$ for $j\neq 0$ (mod $d$). Thus, the information gain becomes

$$I=k\log_2 d+\sum_{i=0}^{k-1}\Big[\frac{(d-1)\xi_i+1}{d}\log_2\frac{(d-1)\xi_i+1}{d}+(d-1)\,\frac{1-\xi_i}{d}\log_2\frac{1-\xi_i}{d}\Big]. \tag{25}$$

Moreover, the conditional probabilities are also isotropic; therefore $\xi_i=\xi$ for all $i$. In this case the information gain further simplifies to

$$I=k\Big[\log_2 d+\frac{(d-1)\xi+1}{d}\log_2\frac{(d-1)\xi+1}{d}+(d-1)\,\frac{1-\xi}{d}\log_2\frac{1-\xi}{d}\Big]. \tag{26}$$

The value of $\xi$ lies in the interval $[0,1]$. As $\xi$ is the noise parameter of the channel with input $a_i$ and output $\beta$, we have $\xi=0$ for the completely random channel and $\xi=1$ for the noiseless one, i.e., $I=0$ for $\xi=0$ and $I=k\log_2 d$ for $\xi=1$.

We can show that the information gain is monotonically increasing with the Bell-type functions parameterized by the noise parameter $\xi$. To see this, we calculate the first and second derivatives of $I$ with respect to $\xi$ and obtain

$$\frac{dI}{d\xi}=k\,\frac{d-1}{d}\log_2\frac{(d-1)\xi+1}{1-\xi},\qquad\frac{d^2I}{d\xi^2}=\frac{k}{\ln 2}\,\frac{d-1}{d}\Big(\frac{d-1}{(d-1)\xi+1}+\frac{1}{1-\xi}\Big).$$

From the above, we see that $d^2I/d\xi^2$ is always positive for $0\le\xi<1$. Moreover, $dI/d\xi$ is minimal at $\xi=0$, where it vanishes, so $dI/d\xi\ge 0$ on the whole interval. Thus, if the RAC protocol has i.i.d. and uniform input marginal probabilities, the information gain is a monotonically increasing function of $\xi$ for the unbiased conditional probabilities.
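The monotonicity can be verified numerically from (26). The sketch below (function name is ours) evaluates the information gain over a grid of $\xi$ for $d=3$, $k=2$ and checks that it increases from $0$ to $k\log_2 d$:

```python
import math

def info_gain(xi, d=3, k=2):
    """Information gain (26) for unbiased, isotropic conditional
    probabilities with i.i.d. uniform inputs; 0 <= xi <= 1."""
    p_hit = ((d - 1) * xi + 1) / d   # Pr(beta = a_i | b = i)
    p_miss = (1 - xi) / d            # each of the d - 1 wrong values
    total = math.log2(d) + p_hit * math.log2(p_hit)
    if p_miss > 0:                   # the xi = 1 endpoint contributes 0
        total += (d - 1) * p_miss * math.log2(p_miss)
    return k * total

vals = [info_gain(t / 100) for t in range(101)]
assert all(b >= a - 1e-12 for a, b in zip(vals, vals[1:]))   # monotone in xi
assert abs(info_gain(0.0)) < 1e-12                    # random channel: I = 0
assert abs(info_gain(1.0) - 2 * math.log2(3)) < 1e-12  # noiseless: k log2 d
print("information gain increases from 0 to k*log2(d)")
```

The same check passes for other small values of $d$ and $k$.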

## IV Finding the quantum violation of the Bell-type inequalities from the hierarchical semi-definite programming

We now prepare for numerically evaluating the maximum of the Bell-type function

$$\sum_{\vec y}\xi_{\vec y},\qquad\text{with }\xi_{\vec y}\text{ given in (12) and }\Pr(a_i)=\frac{1}{d}\ \ \forall\,a_i,i. \tag{27}$$

As shown above, it is monotonically increasing with the information gain under some assumptions. In order to ensure that the maximum of (27) can be attained by quantum resources, we use the same method as in (19); (20), where it was checked whether a given set of probabilities can be reproduced by quantum mechanics. This task can be formulated as solving a hierarchy of semidefinite programs (SDP).

### 1 Projection operators with quantum behaviors

We now briefly review the basic ideas in (19); (20) and then explain how to use them in our program. In (19); (20) projection operators are used for the following measurement scenario. Two distant parties, Alice and Bob, share an NS-box. Alice and Bob input $x$ and $y$ into the NS-box, respectively, and obtain the corresponding outputs $a$ and $b$, which run over the sets of all possible measurement outcomes of Alice and Bob. These outcomes can be associated with sets of projection operators $\{E_a\}$ and $\{E_b\}$. The joint probabilities of the NS-box can then be determined by the quantum state $\rho$ of the NS-box and the projection operators as follows:

$$\Pr(a,b)=\mathrm{Tr}(E_aE_b\rho). \tag{28}$$

Note that $\Pr(a,b)$ is an abbreviation of the joint probabilities $\Pr(A_{\vec x},B_{\vec y}|\vec x,\vec y)$ defined in the previous sections.

If $E_a$ and $E_b$ are genuine quantum operators, then they shall satisfy (i) hermiticity: $E_a^\dagger=E_a$ and $E_b^\dagger=E_b$; (ii) orthogonality: $E_aE_{a'}=0$ if $a\neq a'$ for the same setting, and likewise $E_bE_{b'}=0$ if $b\neq b'$; (iii) completeness: $\sum_a E_a=\mathbb 1$ and $\sum_b E_b=\mathbb 1$ for each setting; and (iv) commutativity: $[E_a,E_b]=0$.

In our measurement scenario, the distant parties Alice and Bob perform local measurements, so property (iv) holds. On the other hand, property (iii) implies no-signaling, as it leads to (2) via (28). Furthermore, this property also implies that there is redundancy in specifying Alice's operators $E_a$ with the same input, since one of them can be expressed through the others. Thus, we can eliminate one outcome per setting and denote the set of remaining outcomes for the input $x$ by $\tilde A_x$ (or $\tilde B_y$ for Bob's outcomes with input $y$). The collection of such measurement outcomes is denoted as $\tilde A=\bigcup_x\tilde A_x$; similarly, the collection of Bob's independent outcomes is denoted as $\tilde B=\bigcup_y\tilde B_y$.

Using the reduced sets of projection operators for $\tilde A$ and $\tilde B$, we can construct a set of operators $O=\{O_i\}$, where each $O_i$ is some linear function of products of operators in $\{\mathbb 1\}\cup\tilde A\cup\tilde B$. The set $O$ is characterized by a matrix $\Gamma$ given by

$$\Gamma_{ij}=\mathrm{Tr}(O_i^\dagger O_j\rho). \tag{29}$$

By construction, $\Gamma$ is positive semidefinite, i.e.,

$$\Gamma\succeq 0. \tag{30}$$

This can be easily proved as follows. For any vector $v$ (taking $\Gamma$ to be an $n\times n$ matrix) and with $V:=\sum_t v_tO_t$, one has

$$v^\dagger\Gamma v=\sum_{s,t}v_s^*\,\mathrm{Tr}(O_s^\dagger O_t\rho)\,v_t=\mathrm{Tr}(V^\dagger V\rho)\ge 0. \tag{31}$$
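The positivity argument can be illustrated numerically: any Gram-type moment matrix $\Gamma=V^\dagger V$ built from arbitrary vectors is automatically positive semidefinite. A small numpy sketch with randomly generated stand-in data (not an actual certificate):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in moment matrix: Gamma = V^T V for an arbitrary real V is
# positive semidefinite by the same argument as (31).
V = rng.normal(size=(5, 5))
Gamma = V.T @ V
for _ in range(100):
    v = rng.normal(size=5)
    assert v @ Gamma @ v >= -1e-9              # v^dagger Gamma v >= 0
assert np.all(np.linalg.eigvalsh(Gamma) >= -1e-9)  # Gamma >= 0, as in (30)
print("Gamma is positive semidefinite")
```

In an actual SDP run, (30) is imposed as a constraint rather than checked after the fact.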

Recall that our goal is to judge whether a given set of joint probabilities such as (28) can be reproduced by quantum mechanics. In this prescription, the joint probabilities are encoded in the matrix $\Gamma$ satisfying the quantum constraints (28) and (30). However, $\Gamma$ contains more information than just the joint probabilities (28). For example, terms appearing in the elements of $\Gamma$ such as $\mathrm{Tr}(E_aE_{a'}\rho)$ for outcomes $a$, $a'$ belonging to different settings of the same party cannot be expressed in terms of the joint probabilities of the NS-box, because these measurements are performed on the same party (either Alice or Bob) and do not commute. Therefore, to relate the joint probabilities of the NS-box to the matrix $\Gamma$, we need to find the proper combinations of its elements so that the final object can be expressed in terms of the joint probabilities only. That is, given the joint probabilities, there shall exist matrix functions $F_q$ such that $\Gamma$ is constrained as follows:

$$\sum_{s,t}(F_q)_{s,t}\,\Gamma_{s,t}=g_q, \tag{32}$$

where the $g_q$'s are linear functions of the joint probabilities $\Pr(a,b)$.

We then call the matrix $\Gamma$ a certificate if it satisfies (30) and (32) for a given set of joint probabilities of the NS-box. The existence of the certificate can then be examined numerically by SDP. If the certificate does not exist, the joint probabilities cannot be reproduced by quantum mechanics.

Examples of how to construct $F_q$ and $g_q$ for some specific NS-box protocols can be found in (19); (20). For illustration, here we explicitly demonstrate a case not considered in (19); (20), namely our RAC protocol, using the notation defined in the previous sections. We start by defining the set of operators $O$, where the first operator is the identity $\mathbb 1$ and the remaining ones are the projectors $E_{A_x}$ and $E_{B_y}$.

The associated quantum constraints can be understood as the relations between the joint probabilities $\Pr(A_x,B_y|x,y)$ and the marginal probabilities $\Pr(A_x|x)$ and $\Pr(B_y|y)$. That is,

$$\begin{aligned}&\mathrm{Tr}(\rho)=1,\qquad\mathrm{Tr}(\mathbb 1\,E_{A_x}\rho)=\Pr(A_x|x),\qquad\mathrm{Tr}(\mathbb 1\,E_{B_y}\rho)=\Pr(B_y|y),\\&\mathrm{Tr}(E_{A_x}E_{A'_x}\rho)=\delta_{A_x,A'_x}\Pr(A_x|x),\qquad\mathrm{Tr}(E_{B_y}E_{B'_y}\rho)=\delta_{B_y,B'_y}\Pr(B_y|y),\\&\mathrm{Tr}(E_{A_x}E_{B_y}\rho)=\Pr(A_x,B_y|x,y).\end{aligned} \tag{33}$$

Note that these equations also hold when the operators are permuted, e.g., $\mathrm{Tr}(E_{A_x}E_{B_y}\rho)=\mathrm{Tr}(E_{B_y}E_{A_x}\rho)$.

Moreover, we can make the matrix $\Gamma$ real and symmetric by redefining it as $\Gamma\to\frac{1}{2}(\Gamma+\Gamma^{*})$. Thus, in the following we will only display the upper triangular part of $\Gamma$. We then use the quantum constraints (33) to construct $F_q$ and $g_q$ by comparing them with (32). We see that every constraint in (33) yields a matrix function $F_q$ with only one non-zero element, together with a function $g_q$ which is either zero or contains a single marginal or joint probability. These constraints can be further divided into four subsets labeled by $q$ as follows:

1. These labels specify the marginal probabilities $\Pr(A_x|x)$ and $\Pr(B_y|y)$: the corresponding matrix functions $F_q$ each select a single element of $\Gamma$ in the row or column of the identity operator, and the $g_q$ are the corresponding marginal probabilities.

2. This label specifies the probabilities associated with the orthogonal operator pairs: the corresponding matrix element vanishes, and $g_q=0$.

3. The label is used to specify the joint probabilities of the NS-box. The corresponding