# Correlation Distance and Bounds for Mutual Information

## Abstract

The correlation distance quantifies the statistical independence of two classical or quantum systems, via the distance from their joint state to the product of the marginal states. Tight lower bounds are given for the mutual information between pairs of two-valued classical variables and quantum qubits, in terms of the corresponding classical and quantum correlation distances. These bounds are stronger than the Pinsker inequality (and refinements thereof) for relative entropy. The classical lower bound may be used to quantify properties of statistical models that violate Bell inequalities. Entangled qubits can have a lower mutual information than can any two-valued classical variables having the same correlation distance. The qubit correlation distance also provides a direct entanglement criterion, related to the spin covariance matrix. Connections of results with classically-correlated quantum states are briefly discussed.

## 1 Introduction

The relative entropy between two probability distributions has many applications in classical and quantum information theory. A number of these applications, including the conditional limit theorem [1], and secure random number generation and communication [2, 3], make use of lower bounds on the relative entropy in terms of a suitable distance between the two distributions. The best known such bound is the so-called Pinsker inequality [4]

$$H(P\|Q) := \sum_j P(j)\,[\log P(j) - \log Q(j)] \;\geq\; \tfrac{1}{2}\, D(P,Q)^2 \log e, \tag{1}$$

where $D(P,Q) := \sum_j |P(j) - Q(j)|$ is the variational or $L_1$ distance between distributions $P$ and $Q$. Note that the choice of logarithm base is left open throughout this paper, corresponding to a choice of units. There are a number of such bounds [4], all of which easily generalise to the case of quantum probabilities [5, 6].

However, in a number of applications of the Pinsker inequality and its quantum analog, a lower bound is in fact only needed for the special case that the relative entropy quantifies the mutual information between two systems. Such applications include, for example, secure random number generation and coding [2, 3] (both classical and quantum), and quantum de Finetti theorems [7]. Since mutual information is a special case of relative entropy, it may be possible to find strictly stronger lower bounds for mutual information.

Surprisingly little attention appears to have been paid to this possibility of better lower bounds (although upper bounds for mutual information have been investigated [8]). The results of preliminary investigations are given here, with explicit tight lower bounds being obtained for pairs of two-valued classical random variables, and for pairs of quantum qubits with maximally-mixed reduced states.

In the context of mutual information, the corresponding variational distance reduces to the distance between the joint state of the systems and the product of their marginal states, referred to here as the ‘correlation distance’. It is shown that both the classical and quantum correlation distances are relevant to quantifying properties of quantum entanglement: the former with respect to the classical resources required to simulate entanglement, and the latter as providing a criterion for qubit entanglement. In the quantum case, it is also shown that the minimum value of the mutual information can only be achieved by entangled qubits if the correlation distance is more than approximately 0.7265.

The main results are given in the following section. Lower bounds on classical and quantum mutual information for two-level systems are derived in sections 3 and 5, and an entanglement criterion for qubits in terms of the quantum correlation distance is obtained in section 4. Connections with classically-correlated quantum states are briefly discussed in section 6, and conclusions are presented in section 7.

## 2 Definitions and Main Results

For two classical random variables $A$ and $B$, with joint probability distribution $P_{AB}(a,b)$ and marginal distributions $P_A(a)$ and $P_B(b)$, the Shannon mutual information and the classical correlation distance are defined respectively by

$$\begin{aligned}
I(P_{AB}) &:= H(P_{AB}\|P_A P_B) = H(P_A) + H(P_B) - H(P_{AB}), \\
C(P_{AB}) &:= \|P_{AB} - P_A P_B\|_1 = \sum_{a,b} |P_{AB}(a,b) - P_A(a)P_B(b)|,
\end{aligned}$$

where $H(P) := -\sum_j P(j)\log P(j)$ denotes the Shannon entropy of distribution $P$. The term ‘correlation distance’ is used for $C(P_{AB})$, since it inherits all the properties of a distance from the more general variational distance, and clearly vanishes for uncorrelated $A$ and $B$.

For two quantum systems $A$ and $B$ described by density operator $\rho_{AB}$ and reduced density operators $\rho_A$ and $\rho_B$, the corresponding quantum mutual information and quantum correlation distance are analogously defined by

$$\begin{aligned}
I(\rho_{AB}) &:= S(\rho_A) + S(\rho_B) - S(\rho_{AB}), \\
C(\rho_{AB}) &:= \|\rho_{AB} - \rho_A\otimes\rho_B\|_1 = \mathrm{tr}\,|\rho_{AB} - \rho_A\otimes\rho_B|,
\end{aligned}$$

where $S(\rho) := -\mathrm{tr}[\rho\log\rho]$ denotes the von Neumann entropy of density operator $\rho$.

In both the classical and quantum cases, one has the lower bound

$$I \geq \tfrac{1}{2}\, C^2 \log e \tag{2}$$

for mutual information, as a direct consequence of the Pinsker inequality (1) for classical and quantum relative entropies [4, 5, 6]. However, better bounds for mutual information can be obtained, which are stronger than any general inequality for relative entropy and variational distance.

For example, for two-valued classical random variables $A$ and $B$ one has the tight lower bound

$$I(P_{AB}) \geq \log 2 - H\!\left(\frac{1+C(P_{AB})}{2}, \frac{1-C(P_{AB})}{2}\right) \tag{3}$$

for classical mutual information. This inequality has been previously stated without proof in Ref. [9], where it was used to bound the shared information required to classically simulate entangled quantum systems. It is proved in section 3 below.
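As a rough numerical illustration of Eqs. (2) and (3), the following Python sketch evaluates the mutual information and correlation distance of a 2×2 joint distribution and compares the mutual information with both lower bounds. The function names, the use of bits (logarithm base 2) and the randomly generated test distributions are assumptions made for this example only; they do not come from the original text.

```python
import math
import random

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Row and column marginals of a joint distribution given as a nested list."""
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]

def mutual_information(joint):
    """I(P_AB) = H(P_A) + H(P_B) - H(P_AB)."""
    p_a, p_b = marginals(joint)
    return entropy(p_a) + entropy(p_b) - entropy([p for row in joint for p in row])

def correlation_distance(joint):
    """C(P_AB) = sum_{a,b} |P_AB(a,b) - P_A(a) P_B(b)|."""
    p_a, p_b = marginals(joint)
    return sum(abs(joint[a][b] - p_a[a] * p_b[b])
               for a in range(len(p_a)) for b in range(len(p_b)))

def pinsker_bound(c):
    """Right-hand side of Eq. (2) in bits."""
    return 0.5 * c * c * math.log2(math.e)

def binary_bound(c):
    """Right-hand side of Eq. (3) in bits."""
    return 1.0 - entropy([(1 + c) / 2, (1 - c) / 2])

def random_joint(rng):
    """A random 2x2 joint distribution (hypothetical test data)."""
    w = [rng.random() for _ in range(4)]
    s = sum(w)
    return [[w[0] / s, w[1] / s], [w[2] / s, w[3] / s]]

rng = random.Random(0)
samples = [random_joint(rng) for _ in range(1000)]
# Smallest observed slack between I(P_AB) and the bound of Eq. (3).
worst_gap = min(mutual_information(j) - binary_bound(correlation_distance(j))
                for j in samples)
```

On the sampled distributions `worst_gap` should be non-negative up to rounding, and `binary_bound(c)` should not fall below `pinsker_bound(c)` for any `c` between 0 and 1.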

In contrast to Pinsker-type inequalities such as Eq. (2), the quantum generalisation of Eq. (3) is not straightforward. In particular, note for a two-qubit system that one cannot simply replace $C(P_{AB})$ by $C(\rho_{AB})$ in Eq. (3), as the right hand side would be undefined for $C(\rho_{AB}) > 1$, which can occur if the qubits are entangled. Indeed, as shown in section 4, $C(\rho_{AB}) > 1$ is a sufficient condition for the entanglement of two qubits, as is the stronger condition

$$C(\rho_{AB}) > 2\sqrt{(1-\mathrm{tr}[\rho_A^2])(1-\mathrm{tr}[\rho_B^2])}. \tag{4}$$

An explicit expression for the quantum correlation distance for two qubits, in terms of the spin covariance matrix, is also given in section 4.

It is shown in section 5 that the quantum equivalent of Eq. (3), i.e., a tight lower bound for the quantum mutual information shared by two qubits, is

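$$I(\rho_{AB}) \geq \begin{cases} \log 2 - H\!\left(\dfrac{1+C(\rho_{AB})}{2}, \dfrac{1-C(\rho_{AB})}{2}\right), & C(\rho_{AB}) \leq C_0, \\[2ex] \log 4 - H\!\left(\dfrac14 + \dfrac{C(\rho_{AB})}{2}, \dfrac14 - \dfrac{C(\rho_{AB})}{6}, \dfrac14 - \dfrac{C(\rho_{AB})}{6}, \dfrac14 - \dfrac{C(\rho_{AB})}{6}\right), & C(\rho_{AB}) > C_0, \end{cases} \tag{5}$$

when the reduced density operators are maximally mixed, where $C_0 \approx 0.7265$ is the nonzero value of the correlation distance at which the two branches coincide. For $C(\rho_{AB}) > C_0$ this lower bound can only be achieved by entangled states, and cannot be achieved by any classical distribution having the same correlation distance. It is also shown that, for $C(\rho_{AB}) \geq C_0$, the bound is also tight if only one of the reduced states is maximally mixed. Support is given for the conjecture that the bound in Eq. (5) in fact holds for all two-qubit states.

In section 6 the natural role of ‘classically-correlated’ quantum states, in comparing classical and quantum correlations, is briefly discussed. Such states have the general form $\rho_{AB} = \sum_{j,k} P(j,k)\,|j\rangle\langle j|\otimes|k\rangle\langle k|$ [10], where $P(j,k)$ is a classical joint probability distribution and $\{|j\rangle\}$ and $\{|k\rangle\}$ are orthonormal basis sets for the two quantum systems. The lower bound in Eq. (5) can be saturated by a classically-correlated state if and only if $C(\rho_{AB}) \leq C_0$.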

## 3 Tight Lower Bound for Classical Mutual Information

### 3.1 Derivation of Bound

The tight lower bound in Eq. (3) is derived here. The bound is plotted in Figure 1 below [top curve]. Also plotted for comparison are the Pinsker lower bound in Eq. (2) [bottom curve], and the lower bound following from the best possible generic inequality for relative entropy and variational distance, given in parametric form in Ref. [4] [intermediate curve].

To derive the bound in Eq. (3), it is convenient to label the two possible values of $A$ and $B$ by $\pm 1$. Defining $R(a,b) := P_{AB}(a,b) - P_A(a)P_B(b)$, it follows by summing over each of $a$ and $b$ that $R(a,b) = ab\,r/4$ for some number $r$, and hence that $C(P_{AB}) = |r|$. Further, writing $P_A(a) = (1+ax)/2$ and $P_B(b) = (1+by)/2$, for suitable $x, y \in [-1,1]$, the positivity condition $P_{AB}(a,b) \geq 0$ is equivalent to

$$|x+y| - 1 \leq r + xy \leq 1 - |x-y|. \tag{6}$$

Now, Eq. (3) is equivalent to

$$f(r) := I(P_{AB}) - \log 2 + H\!\left(\frac{1+r}{2}, \frac{1-r}{2}\right) \geq 0. \tag{7}$$

It is easy to check that this inequality is always saturated for the case of maximally-random marginals, i.e., when $x = y = 0$. In all other cases, the inequality may be proved by showing that $f(r)$ has a unique global minimum value of 0 at $r = 0$.

In particular, note first that $f(0) = 0$ (one has $P_{AB} = P_A P_B$ in this case, so that the mutual information vanishes). Further, using $P_{AB}(a,b) = P_A(a)P_B(b) + ab\,r/4$, one easily calculates that, using logarithm base $e$ for convenience,

$$f'(r) = \frac14\sum_{a,b} ab\,\log P_{AB}(a,b) - \frac12\sum_a a\,\log\frac{1+ar}{2} = \frac14\log\frac{P_{AB}(+,+)\,P_{AB}(-,-)\,(1-r)^2}{P_{AB}(+,-)\,P_{AB}(-,+)\,(1+r)^2}.$$

Hence, $f'(r) = 0$ if and only if the argument of the logarithm is unity, i.e., if and only if

$$[(1+x)(1+y)+r]\,[(1-x)(1-y)+r]\,(1-r)^2 = [(1+x)(1-y)-r]\,[(1-x)(1+y)-r]\,(1+r)^2.$$

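Expanding and simplifying yields two possible solutions: $r = 0$, or $r = (x^2 + y^2 - x^2y^2)/(2xy)$. However, in the latter case one has

$$|r + xy| = \frac{x^2 + y^2 + x^2y^2}{2|xy|} = \frac{\alpha}{\gamma} + \frac{\gamma}{2} \geq 1 + \frac{\gamma}{2} \geq 1,$$

where $\alpha$ and $\gamma$ denote the arithmetic mean and geometric mean, respectively, of $x^2$ and $y^2$ (hence $\alpha \geq \gamma$). This is clearly inconsistent with the positivity condition (6) (unless $x = y = 0$, which trivially saturates Eq. (7) for all $r$ as noted above). The only remaining solution to $f'(r) = 0$ is then $r = 0$, implying $f(r)$ has a unique maximum or minimum value at $r = 0$. Finally, it is easily checked that it is a minimum, since

$$f''(0) = \frac{1}{16}\sum_{a,b}\frac{1}{P_A(a)P_B(b)} - 1 = \frac{1}{16\,P_A(+)P_A(-)P_B(+)P_B(-)} - 1 = \frac{1}{(1-x^2)(1-y^2)} - 1 \geq 0$$

(with equality only for the trivially-saturating case $x = y = 0$). Thus, $f(r) \geq f(0) = 0$ as required.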
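The derivation can be checked numerically with a short Python sketch that evaluates $f(r)$ of Eq. (7) in the $(x, y, r)$ parametrisation, scanning the values of $r$ allowed by the positivity condition (6). The helper names, the sample values of $x$ and $y$ and the grid resolution are illustrative assumptions, and natural logarithms are used as in the derivation.

```python
import math

def xlogx(p):
    """p*log(p) with the convention 0*log(0) = 0."""
    return p * math.log(p) if p > 0 else 0.0

def f(r, x, y):
    """f(r) of Eq. (7) for marginals P_A(a) = (1+ax)/2 and P_B(b) = (1+by)/2."""
    signs = (1, -1)
    joint = [((1 + a * x) * (1 + b * y) + a * b * r) / 4
             for a in signs for b in signs]
    if min(joint) < 0:
        raise ValueError("r violates the positivity condition (6)")
    h_ab = -sum(xlogx(p) for p in joint)
    h_a = -sum(xlogx((1 + a * x) / 2) for a in signs)
    h_b = -sum(xlogx((1 + b * y) / 2) for b in signs)
    h_r = -sum(xlogx((1 + a * r) / 2) for a in signs)
    return (h_a + h_b - h_ab) - math.log(2) + h_r

def allowed_r_range(x, y):
    """Interval of r permitted by Eq. (6)."""
    return abs(x + y) - 1 - x * y, 1 - abs(x - y) - x * y

# Hypothetical marginal biases chosen only for illustration.
x, y = 0.3, -0.5
lo, hi = allowed_r_range(x, y)
# Interior grid points, to stay clear of rounding at the boundary.
grid = [lo + (hi - lo) * k / 200 for k in range(1, 200)]
values = [f(r, x, y) for r in grid]
```

The scanned values should be non-negative up to rounding, with the minimum located at (or next to) $r = 0$, while `f(r, 0.0, 0.0)` should vanish for every admissible `r`.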

### 3.2 Application: Resources for Simulating Bell Inequality Violation

The hallmark feature of quantum correlations is that they cannot be explained by any underlying statistical model that satisfies three physically very plausible properties: (i) no signaling faster than the speed of light, (ii) free choice of measurement settings, and (iii) independence of local outcomes. Various interpretations of quantum mechanics differ in regard to which of these properties should be given up. It is of interest to consider by how much they must be given up, in terms of the information-theoretic resources required to simulate a given quantum correlation. For example, how many bits of communication, or bits of correlation between the source and the measurement settings, or bits of correlation between the outcomes, are required? The lower bound for classical mutual information in Eq. (3) is relevant to the last of these questions.

In more detail, if $P_{AB}(a,b)$ denotes the joint probability of outcomes $a$ and $b$, for measurements of variables $A$ and $B$ on respective spacelike-separated systems, and $\lambda$ denotes any underlying variables relevant to the correlations, then Bayes theorem implies that

$$P_{AB}(a,b) = \sum_\lambda p_{AB}(\lambda)\, P_{AB}(a,b|\lambda),$$

where summation is replaced by integration over any continuous values of $\lambda$. The no-signaling property requires that the underlying marginal distribution of $A$, $P_{AB}(a|\lambda) := \sum_b P_{AB}(a,b|\lambda)$, is independent of whether $B$ or $B'$ was measured on the second system (and vice versa), while the free-choice property requires that $\lambda$ is independent of the choice of the measured variables $A$ and $B$, i.e., that $p_{AB}(\lambda) = p_{A'B'}(\lambda)$ for any $A, A', B, B'$. Finally, the outcome independence property requires that any observed correlation between $A$ and $B$ arises from ignorance of the underlying variable, i.e., that $P_{AB}(a,b|\lambda) = P_{AB}(a|\lambda)\,P_{AB}(b|\lambda)$ for all $a$, $b$ and $\lambda$. Thus the correlation distance of the conditional distribution $P_{AB}(a,b|\lambda)$ vanishes identically:

$$C(P_{AB|\lambda}) \equiv 0. \tag{8}$$

As is well known, the assumption of all three properties implies that two-valued random variables with values $\pm 1$ must satisfy the Bell inequality [11]

$$\langle AB\rangle + \langle AB'\rangle + \langle A'B\rangle - \langle A'B'\rangle \leq 2, \tag{9}$$

whereas quantum correlations can violate this inequality by as much as a factor of $\sqrt{2}$. It follows that quantum correlations can only be modeled by relaxing one or more of the above properties, as has recently been reviewed in detail in Ref. [9].

For example, assuming that no-signaling and measurement independence hold (as they do in the standard Copenhagen interpretation of quantum mechanics), and defining $C_{\max}$ to be the maximum value of $C(P_{AB|\lambda})$ over all $A$, $B$ and $\lambda$, it can be shown that Eq. (9) generalises to the tight bound [9]

$$\langle AB\rangle + \langle AB'\rangle + \langle A'B\rangle - \langle A'B'\rangle \leq \frac{4}{2 - C_{\max}}. \tag{10}$$

It follows that to simulate a Bell inequality violation $\langle AB\rangle + \langle AB'\rangle + \langle A'B\rangle - \langle A'B'\rangle = 2 + V$, for some $V > 0$, the observers must share random variables having a correlation distance of at least $2V/(2+V)$. Hence, using the classical lower bound Eq. (3) (stated without proof in Ref. [9]), the observers must share a minimum mutual information of

$$I_{\min} = \log 2 - H\!\left(\frac{1+C_{\max}}{2}, \frac{1-C_{\max}}{2}\right) \geq \log 2 - H\!\left(\frac{2+3V}{4+2V}, \frac{2-V}{4+2V}\right). \tag{11}$$

Note this reduces to zero in the limit of no violation of Bell inequality (9), i.e., when $V = 0$, and reaches a maximum of 1 bit of information in the limit of the maximum possible violation, $V = 2$.
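To make Eq. (11) concrete, the following Python sketch tabulates the right-hand side for a few values of the violation $V$, using $C_{\max} \geq 2V/(2+V)$ as obtained by rearranging Eq. (10). The function names and the choice of bits as units are assumptions for illustration; the quantum value $V = 2\sqrt{2} - 2$ corresponds to the factor of $\sqrt{2}$ violation mentioned above.

```python
import math

def binary_entropy_bits(p):
    """Entropy of the distribution (p, 1-p) in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def min_correlation_distance(v):
    """Smallest C_max compatible with a violation V in Eq. (10): 2V/(2+V)."""
    if not 0.0 <= v <= 2.0:
        raise ValueError("V must lie between 0 and 2")
    return 2.0 * v / (2.0 + v)

def min_shared_information_bits(v):
    """Right-hand side of Eq. (11) in bits."""
    c = min_correlation_distance(v)
    return 1.0 - binary_entropy_bits((1.0 + c) / 2.0)

# V for the maximum quantum violation, 2*sqrt(2) = 2 + V.
quantum_v = 2.0 * math.sqrt(2.0) - 2.0
for v in (0.0, quantum_v, 2.0):
    print(f"V = {v:.4f}: at least {min_shared_information_bits(v):.4f} bits")
```

The endpoints reproduce the limits quoted in the text (zero bits at $V = 0$ and one bit at $V = 2$); the intermediate quantum case lies strictly between them.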

## 4 Quantum Correlation Distance and Qubit Entanglement

The positivity condition (6) may be used to show that the classical correlation distance between any pair of two-valued random variables is never greater than unity, i.e., that $C(P_{AB}) \leq 1$ [9]. In contrast, the quantum correlation distance between a pair of qubits can be greater than unity, with upper bound $C(\rho_{AB}) \leq 3/2$. More generally, one has

$$C(P_{AB}) \leq 2(n-1)/n, \qquad C(\rho_{AB}) \leq 2(n^2-1)/n^2 \tag{12}$$

for pairs of $n$-valued random variables and $n$-level quantum systems, with saturation corresponding to maximal correlation and maximal entanglement respectively. Thus, quantum correlations have a quadratic advantage with respect to correlation distance (this is also the case for mutual information, for which one has $I(P_{AB}) \leq \log n$ and $I(\rho_{AB}) \leq \log n^2$).

Nonclassical values of the quantum correlation distance are closely related to the quintessential nonclassical feature of quantum mechanics: entanglement. In particular, $C(\rho_{AB}) > 1$ is a direct signature of qubit entanglement. Indeed, even correlation distances smaller than unity can imply two qubits are entangled, as per the criterion given in Eq. (4) and shown below. An explicit formula for qubit correlation distance in terms of the spin covariance matrix, needed for section 5, is also obtained below.

### 4.1 Entanglement Criterion

Recall that the density operator of two qubits may always be written in the Fano form [12]

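$$\begin{aligned}
\rho_{AB} &= \frac14\Big[I\otimes I + u\cdot\sigma\otimes I + I\otimes v\cdot\sigma + \sum_{j,k}\langle\sigma_j\otimes\sigma_k\rangle\,\sigma_j\otimes\sigma_k\Big] \\
&= \rho_A\otimes\rho_B + \frac14\sum_{j,k} T_{jk}\,\sigma_j\otimes\sigma_k.
\end{aligned} \tag{13}$$

Here $I$ is the unit operator; $\sigma \equiv (\sigma_1, \sigma_2, \sigma_3)$ denotes the set of Pauli spin observables on each qubit Hilbert space; the components of the 3-vectors $u$ and $v$ are the spin expectation values $u_j := \langle\sigma_j\otimes I\rangle$ and $v_k := \langle I\otimes\sigma_k\rangle$, for $A$ and $B$ respectively; and $T$ denotes the $3\times 3$ spin covariance matrix with coefficients

$$T_{jk} := \langle\sigma_j\otimes\sigma_k\rangle - \langle\sigma_j\otimes I\rangle\langle I\otimes\sigma_k\rangle.$$

It immediately follows from Eq. (13) that the quantum correlation distance may be expressed in terms of the spin covariance matrix as

$$C(\rho_{AB}) = \frac14\,\mathrm{tr}\,\Big|\sum_{j,k} T_{jk}\,\sigma_j\otimes\sigma_k\Big|. \tag{14}$$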

This expression will be further simplified in subsection 4.2.

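Now consider the case where $\rho_{AB}$ is a separable state, i.e., of the unentangled form

$$\rho_{AB} = \sum_\lambda p(\lambda)\,\tau_A(\lambda)\otimes\omega_B(\lambda),$$

for some probability distribution $p(\lambda)$ and local density operators $\tau_A(\lambda)$, $\omega_B(\lambda)$. Defining $u(\lambda) := \mathrm{tr}[\tau_A(\lambda)\sigma]$ and $v(\lambda) := \mathrm{tr}[\omega_B(\lambda)\sigma]$ implies $u = \sum_\lambda p(\lambda)u(\lambda)$, $v = \sum_\lambda p(\lambda)v(\lambda)$ and $T_{jk} = \sum_\lambda p(\lambda)[u_j(\lambda) - u_j][v_k(\lambda) - v_k]$, and substitution into Eq. (14) then yields

$$\begin{aligned}
C(\rho_{AB}) &= \frac14\Big\|\sum_\lambda p(\lambda)\,[u(\lambda)-u]\cdot\sigma\otimes[v(\lambda)-v]\cdot\sigma\Big\|_1 \\
&\leq \frac14\sum_\lambda p(\lambda)\,\big\|[u(\lambda)-u]\cdot\sigma\big\|_1\,\big\|[v(\lambda)-v]\cdot\sigma\big\|_1 \\
&= \sum_\lambda p(\lambda)\,|u(\lambda)-u|\,|v(\lambda)-v| \leq \Big[\sum_\lambda p(\lambda)|u(\lambda)-u|^2\Big]^{1/2}\Big[\sum_\lambda p(\lambda)|v(\lambda)-v|^2\Big]^{1/2} \\
&= \Big[\sum_\lambda p(\lambda)|u(\lambda)|^2-|u|^2\Big]^{1/2}\Big[\sum_\lambda p(\lambda)|v(\lambda)|^2-|v|^2\Big]^{1/2} \leq \sqrt{(1-u\cdot u)(1-v\cdot v)}.
\end{aligned} \tag{15}$$

Note that the second line follows from the properties $\|X+Y\|_1 \leq \|X\|_1 + \|Y\|_1$ and $\|X\otimes Y\|_1 = \|X\|_1\|Y\|_1$ of the trace norm; the third line using $\|a\cdot\sigma\|_1 = 2|a|$ and the Schwarz inequality; and the last line via $|u(\lambda)|, |v(\lambda)| \leq 1$.

Equation (15) holds for all separable qubit states. Hence, a nonclassical value of the correlation distance, $C(\rho_{AB}) > 1$, immediately implies that the qubits must be entangled. More generally, noting that $\rho_A = \frac12(I + u\cdot\sigma)$ and $\rho_B = \frac12(I + v\cdot\sigma)$, one has $u\cdot u = 2\,\mathrm{tr}[\rho_A^2] - 1$, $v\cdot v = 2\,\mathrm{tr}[\rho_B^2] - 1$, and the stronger entanglement criterion (4) immediately follows from Eq. (15).

The fact that entanglement is required between two qubits, for $C(\rho_{AB})$ to be greater than the maximum possible value of $C(P_{AB})$ for two-valued classical variables, is a nice distinction between quantum and classical correlation distances. It would be of interest to determine whether this result generalises to $n$-level systems. This would follow from the validity of Eq. (4) for arbitrary quantum systems.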

### 4.2 Explicit Expression for C(ρAB)

To explicitly evaluate $C(\rho_{AB})$ in Eq. (14), let $T = K D L^{\top}$ denote a singular value decomposition of the spin covariance matrix. Thus, $K$ and $L$ are real orthogonal matrices and $D = \mathrm{diag}[t_1, t_2, t_3]$, with the singular values $t_1 \geq t_2 \geq t_3 \geq 0$ corresponding to the square roots of the eigenvalues of $T T^{\top}$. Noting that any orthogonal matrix is either a rotation matrix, or the product of a rotation matrix with the parity matrix $-I$, one therefore always has a decomposition of the form $T = \pm K D L^{\top}$ where $K$ and $L$ are now restricted to be rotation matrices. Hence, defining unitary operators $U$ and $V$ corresponding to rotations $K$ and $L$, via $U\sigma_j U^\dagger = \sum_k K_{kj}\sigma_k$ and $V\sigma_j V^\dagger = \sum_k L_{kj}\sigma_k$, and using the invariance of the trace norm under unitary transformations, the quantum correlation distance in Eq. (14) can be rewritten as

$$C(\rho_{AB}) = \frac14\,\mathrm{tr}\,\Big|\pm\sum_j t_j\, U\sigma_j U^\dagger\otimes V\sigma_j V^\dagger\Big| = \frac14\,\mathrm{tr}\,\Big|\sum_j t_j\,\sigma_j\otimes\sigma_j\Big|.$$

Determining the eigenvalues of the Hermitian operator $\sum_j t_j\,\sigma_j\otimes\sigma_j$ is a straightforward matrix calculation using the standard representation of the Pauli sigma matrices. Summing the absolute values of these eigenvalues then yields the explicit expression

$$\begin{aligned}
C(\rho_{AB}) &= \frac14\big[\,|t_1+t_2+t_3| + |t_1+t_2-t_3| + |t_1-t_2+t_3| + |-t_1+t_2+t_3|\,\big] \\
&= \frac12\max\{t_1+t_2+t_3,\; 2t_1\}
\end{aligned} \tag{16}$$

for the quantum correlation distance, in terms of the singular values of the spin covariance matrix.

For example, for the Werner state $\rho_W := p\,|\psi\rangle\langle\psi| + (1-p)\,\frac14 I\otimes I$, where $|\psi\rangle$ is the singlet state and $-1/3 \leq p \leq 1$ [13], one has $T_{jk} = -p\,\delta_{jk}$ and hence that $t_1 = t_2 = t_3 = |p|$. The corresponding correlation distance is therefore $C(\rho_W) = 3|p|/2$, which is greater than the classical maximum of unity for $p > 2/3$.

Equation (16) also allows the qubit entanglement criterion (4) to be directly compared with the strongest known criterion based on the spin covariance matrix [14]:

$$t_1 + t_2 + t_3 > 2\sqrt{(1-\mathrm{tr}[\rho_A^2])(1-\mathrm{tr}[\rho_B^2])}. \tag{17}$$

For the above Werner state this criterion is tight, indicating entanglement for $p > 1/3$. Hence, the main interest in weaker entanglement criteria based on quantum correlation distance lies in their direct connection with nonclassical values of the classical correlation distance.
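The comparison between criteria (4) and (17) for Werner states can be reproduced with a small Python sketch built directly on Eq. (16). The function names are hypothetical, and the sketch assumes the singular values of the spin covariance matrix are already known (for the Werner state, $t_j = |p|$ with maximally-mixed marginals of purity 1/2).

```python
def correlation_distance(t1, t2, t3):
    """Eq. (16), first form: C from the singular values of the covariance matrix."""
    return 0.25 * (abs(t1 + t2 + t3) + abs(t1 + t2 - t3)
                   + abs(t1 - t2 + t3) + abs(-t1 + t2 + t3))

def correlation_distance_max_form(t1, t2, t3):
    """Eq. (16), second form; assumes t1 >= t2 >= t3 >= 0."""
    return 0.5 * max(t1 + t2 + t3, 2 * t1)

def werner_singular_values(p):
    """Werner state with singlet weight p: T = -p * identity, so t_j = |p|."""
    return (abs(p), abs(p), abs(p))

def threshold(purity_a, purity_b):
    """Right-hand side shared by criteria (4) and (17)."""
    return 2.0 * ((1 - purity_a) * (1 - purity_b)) ** 0.5

def detected_by_eq4(p):
    """Criterion (4): correlation distance exceeds the threshold."""
    return correlation_distance(*werner_singular_values(p)) > threshold(0.5, 0.5)

def detected_by_eq17(p):
    """Criterion (17): sum of singular values exceeds the threshold."""
    return sum(werner_singular_values(p)) > threshold(0.5, 0.5)

for p in (0.3, 0.5, 0.7):
    print(p, detected_by_eq4(p), detected_by_eq17(p))
```

For Werner states the threshold equals 1, so criterion (4) fires only for $p > 2/3$ while criterion (17) already fires for $p > 1/3$, in line with the remark that (4) is the weaker of the two.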

## 5 Tight Lower Bound for Quantum Mutual Information

Here Eq. (5) is derived for the case $\rho_A = \rho_B = \frac12 I$. Evidence is provided for the conjecture that Eq. (5) in fact holds for all two-qubit states, including a partial generalisation of Eq. (5) when only one of $\rho_A$ and $\rho_B$ is maximally-mixed.

### 5.1 Derivation for Maximally-Mixed ρA and ρB

The tight lower bound for quantum mutual information in Eq. (5), for maximally-mixed reduced states, is plotted in Figure 2 below [top solid curve]. Also plotted for comparison are the Pinsker lower bound in Eq. (2) [bottom solid curve], and the classical lower bound in Eq. (3) [dashed curve]. The dotted vertical line indicates the value of $C_0$ in Eq. (5). It is seen that quantum correlations can violate the classical lower bound for correlation distances falling between $C_0$ and 1.

To derive Eq. (5) for $\rho_A = \rho_B = \frac12 I$, note first that Eq. (13) reduces to $\rho_{AB} = \frac14[I\otimes I + \sum_{j,k} T_{jk}\,\sigma_j\otimes\sigma_k]$. By the same argument given in section 4.2, this can be transformed via local unitary transformations to the state

$$\tilde\rho_{AB} = \frac14\Big[I\otimes I + \sum_j r_j\,\sigma_j\otimes\sigma_j\Big], \tag{18}$$

where $r_j = \alpha t_j$, $\alpha = \pm 1$, and $t_1 \geq t_2 \geq t_3 \geq 0$ are the singular values of the spin covariance matrix $T$. Since the quantum mutual information and quantum correlation distance are invariant under local unitary transformations, one has $I(\rho_{AB}) = I(\tilde\rho_{AB})$ and $C(\rho_{AB}) = C(\tilde\rho_{AB})$. Hence Eq. (5) only needs to be demonstrated for $\tilde\rho_{AB}$.

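The mutual information of $\tilde\rho_{AB}$ is easily evaluated as

$$I(\tilde\rho_{AB}) = S(\tilde\rho_A) + S(\tilde\rho_B) - S(\tilde\rho_{AB}) = \log 4 - H(p_0, p_1, p_2, p_3), \tag{19}$$

where $p_0 = (1 - r_1 - r_2 - r_3)/4$, $p_1 = (1 - r_1 + r_2 + r_3)/4$, $p_2 = (1 + r_1 - r_2 + r_3)/4$, $p_3 = (1 + r_1 + r_2 - r_3)/4$ are the eigenvalues of $\tilde\rho_{AB}$. Inverting the relation between the $r_j$ and $p_j$ further yields

$$t_j = \alpha r_j = \alpha[1 - 2(p_0 + p_j)], \qquad t_1 + t_2 + t_3 = \alpha(1 - 4p_0), \tag{20}$$

and hence the correlation distance follows from Eq. (16) as

$$C(\tilde\rho_{AB}) = C := \frac12\max\{\alpha(1 - 4p_0),\; \alpha(1 - 4p_0 + 1 - 4p_1)\}. \tag{21}$$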

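Equation (19) implies that a tight lower bound for $I(\tilde\rho_{AB})$ corresponds to a tight upper bound for $H(p_0, p_1, p_2, p_3)$. To determine the maximum value of $H(p_0, p_1, p_2, p_3)$, for a fixed correlation distance $C$, consider first the case $\alpha = 1$. The ordering and positivity conditions on $t_j$ then require $p_1 \leq p_2 \leq p_3$, and $p_0 + p_j \leq 1/2$ for $j = 1, 2, 3$ (implying $p_0 \leq 1/4$). Further, from Eq. (21), $C = \frac12\max\{1 - 4p_0,\; 2 - 4p_0 - 4p_1\}$. Hence, if $p_1 \geq 1/4$, then $C = (1 - 4p_0)/2$, implying the constraint $p_0 = 1/4 - C/2$. Noting the concavity of entropy, the maximum possible entropy under this constraint corresponds to equal values $p_1 = p_2 = p_3 = 1/4 + C/6$ (which are compatible with the above conditions on the $p_j$). Conversely, if $p_1 < 1/4$ then $C = 1 - 2(p_0 + p_1)$, and hence $p_0 + p_1 = (1 - C)/2$ is fixed, implying by concavity that the maximum possible entropy corresponds to $p_0 = p_1 = (1 - C)/4$, $p_2 = p_3 = (1 + C)/4$ (which again satisfies the required conditions on the $p_j$). It follows that the maximum possible entropy is (i) the maximum of the entropies $H_1(C) := H\big(\frac{1+C}{4}, \frac{1+C}{4}, \frac{1-C}{4}, \frac{1-C}{4}\big)$ and $H_2(C) := H\big(\frac14 - \frac{C}{2}, \frac14 + \frac{C}{6}, \frac14 + \frac{C}{6}, \frac14 + \frac{C}{6}\big)$ for $C \leq 1/2$, and (ii) $H_1(C)$ for $C > 1/2$. However, it is straightforward to show that $H_1(C) \geq H_2(C)$ over their overlapping range. Hence the maximum possible entropy is always $H_1(C)$ for the case $\alpha = 1$.

For the case $\alpha = -1$, the conditions on $t_j$ require that $p_1 \geq p_2 \geq p_3$ and $p_0 + p_j \geq 1/2$ for $j = 1, 2, 3$ (implying $p_0 \geq 1/4$), while from Eq. (21) $C = \frac12\max\{4p_0 - 1,\; 4p_0 + 4p_1 - 2\}$. Carrying out a similar analysis to the above, one finds that the maximum possible entropy is (i) the maximum of the entropies $H_1(C)$ and $H_3(C) := H\big(\frac14 + \frac{C}{2}, \frac14 - \frac{C}{6}, \frac14 - \frac{C}{6}, \frac14 - \frac{C}{6}\big)$ for $C \leq 1$, and (ii) $H_3(C)$ for $C > 1$.

Numerical comparison shows that $H_1(C) \geq H_3(C)$ for $C \leq C_0 \approx 0.7265$, and $H_1(C) < H_3(C)$ otherwise. Hence, from Eq. (19) one has the tight lower bound

$$I(\tilde\rho_{AB}) \geq \begin{cases} \log 4 - H_1(C), & C \leq C_0, \\ \log 4 - H_3(C), & C > C_0. \end{cases} \tag{22}$$

Since $\log 4 - H_1(C) = \log 2 - H\big(\frac{1+C}{2}, \frac{1-C}{2}\big)$, it follows that Eq. (5) holds for $\tilde\rho_{AB}$ in Eq. (18), and hence for all qubit states with maximally-mixed reduced density operators, as claimed.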

The states saturating the lower bound in Eqs. (5) and (22) are easily constructed from the above derivation. In particular, they are given by

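$$\rho(C) := \begin{cases} \frac14\big[I\otimes I + C\,\sigma_1\otimes\sigma_1\big], & C \leq C_0, \\[1ex] \frac14\big[I\otimes I - (2C/3)\sum_j \sigma_j\otimes\sigma_j\big], & C > C_0, \end{cases} \tag{23}$$

and any local unitary transformations thereof, where the quantum correlation distance of $\rho(C)$ is $C$ by construction.

Note that $\rho(C)$ is unentangled for $C \leq C_0$ (it can be written as an equal mixture of the product states $\tau_+\otimes\tau_+$ and $\tau_-\otimes\tau_-$, where $\tau_\pm := \frac12(I \pm \sqrt{C}\,\sigma_1)$). Conversely, $\rho(C)$ is an entangled Werner state for $C > C_0$ (with singlet state weighting $p = 2C/3 > 1/3$). Hence, the lower bound in Eqs. (5) and (22) can only be achieved by entangled states for $C > C_0$, and cannot be achieved by any two-valued classical random variables.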
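The crossover value $C_0$ and the piecewise bound of Eq. (5) can be evaluated with a few lines of Python. This is only a sketch under stated assumptions: the function names are invented for the example, units are bits, and $C_0$ is located by plain bisection on the interval $[0.5, 1]$, where the difference between the two branches changes sign.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def classical_branch(c):
    """log 2 - H((1+C)/2, (1-C)/2) in bits; meaningful for 0 <= C <= 1."""
    return 1.0 - entropy_bits([(1 + c) / 2, (1 - c) / 2])

def werner_branch(c):
    """log 4 - H(1/4 + C/2, 1/4 - C/6 three times) in bits; 0 <= C <= 3/2."""
    return 2.0 - entropy_bits([0.25 + c / 2] + [0.25 - c / 6] * 3)

def crossover(lo=0.5, hi=1.0, iterations=60):
    """Bisection for the nonzero C at which the two branches coincide."""
    def gap(c):
        return werner_branch(c) - classical_branch(c)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

C0 = crossover()

def qubit_bound(c):
    """Right-hand side of Eq. (5) in bits."""
    return classical_branch(c) if c <= C0 else werner_branch(c)

print(f"C0 is approximately {C0:.4f}")
```

Above `C0` the value of `qubit_bound(c)` lies below `classical_branch(c)`, which is the numerical counterpart of the statement that entangled qubits can have lower mutual information than any two-valued classical variables with the same correlation distance.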

### 5.2 Conjecture

It is conjectured that Eq. (5) is in fact a tight lower bound for any two-qubit state. This conjecture would follow immediately if it could be shown that

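$$I(\rho_{AB}) \geq I(\rho'_{AB}) \tag{24}$$

for arbitrary $\rho_{AB}$, where $\rho'_{AB} := \rho_{AB} - \rho_A\otimes\rho_B + \frac14 I\otimes I$. This is because $\rho'_{AB}$ is of the form of $\tilde\rho_{AB}$ in Eq. (18) (up to local unitary transformations) with $C(\rho'_{AB}) = C(\rho_{AB})$, and hence satisfies Eq. (22).

Partial support for Eq. (24), and hence for the conjecture, is given by noting that any $\rho_{AB}$ and corresponding $\rho'_{AB}$ can be brought to the respective forms

$$\rho_{AB} = \rho_A\otimes\rho_B + \frac14\sum_j r_j\,\sigma_j\otimes\sigma_j, \qquad \rho'_{AB} = \frac14\Big[I\otimes I + \sum_j r_j\,\sigma_j\otimes\sigma_j\Big]$$

via suitable local unitary transformations, similarly to the argument in section 4.2. Defining the function

$$F(r_1, r_2, r_3) := I(\rho_{AB}) - I(\rho'_{AB}),$$

it is straightforward to show that $F = 0$ and $\nabla F = 0$ for $r_1 = r_2 = r_3 = 0$, consistent with $F \geq 0$. However, it remains to be shown that the gradient does not vanish for other physically possible values of the $r_j$ (other than for the trivially saturating case $\rho_A = \rho_B = \frac12 I$).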

The above conjecture is further supported by the generalisation of Eq. (5) in the following section.

### 5.3 Generalisation to Maximally-Mixed ρA or ρB

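It is straightforward to show that the lower bound on quantum mutual information in Eq. (5) is tight for $C(\rho_{AB}) \geq C_0$ when just one of the reduced density operators is maximally mixed, i.e., if $\rho_A$ or $\rho_B$ is equal to $\frac12 I$.

First, since the mutual information and correlation distance are invariant under local unitary transformations, the same argument as in section 4.2 implies the state can always be transformed by local unitary transformations to the generalised form

$$\tilde\rho_{AB} = \tilde\rho_A\otimes\tilde\rho_B + \frac{\alpha}{4}\sum_j t_j\,\sigma_j\otimes\sigma_j$$

of Eq. (18), where either $\tilde\rho_A$ or $\tilde\rho_B$ equals $\frac12 I$ and $\alpha = \pm 1$.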

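Second, let $\mathcal{T}$ denote the ‘twirling’ operation, corresponding to applying a random unitary transformation of the form $U\otimes U$ [15]. It is easy to check that by definition $\mathcal{T}(I\otimes I) = I\otimes I$, and $\mathcal{T}(a\cdot\sigma\otimes I) = \mathcal{T}(I\otimes b\cdot\sigma) = 0$, for any 3-vectors $a$ and $b$. Since Werner states are invariant under twirling [13, 15], it follows that $\mathcal{T}(\sum_j \sigma_j\otimes\sigma_j) = \sum_j \sigma_j\otimes\sigma_j$. Using these properties, one finds that $\mathcal{T}(\tilde\rho_A\otimes\tilde\rho_B) = \frac14 I\otimes I$ if one of $\tilde\rho_A$ or $\tilde\rho_B$ is maximally mixed, and hence that

$$\mathcal{T}(\tilde\rho_{AB}) = \frac14\Big[I\otimes I + \alpha\bar t\sum_j \sigma_j\otimes\sigma_j\Big] = \rho(-3\alpha\bar t/2),$$

where $\bar t := (t_1 + t_2 + t_3)/3$ and the second equality holds for $\alpha = -1$ (but not otherwise), with $\rho(C)$ defined as per Eq. (23). Further, from Eq. (16) one has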

$$C(\mathcal{T}(\tilde\rho_{AB})) = C = \frac12\max\{2\bar t,\; 3\bar t\} = 3\bar t/2.$$

Recalling that saturates Eq. (22), an analysis similar to section 5.1 shows for that

$$I(\mathcal{T}(\tilde\rho_{AB})) = \log 4 - H_3(-\alpha C) \geq \log 4 - H_3(C),$$

with equality for .

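Third, again using $\mathcal{T}(\tilde\rho_A\otimes\tilde\rho_B) = \frac14 I\otimes I$, and the property that the relative entropy is non-increasing under the twirling operation, it follows that

$$I(\tilde\rho_{AB}) = S(\tilde\rho_{AB}\|\tilde\rho_A\otimes\tilde\rho_B) \geq S(\mathcal{T}(\tilde\rho_{AB})\|\mathcal{T}(\tilde\rho_A\otimes\tilde\rho_B)) = I(\mathcal{T}(\tilde\rho_{AB})) \geq \log 4 - H_3(C) \tag{25}$$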

for . Since Werner states are invariant under twirling, this inequality is tight for , being saturated by the choice . Recalling that mutual information and correlation distance are invariant under local unitary operations, the inequality is therefore tight for any for which one of and is maximally mixed, as claimed.

## 6 Classically-Correlated Quantum States

It is well known that a quantum system behaves classically if the state and the observables of interest all commute, i.e., if they can be simultaneously diagonalised in some basis. Hence, a joint state will behave classically if the relevant observables of each system commute with each other and the state. It is therefore natural to define $\rho_{AB}$ to be classically correlated if and only if it can be diagonalised in a joint basis [10], i.e., if and only if

$$\rho_{AB} = \sum_{j,k} P(j,k)\,|j\rangle\langle j|\otimes|k\rangle\langle k| \tag{26}$$

for some distribution $P(j,k)$ and orthonormal basis set $\{|j\rangle\otimes|k\rangle\}$. Classical correlation is preserved by tensor products, and by mixtures of commuting states.

While, strictly speaking, a classically-correlated quantum state only behaves classically with respect to observables that are diagonal with respect to $\{|j\rangle\otimes|k\rangle\}$, it also has a number of classical correlation properties with respect to general observables [10, 16], briefly noted here.

First, $\rho_{AB}$ above is separable by construction, and hence is unentangled. Second, since it is diagonal in the basis $\{|j\rangle\otimes|k\rangle\}$, the mutual information and correlation distance are easily calculated as

$$I(\rho_{AB}) = I(P), \qquad C(\rho_{AB}) = C(P), \tag{27}$$

and hence can only take classical values.

Third, if $M$ and $N$ denote any observables for systems $A$ and $B$ respectively, then their joint statistics are given by

$$P_{MN}(m,n) = \sum_{j,k} p(m|j)\,p(n|k)\,P(j,k) = \sum_{j,k} S_{m,n;j,k}\,P(j,k),$$

where $S_{m,n;j,k} := p(m|j)\,p(n|k)$ is a stochastic matrix with respect to its first and second pairs of indices. Similarly, one finds

$$P_M(m)\,P_N(n) = \sum_{j,k} S_{m,n;j,k}\,P(j)\,P(k)$$

for the product of the marginals. Since the classical relative entropy and variational distance can only decrease under the action of a stochastic matrix, it follows that one has the tight inequalities [10, 16]

$$I(P_{MN}) \leq I(P) = I(\rho_{AB}), \qquad C(P_{MN}) \leq C(P) = C(\rho_{AB}), \tag{28}$$

with saturation for $M$ and $N$ diagonal in the bases $\{|j\rangle\}$ and $\{|k\rangle\}$ respectively. Maximising the first of these inequalities over $M$ or $N$ immediately implies that classically-correlated states have zero quantum discord.
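The monotonicity behind Eq. (28) can be illustrated with a short Python sketch: a product stochastic map $S_{m,n;j,k} = p(m|j)\,p(n|k)$ is applied to a joint distribution $P(j,k)$, and the mutual information and correlation distance are compared before and after. The conditional distributions here are randomly generated placeholders rather than ones derived from actual quantum observables, and all function names are assumptions for the example.

```python
import math
import random

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def marginals(joint):
    """Row and column marginals of a joint distribution given as a nested list."""
    return [sum(row) for row in joint], [sum(col) for col in zip(*joint)]

def mutual_information(joint):
    p_a, p_b = marginals(joint)
    return entropy_bits(p_a) + entropy_bits(p_b) - entropy_bits(
        [p for row in joint for p in row])

def correlation_distance(joint):
    p_a, p_b = marginals(joint)
    return sum(abs(joint[j][k] - p_a[j] * p_b[k])
               for j in range(len(p_a)) for k in range(len(p_b)))

def random_distribution(size, rng):
    """A random probability vector (hypothetical test data)."""
    w = [rng.random() for _ in range(size)]
    total = sum(w)
    return [v / total for v in w]

def measure(joint, p_m_given_j, p_n_given_k):
    """P_MN(m,n) = sum_{j,k} p(m|j) p(n|k) P(j,k), with p_m_given_j[j][m] = p(m|j)."""
    n_m, n_n = len(p_m_given_j[0]), len(p_n_given_k[0])
    return [[sum(p_m_given_j[j][m] * p_n_given_k[k][n] * joint[j][k]
                 for j in range(len(joint)) for k in range(len(joint[0])))
             for n in range(n_n)] for m in range(n_m)]

rng = random.Random(1)
flat = random_distribution(4, rng)
P = [flat[:2], flat[2:]]                                  # joint P(j,k)
p_m = [random_distribution(2, rng) for _ in range(2)]     # rows: p(.|j)
p_n = [random_distribution(2, rng) for _ in range(2)]     # rows: p(.|k)
P_MN = measure(P, p_m, p_n)
```

Choosing `p_m` and `p_n` to be identity matrices corresponds to observables diagonal in the bases of Eq. (26), in which case `P_MN` coincides with `P` and both inequalities are saturated.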

Finally, for two-qubit systems, Eq. (26) implies that $\rho_{AB}$ is classically correlated if and only if it is equivalent under local unitary transformations to a state of the form

$$\rho'_{AB} = \frac14\big[(1 + x\sigma_1)\otimes(1 + y\sigma_1) + r\,\sigma_1\otimes\sigma_1\big],$$

where and satisfies Eq. (6). Hence, the mutual information is bounded by the classical lower bound in Eq. (3), and in Eq. (23) is classically cor