# On Variational Expressions for Quantum Relative Entropies

## Abstract

Distance measures between quantum states like the trace distance and the fidelity can naturally be defined by optimizing a classical distance measure over all measurement statistics that can be obtained from the respective quantum states. In contrast, Petz showed that the measured relative entropy, defined as a maximization of the Kullback-Leibler divergence over projective measurement statistics, is strictly smaller than Umegaki’s quantum relative entropy whenever the states do not commute. We extend this result in two ways. First, we show that Petz’ conclusion remains true if we allow general positive operator valued measures. Second, we extend the result to Rényi relative entropies and show that for non-commuting states the sandwiched Rényi relative entropy is strictly larger than the measured Rényi relative entropy for $\alpha \in (\frac12, \infty)$, and strictly smaller for $\alpha \in [0, \frac12)$. The latter statement provides counterexamples for the data-processing inequality of the sandwiched Rényi relative entropy for $\alpha < \frac12$. Our main tool is a new variational expression for the measured Rényi relative entropy, which we further exploit to show that certain lower bounds on quantum conditional mutual information are superadditive.
Keywords: Quantum entropy, measured relative entropy, relative entropy of recovery, additivity in quantum information theory, operator Jensen inequality, convex optimization.
Mathematics Subject Classifications (2010): 94A17, 81Q99, 15A45.

## 1 Measured Relative Entropy

The relative entropy is the basic concept underlying various information measures like entropy, conditional entropy and mutual information. A thorough understanding of its quantum generalization is thus of preeminent importance in quantum information theory. We start by considering measured relative entropy, which is defined as a maximization of the Kullback-Leibler divergence over all measurement statistics that are attainable from two quantum states.

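For a positive measure $Q$ on a finite set $\mathcal{X}$ and a probability measure $P$ on $\mathcal{X}$ that is absolutely continuous with respect to $Q$, denoted $P \ll Q$, the relative entropy or Kullback-Leibler divergence [26] is defined as

$$D(P\|Q) := \sum_{x\in\mathcal{X}} P(x)\big(\log P(x) - \log Q(x)\big),$$

where we understand $P(x)\big(\log P(x) - \log Q(x)\big) = 0$ whenever $P(x) = 0$. By continuity we define it as $+\infty$ if $P \not\ll Q$. (We use $\log$ to denote the natural logarithm.)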
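A minimal sketch of this definition in Python, assuming probability vectors are given as plain lists of floats. The function name `kl_divergence` is our own; it simply encodes the two conventions stated above (vanishing terms when $P(x)=0$, and $+\infty$ when $P \not\ll Q$).

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P||Q) in nats (natural logarithm).

    p is a probability vector and q a non-negative measure on the same finite
    index set. Terms with p[x] == 0 contribute nothing; if some p[x] > 0 has
    q[x] == 0, then P is not absolutely continuous w.r.t. Q and we return +inf.
    """
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue  # convention: the x-th term vanishes when P(x) = 0
        if qx == 0:
            return math.inf  # P is not absolutely continuous w.r.t. Q
        total += px * (math.log(px) - math.log(qx))
    return total


print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # log 2, about 0.6931
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))  # inf
```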

To extend this concept to quantum systems, Donald [12] as well as Hiai and Petz [21] studied measured relative entropy. In the following we restrict ourselves to a $d$-dimensional Hilbert space for some $d \in \mathbb{N}$. Let us denote the set of positive semidefinite operators acting on this space by $\mathcal{P}$ and the subset of density operators (with unit trace) by $\mathcal{S}$. For a density operator $\rho \in \mathcal{S}$ and $\sigma \in \mathcal{P}$, we define two variants of measured relative entropy. The general measured relative entropy is defined as

$$D_{\mathbb{M}}(\rho\|\sigma) := \sup_{(\mathcal{X}, M)} D\big(P_{\rho,M}\,\big\|\,P_{\sigma,M}\big),$$

where the optimization is over finite sets $\mathcal{X}$ and positive operator valued measures (POVMs) $M$ on $\mathcal{X}$. (More formally, $M$ is a map from $\mathcal{X}$ to positive semidefinite operators and satisfies $\sum_{x\in\mathcal{X}} M(x) = 1$, whereas $P_{\rho,M}$ is a measure on $\mathcal{X}$ defined via the relation $P_{\rho,M}(x) = \operatorname{tr}(M(x)\rho)$ for any $x \in \mathcal{X}$.) At first sight this definition seems cumbersome because we cannot restrict the size of $\mathcal{X}$ that we optimize over. Therefore, following [12], let us also consider the projectively measured relative entropy, which is defined as

$$D_{\mathbb{P}}(\rho\|\sigma) := \sup_{\{P_i\}} \sum_i \operatorname{tr}(P_i\rho)\log\frac{\operatorname{tr}(P_i\rho)}{\operatorname{tr}(P_i\sigma)},$$

where the maximization is over all sets $\{P_i\}_i$ of mutually orthogonal projectors with $\sum_i P_i = 1$, and we spelled out the Kullback-Leibler divergence for discrete measures. Note that without loss of generality we can assume that these projectors are rank-$1$, as any coarse graining of the measurement outcomes can only reduce the relative entropy due to its data-processing inequality [29]. Furthermore, the quantity is finite and the supremum is achieved whenever $\rho \ll \sigma$, which here denotes that the support of $\rho$ is contained in the support of $\sigma$. (To verify this, recall that the rank-$1$ projectors form a compact set and the divergence is lower semi-continuous.)
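To make the projective definition concrete, the following NumPy sketch evaluates the Kullback-Leibler divergence of the outcome statistics for one rank-$1$ projective measurement and, for a qubit, scans a grid of real measurement bases. The helper names are hypothetical, and restricting to real bases is our simplification: every grid point is a valid measurement, so in general the scan only gives a lower bound on $D_{\mathbb{P}}(\rho\|\sigma)$. For commuting inputs the computational basis is optimal and the classical value is recovered.

```python
import numpy as np


def projective_kl(rho, sigma, basis):
    """KL divergence between the outcome statistics of measuring rho and sigma
    in the orthonormal basis formed by the columns of `basis` (rank-1 projectors)."""
    # p_i = <b_i| rho |b_i>,  q_i = <b_i| sigma |b_i>
    p = np.real(np.einsum("ji,jk,ki->i", basis.conj(), rho, basis))
    q = np.real(np.einsum("ji,jk,ki->i", basis.conj(), sigma, basis))
    total = 0.0
    for pi, qi in zip(p, q):
        if pi <= 0.0:
            continue  # outcomes with zero probability under rho contribute nothing
        if qi <= 0.0:
            return np.inf  # rho is not absolutely continuous w.r.t. sigma
        total += pi * np.log(pi / qi)
    return float(total)


def measured_re_real_qubit_bases(rho, sigma, num=2001):
    """Hypothetical helper: grid search over real rank-1 qubit measurements.
    Each grid point is a valid measurement, so this lower-bounds D_P(rho||sigma)."""
    best = -np.inf
    for theta in np.linspace(0.0, np.pi, num):
        c, s = np.cos(theta), np.sin(theta)
        basis = np.array([[c, -s], [s, c]])  # orthonormal columns
        best = max(best, projective_kl(rho, sigma, basis))
    return best


rho = np.diag([0.9, 0.1])
sigma = np.eye(2) / 2
# commuting case: equals the classical KL of (0.9, 0.1) versus (0.5, 0.5)
print(measured_re_real_qubit_bases(rho, sigma))
```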

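The first of the following two variational expressions for the (projectively) measured relative entropy is due to Petz [37]. Note that the second objective function is concave in $\omega$, so that the optimization problem has a particularly appealing form. For $\rho \in \mathcal{S}$ and $\sigma \in \mathcal{P}$, we have

$$D_{\mathbb{P}}(\rho\|\sigma) = \sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) - \log\operatorname{tr}(\sigma\omega) = \sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) + 1 - \operatorname{tr}(\sigma\omega).$$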

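If $\rho \not\ll \sigma$, then both expressions in the suprema are unbounded, as expected. We now assume that $\rho \ll \sigma$. Let us consider the second expression. We write the supremum over $\omega$ as two suprema over $\{\omega_i\}$ and $\{P_i\}$, where $\omega_i > 0$ are the eigenvalues of $\omega$ corresponding to the eigenvectors given by the rank-$1$ projectors $P_i$. Using the fact that $\log\omega = \sum_i \log(\omega_i)\,P_i$, we find

$$\sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) + 1 - \operatorname{tr}(\sigma\omega) = 1 + \sup_{\{P_i\}}\sum_{i=1}^{d}\ \sup_{\omega_i>0}\Big(\operatorname{tr}(P_i\rho)\log\omega_i - \omega_i\operatorname{tr}(P_i\sigma)\Big).$$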

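For $i$ such that $\operatorname{tr}(P_i\sigma) = 0$, we also have $\operatorname{tr}(P_i\rho) = 0$, and thus the corresponding term is zero. When $\operatorname{tr}(P_i\sigma) > 0$, let us first consider $\operatorname{tr}(P_i\rho) = 0$. In this case, the supremum of the $i$-th term is zero and is attained in the limit $\omega_i \to 0$. In the case $\operatorname{tr}(P_i\rho) > 0$ (which always holds when $\rho$ has full support), the expression is concave in $\omega_i$, so the inner supremum is achieved at the local maximum $\omega_i = \operatorname{tr}(P_i\rho)/\operatorname{tr}(P_i\sigma)$. Plugging this into the above expression and using $\sum_i \operatorname{tr}(P_i\rho) = 1$, we find

$$\sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) + 1 - \operatorname{tr}(\sigma\omega) = \sup_{\{P_i\}}\sum_{i=1}^{d}\operatorname{tr}(P_i\rho)\log\frac{\operatorname{tr}(P_i\rho)}{\operatorname{tr}(P_i\sigma)}.$$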

This is exactly the definition of $D_{\mathbb{P}}(\rho\|\sigma)$. The remaining supremum is achieved because the set of rank-$1$ projectors is compact and the divergence is lower semi-continuous.
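The choice of $\omega_i$ in this proof can be checked numerically. The sketch below (NumPy, with example matrices of our own choosing and a hypothetical helper `herm_fun`) builds $\omega$ diagonal in a fixed rank-$1$ measurement basis with eigenvalues $\operatorname{tr}(P_i\rho)/\operatorname{tr}(P_i\sigma)$ and confirms that the concave objective then equals the Kullback-Leibler divergence of the measured statistics, because $\operatorname{tr}(\sigma\omega) = \sum_i \operatorname{tr}(P_i\rho) = 1$.

```python
import numpy as np


def herm_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def concave_objective(rho, sigma, omega):
    """tr(rho log omega) + 1 - tr(sigma omega), the second objective of the lemma."""
    return float(np.real(np.trace(rho @ herm_fun(omega, np.log)) + 1.0 - np.trace(sigma @ omega)))


rho = np.array([[0.7, 0.2], [0.2, 0.3]])      # full-rank density operator
sigma = np.array([[0.4, -0.1], [-0.1, 0.6]])  # positive definite
theta = 0.3
basis = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
p = np.array([basis[:, i] @ rho @ basis[:, i] for i in range(2)])
q = np.array([basis[:, i] @ sigma @ basis[:, i] for i in range(2)])

# omega is diagonal in the measurement basis with eigenvalues tr(P_i rho) / tr(P_i sigma)
omega = basis @ np.diag(p / q) @ basis.T
measured_kl = float(np.sum(p * np.log(p / q)))
print(concave_objective(rho, sigma, omega), measured_kl)  # the two numbers agree
```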

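It remains to show that the two variational expressions are equivalent. We have $\log x \le x - 1$ for all $x > 0$ and, thus, $\log\operatorname{tr}(\sigma\omega) \le \operatorname{tr}(\sigma\omega) - 1$ for all $\omega > 0$. This yields

$$\sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) - \log\operatorname{tr}(\sigma\omega) \ \ge\ \sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) + 1 - \operatorname{tr}(\sigma\omega).$$

Now note that the expression on the left-hand side is invariant under the substitution $\omega \to \lambda\omega$ for $\lambda > 0$. Hence, as $\operatorname{tr}(\sigma\omega) > 0$ for $\omega > 0$ and non-zero $\sigma$, we can add the normalization constraint $\operatorname{tr}(\sigma\omega) = 1$, and we have

$$\sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) - \log\operatorname{tr}(\sigma\omega) = \sup_{\substack{\omega>0\\ \operatorname{tr}(\sigma\omega)=1}}\ \operatorname{tr}(\rho\log\omega) + 1 - \operatorname{tr}(\sigma\omega) \ \le\ \sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) + 1 - \operatorname{tr}(\sigma\omega),$$

where we used that $\log\operatorname{tr}(\sigma\omega) = 0 = \operatorname{tr}(\sigma\omega) - 1$ when $\operatorname{tr}(\sigma\omega) = 1$.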
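The two steps of this argument, scale invariance of the first objective and the bound $\log x \le x - 1$, can be illustrated with a random positive definite $\omega$. This is a hedged numerical sketch using NumPy; the objective names, the example states, and the random seed are our own.

```python
import numpy as np


def herm_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def petz_objective(rho, sigma, omega):
    """tr(rho log omega) - log tr(sigma omega); invariant under omega -> c * omega."""
    return float(np.real(np.trace(rho @ herm_fun(omega, np.log)))
                 - np.log(np.real(np.trace(sigma @ omega))))


def concave_objective(rho, sigma, omega):
    """tr(rho log omega) + 1 - tr(sigma omega); concave in omega."""
    return float(np.real(np.trace(rho @ herm_fun(omega, np.log)) + 1.0 - np.trace(sigma @ omega)))


rng = np.random.default_rng(7)
g = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
omega = g @ g.conj().T + 0.1 * np.eye(3)  # random positive definite matrix
rho = np.diag([0.5, 0.3, 0.2])
sigma = np.diag([0.2, 0.2, 0.6])

for c in (0.5, 1.0, 3.0):
    # first value does not depend on c; second never exceeds the first
    print(c, petz_objective(rho, sigma, c * omega), concave_objective(rho, sigma, c * omega))

# after normalizing tr(sigma omega) = 1, the two objectives coincide
omega_n = omega / np.real(np.trace(sigma @ omega))
print(petz_objective(rho, sigma, omega_n), concave_objective(rho, sigma, omega_n))
```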

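Using this variational expression, we are able to answer a question left open by Donald [12] as well as Hiai and Petz [21], namely whether the two definitions of measured relative entropy are equal. They are: for $\rho \in \mathcal{S}$ and $\sigma \in \mathcal{P}$, we have $D_{\mathbb{M}}(\rho\|\sigma) = D_{\mathbb{P}}(\rho\|\sigma)$.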

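The direction ‘$\geq$’ holds trivially. Moreover, if $\rho \not\ll \sigma$, we can measure $\{P, 1 - P\}$ with $P$ a rank-$1$ projector such that $\operatorname{tr}(P\rho) > 0$ and $\operatorname{tr}(P\sigma) = 0$, and thus $D_{\mathbb{M}}(\rho\|\sigma) = D_{\mathbb{P}}(\rho\|\sigma) = +\infty$.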

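It remains to show the direction ‘$\leq$’ when $\rho \ll \sigma$ holds. Let $(\mathcal{X}, M)$ be a POVM. Recall that the distribution $P_{\rho,M}$ is defined by $P_{\rho,M}(x) = \operatorname{tr}(M(x)\rho)$. Introducing $\mathcal{X}' := \{x \in \mathcal{X} : P_{\rho,M}(x) > 0\}$, on which also $P_{\sigma,M}(x) > 0$ since $\rho \ll \sigma$, we can write

$$D(P_{\rho,M}\|P_{\sigma,M}) = \sum_{x\in\mathcal{X}'} P_{\rho,M}(x)\log\frac{P_{\rho,M}(x)}{P_{\sigma,M}(x)}.$$

Now observe that for any choice of $\omega_x > 0$, $x \in \mathcal{X}$, the spectrum of the operator $\omega := \sum_{x\in\mathcal{X}}\omega_x M(x)$ is included in $[\min_x\omega_x, \max_x\omega_x] \subset (0,\infty)$. As a result, we can apply the operator Jensen inequality for the function $t \mapsto \log t$, which is operator concave on $(0,\infty)$, and get

$$\sum_{x\in\mathcal{X}} P_{\rho,M}(x)\log\omega_x = \operatorname{tr}\Big(\rho\sum_{x\in\mathcal{X}} M(x)^{\frac12}(\log\omega_x)\,M(x)^{\frac12}\Big) \le \operatorname{tr}(\rho\log\omega).$$

Now we simply choose

$$\omega_x = \frac{P_{\rho,M}(x)}{P_{\sigma,M}(x)} \ \text{ for } x\in\mathcal{X}', \qquad \omega_x = \varepsilon \ \text{ for } x\notin\mathcal{X}',$$

for some $\varepsilon > 0$. Then $\operatorname{tr}(\sigma\omega) = \sum_{x\in\mathcal{X}'} P_{\rho,M}(x) + \varepsilon\sum_{x\notin\mathcal{X}'} P_{\sigma,M}(x) \le 1 + \varepsilon$, which allows us to further bound $D(P_{\rho,M}\|P_{\sigma,M}) \le \operatorname{tr}(\rho\log\omega) + 1 - \operatorname{tr}(\sigma\omega) + \varepsilon$. Comparing this with the variational expression for the measured relative entropy and letting $\varepsilon \to 0$ yields the desired inequality.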
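The key inequality of this proof can be observed numerically for a random POVM. In the sketch below, the construction $M(x) = S^{-1/2}A_xS^{-1/2}$ with $S = \sum_x A_x$ is a standard way to normalize positive operators into a POVM, chosen by us for illustration (`random_povm` is a hypothetical helper). All outcome probabilities are positive here, so no $\varepsilon$ regularization is needed.

```python
import numpy as np


def herm_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def random_povm(dim, outcomes, rng):
    """Hypothetical helper: random POVM M(x) = S^{-1/2} A_x S^{-1/2} with S = sum_x A_x."""
    mats = []
    for _ in range(outcomes):
        g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
        mats.append(g @ g.conj().T)  # random positive (semi)definite A_x
    s_inv_sqrt = herm_fun(sum(mats), lambda w: w ** -0.5)
    return [s_inv_sqrt @ a @ s_inv_sqrt for a in mats]


rng = np.random.default_rng(1)
povm = random_povm(2, 4, rng)
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.4, -0.1], [-0.1, 0.6]])
p = np.array([np.real(np.trace(m @ rho)) for m in povm])
q = np.array([np.real(np.trace(m @ sigma)) for m in povm])
kl = float(np.sum(p * np.log(p / q)))

# omega = sum_x omega_x M(x) with omega_x = P_rho(x) / P_sigma(x), hence tr(sigma omega) = 1
omega = sum((px / qx) * m for px, qx, m in zip(p, q, povm))
bound = float(np.real(np.trace(rho @ herm_fun(omega, np.log)) + 1.0 - np.trace(sigma @ omega)))
print(kl, bound)  # operator Jensen inequality: kl <= bound
```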

Hence, the measured relative entropy $D_{\mathbb{M}}(\rho\|\sigma)$ achieves its maximum for projective rank-$1$ measurements and can be evaluated using the concave optimization problem in its variational expression.

## 2 Measured Rényi Relative Entropy

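Here we extend the results of the previous section to the Rényi divergence. Using the same notation as in the previous section, for $\alpha \in (0,1)\cup(1,\infty)$ we define the Rényi divergence [40] as

$$D_\alpha(P\|Q) := \frac{1}{\alpha-1}\log\sum_{x\in\mathcal{X}} P(x)^\alpha\,Q(x)^{1-\alpha}$$

if $\alpha < 1$ or $P \ll Q$, and as $+\infty$ otherwise. For $\alpha \in (0,1)$ we rewrite the sum as

$$\sum_{x\in\mathcal{X}} P(x)^\alpha\,Q(x)^{1-\alpha} = \sum_{x\in\mathcal{X}:\ P(x)Q(x)>0} P(x)^\alpha\,Q(x)^{1-\alpha}.$$

Hence we see that absolute continuity is not necessary to keep $D_\alpha(P\|Q)$ finite for $\alpha \in (0,1)$. However, the Rényi divergence instead diverges to $+\infty$ when $P$ and $Q$ are orthogonal. It is well known that the Rényi divergence converges to the Kullback-Leibler divergence when $\alpha \to 1$, and we thus set $D_1(P\|Q) := D(P\|Q)$. Moreover, in the limit $\alpha \to \infty$ we find the max-divergence $D_\infty(P\|Q) = \log\max_x \frac{P(x)}{Q(x)}$.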
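A direct transcription of these conventions into Python, as a minimal sketch. `renyi_divergence` is a hypothetical helper; the example shows both that the value can stay finite for $\alpha < 1$ without absolute continuity and that $D_\alpha$ approaches the Kullback-Leibler divergence as $\alpha \to 1$.

```python
import math


def renyi_divergence(p, q, alpha):
    """Classical Rényi divergence D_alpha(P||Q) in nats, for alpha in (0, 1) or (1, inf)."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must lie in (0, 1) or (1, inf)")
    if alpha > 1 and any(px > 0 and qx == 0 for px, qx in zip(p, q)):
        return math.inf  # P is not absolutely continuous w.r.t. Q
    # only x with P(x) Q(x) > 0 contribute (for alpha > 1 the other terms have P(x) = 0)
    s = sum(px ** alpha * qx ** (1 - alpha) for px, qx in zip(p, q) if px > 0 and qx > 0)
    if s == 0:
        return math.inf  # P and Q are orthogonal
    return math.log(s) / (alpha - 1)


p, q = [0.3, 0.7], [0.6, 0.4]
print(renyi_divergence(p, q, 0.5), renyi_divergence(p, q, 2.0))
print(renyi_divergence([0.5, 0.5], [1.0, 0.0], 0.5))  # finite although P is not << Q
```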

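Let us now define the measured Rényi relative entropy as before, namely

$$D_{\mathbb{M},\alpha}(\rho\|\sigma) := \sup_{(\mathcal{X},M)} D_\alpha\big(P_{\rho,M}\,\big\|\,P_{\sigma,M}\big).$$

We will later show that this is equivalent to the following projectively measured Rényi relative entropy, which we define here for $\alpha \in (1,\infty)$ as

$$D_{\mathbb{P},\alpha}(\rho\|\sigma) := \frac{1}{\alpha-1}\log\sup_{\{P_i\}}\sum_i \operatorname{tr}(P_i\rho)^\alpha\operatorname{tr}(P_i\sigma)^{1-\alpha},$$

and analogously for $\alpha \in (0,1)$ with

$$D_{\mathbb{P},\alpha}(\rho\|\sigma) := \frac{1}{\alpha-1}\log\min_{\{P_i\}}\sum_i \operatorname{tr}(P_i\rho)^\alpha\operatorname{tr}(P_i\sigma)^{1-\alpha}.$$

Note that the supremum for $\alpha > 1$ is achieved and finite whenever $\rho \ll \sigma$. Similarly, the minimum for $\alpha \in (0,1)$ is non-zero, and thus $D_{\mathbb{P},\alpha}(\rho\|\sigma)$ is finite, whenever $\rho \not\perp \sigma$, i.e. when the two states are not orthogonal.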

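Next we give variational expressions for $D_{\mathbb{P},\alpha}$ similar to the variational characterization of the measured relative entropy in Section 1. Writing $Q_{\mathbb{P},\alpha}(\rho\|\sigma) := \exp\big((\alpha-1)D_{\mathbb{P},\alpha}(\rho\|\sigma)\big)$, we have for $\rho \in \mathcal{S}$ and $\sigma \in \mathcal{P}$:

$$Q_{\mathbb{P},\alpha}(\rho\|\sigma) = \inf_{\omega>0}\ \alpha\operatorname{tr}(\rho\omega) + (1-\alpha)\operatorname{tr}\big(\sigma\omega^{\frac{\alpha}{\alpha-1}}\big) = \inf_{\omega>0}\ \operatorname{tr}(\rho\omega)^{\alpha}\operatorname{tr}\big(\sigma\omega^{\frac{\alpha}{\alpha-1}}\big)^{1-\alpha} \qquad\text{for } \alpha\in(0,\tfrac12),$$

$$Q_{\mathbb{P},\alpha}(\rho\|\sigma) = \inf_{\omega>0}\ \alpha\operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big) + (1-\alpha)\operatorname{tr}(\sigma\omega) = \inf_{\omega>0}\ \operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big)^{\alpha}\operatorname{tr}(\sigma\omega)^{1-\alpha} \qquad\text{for } \alpha\in[\tfrac12,1),$$

$$Q_{\mathbb{P},\alpha}(\rho\|\sigma) = \sup_{\omega>0}\ \alpha\operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big) + (1-\alpha)\operatorname{tr}(\sigma\omega) = \sup_{\omega>0}\ \operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big)^{\alpha}\operatorname{tr}(\sigma\omega)^{1-\alpha} \qquad\text{for } \alpha\in(1,\infty).$$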

The expressions can be seen as a generalization of Alberti’s theorem [1] for the fidelity (which corresponds to $\alpha = \frac12$) to general $\alpha$.

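We first show the first equality in each case. Let us discuss the case $\alpha \in (0,1)$ in detail. Note that the two expressions given for $\alpha \in (0,\frac12)$ and $\alpha \in [\frac12,1)$ are equivalent by the transformation $\omega \mapsto \omega^{\frac{\alpha}{\alpha-1}}$ (the reason for the different ways of writing is to see that the expressions are convex in $\omega$, which will be useful later). We first write

$$\inf_{\omega>0}\ \alpha\operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big) + (1-\alpha)\operatorname{tr}(\sigma\omega) = \inf_{\{P_i\}}\sum_{i=1}^{d}\ \inf_{\omega_i>0}\Big(\alpha\operatorname{tr}(P_i\rho)\,\omega_i^{1-\frac1\alpha} + (1-\alpha)\operatorname{tr}(P_i\sigma)\,\omega_i\Big).$$

Let $i$ be such that $\operatorname{tr}(P_i\rho)$ and $\operatorname{tr}(P_i\sigma)$ are both strictly positive (which is the case for all $i$ when $\rho > 0$ and $\sigma > 0$). Then a local (and thus global) minimum for $\omega_i$ is easily found at the point where

$$\omega_i = \left(\frac{\operatorname{tr}(P_i\rho)}{\operatorname{tr}(P_i\sigma)}\right)^{\alpha}.$$

If $\operatorname{tr}(P_i\rho) = \operatorname{tr}(P_i\sigma) = 0$, we can choose $\omega_i$ arbitrarily. If only $\operatorname{tr}(P_i\rho) = 0$, the infimum is attained in the limit $\omega_i \to 0$, and if only $\operatorname{tr}(P_i\sigma) = 0$, in the limit $\omega_i \to \infty$. In all these cases the infimum of the $i$-th term is zero. Furthermore, the infimum over $\omega$ is achieved for a finite, invertible $\omega$ when $\rho > 0$ and $\sigma > 0$. Plugging this solution into the above expression yields

$$\inf_{\omega>0}\ \alpha\operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big) + (1-\alpha)\operatorname{tr}(\sigma\omega) = \min_{\{P_i\}}\sum_{i=1}^{d}\operatorname{tr}(P_i\rho)^{\alpha}\operatorname{tr}(P_i\sigma)^{1-\alpha}.$$

This minimum is always achieved since the set we optimize over is compact. Comparing this with the definition of $D_{\mathbb{P},\alpha}$ yields the first equality.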
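The scalar optimization behind this proof can be checked on a grid. The sketch below uses NumPy with arbitrary example values $\operatorname{tr}(P_i\rho) = 0.3$ and $\operatorname{tr}(P_i\sigma) = 0.6$ (our choice) and confirms that $\omega_i = (\operatorname{tr}(P_i\rho)/\operatorname{tr}(P_i\sigma))^\alpha$ yields $\operatorname{tr}(P_i\rho)^\alpha\operatorname{tr}(P_i\sigma)^{1-\alpha}$, which is the minimum for $\alpha < 1$ and, in the analogous case $\alpha > 1$, the maximum.

```python
import numpy as np


def proof_term(w, p, q, alpha):
    """The i-th term alpha * p * w**(1 - 1/alpha) + (1 - alpha) * q * w from the proof."""
    return alpha * p * w ** (1 - 1 / alpha) + (1 - alpha) * q * w


p, q = 0.3, 0.6
grid = np.logspace(-4, 4, 20001)
for alpha in (0.7, 2.0):
    w_star = (p / q) ** alpha              # claimed optimizer
    target = p ** alpha * q ** (1 - alpha)
    vals = proof_term(grid, p, q, alpha)
    # alpha < 1: w_star is a minimum; alpha > 1: w_star is a maximum
    print(alpha, proof_term(w_star, p, q, alpha), target, vals.min(), vals.max())
```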

For the case $\alpha > 1$, when $\rho \ll \sigma$, the proof is analogous to the previous argument. Otherwise, it is simple to see that the supremum is $+\infty$.

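Now we show the second equality in each case; we write the argument for the expressions used for $\alpha \in [\frac12,1)$, and the case $\alpha \in (0,\frac12)$ is identical. For $\alpha \in (0,1)$, the weighted arithmetic-geometric mean inequality $a^\alpha b^{1-\alpha} \le \alpha a + (1-\alpha)b$ for $a, b \ge 0$ yields

$$\inf_{\omega>0}\ \operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big)^{\alpha}\operatorname{tr}(\sigma\omega)^{1-\alpha} \le \inf_{\omega>0}\ \alpha\operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big) + (1-\alpha)\operatorname{tr}(\sigma\omega).$$

However, for any feasible $\omega$ and $\lambda > 0$, $\lambda\omega$ is also feasible and leaves the product on the left-hand side unchanged, and choosing $\lambda = \big(\operatorname{tr}(\rho\omega^{1-\frac1\alpha})/\operatorname{tr}(\sigma\omega)\big)^{\alpha}$ on the right-hand side shows that the right-hand side cannot exceed the left-hand side. Similarly, for $\alpha > 1$, Bernoulli’s inequality gives $a^\alpha b^{1-\alpha} \ge \alpha a + (1-\alpha)b$ for $a \ge 0$ and $b > 0$, so that

$$\sup_{\omega>0}\ \operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big)^{\alpha}\operatorname{tr}(\sigma\omega)^{1-\alpha} \ge \sup_{\omega>0}\ \alpha\operatorname{tr}\big(\rho\omega^{1-\frac1\alpha}\big) + (1-\alpha)\operatorname{tr}(\sigma\omega).$$

And the same argument as above yields the equality.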

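As for the measured relative entropy, the restriction to rank-$1$ projective measurements is in fact not restrictive at all: for $\alpha \in (0,1)\cup(1,\infty)$ we have $D_{\mathbb{M},\alpha}(\rho\|\sigma) = D_{\mathbb{P},\alpha}(\rho\|\sigma)$. The direction ‘$\geq$’ holds trivially, so it remains to show ‘$\leq$’.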

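For $\alpha > 1$ we follow the steps of the proof for the measured relative entropy, assuming $\rho \ll \sigma$ (otherwise $D_{\mathbb{P},\alpha}(\rho\|\sigma) = +\infty$ and there is nothing to show). Consider any finite set $\mathcal{X}$ and POVM $M$ with induced measures $P_{\rho,M}$ and $P_{\sigma,M}$, and set $\omega_x := \big(P_{\rho,M}(x)/P_{\sigma,M}(x)\big)^{\alpha}$. We can write

$$\sum_{x\in\mathcal{X}} P_{\rho,M}(x)^\alpha P_{\sigma,M}(x)^{1-\alpha} = \sum_{x\in\mathcal{X}'}\Big(\alpha\,P_{\rho,M}(x)\,\omega_x^{1-\frac1\alpha} + (1-\alpha)\,P_{\sigma,M}(x)\,\omega_x\Big),$$

where we can restrict the sum to $\mathcal{X}' := \{x\in\mathcal{X} : P_{\rho,M}(x) > 0\}$, on which $P_{\sigma,M}(x) > 0$ since $\rho \ll \sigma$. Setting $\omega_x := 0$ for $x \notin \mathcal{X}'$ and $\omega := \sum_{x\in\mathcal{X}}\omega_x M(x)$, we then find that the first part of the sum satisfies

$$\sum_{x\in\mathcal{X}'} P_{\rho,M}(x)\,\omega_x^{1-\frac1\alpha} = \operatorname{tr}\Big(\rho\sum_{x\in\mathcal{X}} M(x)^{\frac12}\,\omega_x^{1-\frac1\alpha}\,M(x)^{\frac12}\Big) \le \operatorname{tr}\big(\rho\,\omega^{1-\frac1\alpha}\big),$$

where the inequality again follows by the operator Jensen inequality and the operator concavity of the function $t \mapsto t^{1-\frac1\alpha}$ on $[0,\infty)$. Moreover, $\sum_{x\in\mathcal{X}'} P_{\sigma,M}(x)\,\omega_x = \operatorname{tr}(\sigma\omega)$. Thus, we can bound

$$\sum_{x\in\mathcal{X}} P_{\rho,M}(x)^\alpha P_{\sigma,M}(x)^{1-\alpha} \le \alpha\operatorname{tr}\big(\rho\,\omega^{1-\frac1\alpha}\big) + (1-\alpha)\operatorname{tr}(\sigma\omega).$$

Comparing this with the variational expression for $\alpha > 1$ (where, by continuity, the supremum may equivalently be taken over $\omega \ge 0$) yields the desired inequality.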

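For $\alpha \in (0,1)$, we use the same notation as above, but now let $\mathcal{X}'$ denote the set of $x$ with $P_{\rho,M}(x)\,P_{\sigma,M}(x) > 0$; the remaining outcomes do not contribute to $\sum_x P_{\rho,M}(x)^\alpha P_{\sigma,M}(x)^{1-\alpha}$. We further distinguish the cases $\alpha \in [\frac12,1)$ and $\alpha \in (0,\frac12)$. For $\alpha \in [\frac12,1)$, we define $\omega_x := \big(P_{\rho,M}(x)/P_{\sigma,M}(x)\big)^{\alpha}$ for $x \in \mathcal{X}'$, choose $\omega_x > 0$ for $x \notin \mathcal{X}'$ such that $\alpha P_{\rho,M}(x)\,\omega_x^{1-\frac1\alpha} + (1-\alpha)P_{\sigma,M}(x)\,\omega_x \le \varepsilon$ (possible by the limits discussed in the proof of the variational expression), and set

$$\omega := \sum_{x\in\mathcal{X}}\omega_x M(x) > 0.$$

We can then evaluate

$$\operatorname{tr}\big(\rho\,\omega^{1-\frac1\alpha}\big) \le \sum_{x\in\mathcal{X}} P_{\rho,M}(x)\,\omega_x^{1-\frac1\alpha},$$

where we used the operator convexity of $t \mapsto t^{1-\frac1\alpha}$ on $(0,\infty)$ and the operator Jensen inequality. Moreover,

$$\operatorname{tr}(\sigma\omega) = \sum_{x\in\mathcal{X}} P_{\sigma,M}(x)\,\omega_x.$$

As a result,

$$\alpha\operatorname{tr}\big(\rho\,\omega^{1-\frac1\alpha}\big) + (1-\alpha)\operatorname{tr}(\sigma\omega) \le \sum_{x\in\mathcal{X}} P_{\rho,M}(x)^\alpha P_{\sigma,M}(x)^{1-\alpha} + |\mathcal{X}|\,\varepsilon.$$

Comparing this with the variational expression for $\alpha \in [\frac12,1)$ and letting $\varepsilon \to 0$ yields the desired inequality.

For $\alpha \in (0,\frac12)$ we choose $\omega := \sum_{x\in\mathcal{X}}\omega_x^{1-\frac1\alpha}M(x)$ with the same $\omega_x$ as above, so that $\operatorname{tr}(\rho\omega) = \sum_{x\in\mathcal{X}} P_{\rho,M}(x)\,\omega_x^{1-\frac1\alpha}$ and, by the operator convexity of $t \mapsto t^{\frac{\alpha}{\alpha-1}}$ and the operator Jensen inequality,

$$\operatorname{tr}\big(\sigma\omega^{\frac{\alpha}{\alpha-1}}\big) \le \sum_{x\in\mathcal{X}} P_{\sigma,M}(x)\,\omega_x,$$

and once again conclude using the variational expression for $\alpha \in (0,\frac12)$.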
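For $\alpha \in [\frac12,1)$ the operator convexity step can again be checked numerically. This NumPy sketch reuses the hypothetical random-POVM construction from Section 1 (redefined so the block is self-contained) and verifies that the chosen $\omega$ gives a value of the variational objective no larger than the classical sum $\sum_x P_{\rho,M}(x)^\alpha P_{\sigma,M}(x)^{1-\alpha}$. All outcome probabilities are positive here, so no $\varepsilon$ is needed.

```python
import numpy as np


def herm_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def random_povm(dim, outcomes, rng):
    """Hypothetical helper: random POVM M(x) = S^{-1/2} A_x S^{-1/2} with S = sum_x A_x."""
    mats = []
    for _ in range(outcomes):
        g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
        mats.append(g @ g.conj().T)
    s_inv_sqrt = herm_fun(sum(mats), lambda w: w ** -0.5)
    return [s_inv_sqrt @ a @ s_inv_sqrt for a in mats]


alpha = 0.75  # in [1/2, 1): t -> t**(1 - 1/alpha) is operator convex on (0, inf)
rng = np.random.default_rng(2)
povm = random_povm(2, 3, rng)
rho = np.array([[0.7, 0.2], [0.2, 0.3]])
sigma = np.array([[0.4, -0.1], [-0.1, 0.6]])
p = np.array([np.real(np.trace(m @ rho)) for m in povm])
q = np.array([np.real(np.trace(m @ sigma)) for m in povm])
classical = float(np.sum(p ** alpha * q ** (1 - alpha)))

# omega = sum_x omega_x M(x) with omega_x = (P_rho(x) / P_sigma(x))**alpha
omega = sum(((px / qx) ** alpha) * m for px, qx, m in zip(p, q, povm))
objective = float(np.real(
    alpha * np.trace(rho @ herm_fun(omega, lambda w: w ** (1 - 1 / alpha)))
    + (1 - alpha) * np.trace(sigma @ omega)))
print(objective, classical)  # operator Jensen: objective <= classical
```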

## 3 Achievability of Relative Entropy

### 3.1 Umegaki’s Relative Entropy

Here we compare the measured relative entropy to other notions of quantum relative entropy that have been investigated in the literature and have found operational significance in quantum information theory. Umegaki’s quantum relative entropy [50] has found operational significance as the threshold rate for asymmetric binary quantum hypothesis testing [21]. For $\rho \in \mathcal{S}$ and $\sigma \in \mathcal{P}$, it is defined as

$$D(\rho\|\sigma) := \operatorname{tr}\big(\rho(\log\rho - \log\sigma)\big)$$

if $\rho \ll \sigma$, and as $+\infty$ otherwise.

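We recall the following variational expression by Petz [36] (see also [25] for another variational expression), stated here for $\sigma > 0$:

$$D(\rho\|\sigma) = \sup_{\omega>0}\ \operatorname{tr}(\rho\log\omega) - \log\operatorname{tr}\exp(\log\sigma + \log\omega).$$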

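By the data-processing inequality for the relative entropy [29] and the equality $D_{\mathbb{M}} = D_{\mathbb{P}}$ established in Section 1, we always have

$$D_{\mathbb{M}}(\rho\|\sigma) = D_{\mathbb{P}}(\rho\|\sigma) \le D(\rho\|\sigma),$$

and moreover Petz [35] showed that the inequality is strict if $\rho$ and $\sigma$ do not commute (for $\rho > 0$ and $\sigma > 0$). The equality $D_{\mathbb{M}} = D_{\mathbb{P}}$ strengthens this to show that the strict inequality persists even when we take a supremum over POVMs. In the following we give an alternative proof of Petz’ result and then extend the argument to Rényi relative entropy in Section 3.2.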
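A small numerical illustration of Petz’ strict inequality, as a sketch assuming NumPy and example qubit states chosen by us. The states below do not commute, and even the best real rank-$1$ measurement on a fine grid stays well below Umegaki’s relative entropy, while for commuting states the two values coincide. The grid search is a hypothetical helper and its value is only a lower bound on $D_{\mathbb{M}}(\rho\|\sigma)$, so this illustrates rather than proves the strict inequality.

```python
import numpy as np


def herm_fun(a, f):
    """Apply the scalar function f to the Hermitian matrix a via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T


def umegaki(rho, sigma):
    """D(rho||sigma) = tr rho (log rho - log sigma), assuming rho > 0 and sigma > 0."""
    return float(np.real(np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log)))))


def measured_re_real_qubit(rho, sigma, num=4001):
    """Hypothetical helper: best KL divergence over a grid of real rank-1 qubit
    measurements, a lower bound on D_M(rho||sigma) for full-rank rho and sigma."""
    best = -np.inf
    for theta in np.linspace(0.0, np.pi, num):
        c, s = np.cos(theta), np.sin(theta)
        b = np.array([[c, -s], [s, c]])
        p = np.array([b[:, i] @ rho @ b[:, i] for i in range(2)])
        q = np.array([b[:, i] @ sigma @ b[:, i] for i in range(2)])
        best = max(best, float(np.sum(p * np.log(p / q))))
    return best


rho = np.diag([0.9, 0.1])
sigma = np.array([[0.5, 0.4], [0.4, 0.5]])  # does not commute with rho
print(measured_re_real_qubit(rho, sigma), umegaki(rho, sigma))  # clear gap
```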

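Our proof relies on the Golden–Thompson inequality [19]. It states that for two Hermitian matrices $A$ and $B$, it holds that

$$\operatorname{tr}\exp(A + B) \le \operatorname{tr}\big(\exp(A)\exp(B)\big),$$

with equality if and only if $[A,B] = 0$, as shown in [43].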
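The Golden–Thompson inequality and its equality case are easy to probe numerically. The following sketch (NumPy; helper names are ours) compares both sides for two random Hermitian matrices and for a commuting pair obtained by taking a polynomial in the first matrix.

```python
import numpy as np


def herm_exp(a):
    """Matrix exponential of a Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.exp(w)) @ v.conj().T


def random_hermitian(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (g + g.conj().T) / 2


rng = np.random.default_rng(3)
a, b = random_hermitian(3, rng), random_hermitian(3, rng)
lhs = float(np.real(np.trace(herm_exp(a + b))))
rhs = float(np.real(np.trace(herm_exp(a) @ herm_exp(b))))
print(lhs, rhs)  # lhs <= rhs; a and b generically do not commute, so the gap is strict

c = a @ a - 0.5 * a  # a polynomial in a, hence [a, c] = 0
lhs_c = float(np.real(np.trace(herm_exp(a + c))))
rhs_c = float(np.real(np.trace(herm_exp(a) @ herm_exp(c))))
print(lhs_c, rhs_c)  # equal up to rounding
```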

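First, it is evident that equality holds if $[\rho,\sigma] = 0$, since then there exists a projective measurement that commutes with $\rho$ and $\sigma$ and thus does not affect the states. For the following, it is worth writing the variational expressions for the two quantities side by side. Namely, writing $\omega = \exp(H)$, we have

$$D_{\mathbb{M}}(\rho\|\sigma) = \max_{H}\ \operatorname{tr}(\rho H) - \log\operatorname{tr}\big(\sigma\exp(H)\big), \qquad D(\rho\|\sigma) = \sup_{H}\ \operatorname{tr}(\rho H) - \log\operatorname{tr}\exp(\log\sigma + H),$$

where we optimize over all Hermitian operators $H$. Note that, according to the variational expression for the measured relative entropy in Section 1, we can write a maximum for $D_{\mathbb{M}}(\rho\|\sigma)$ because we are assuming $\rho > 0$ and $\sigma > 0$. The inequality $D_{\mathbb{M}}(\rho\|\sigma) \le D(\rho\|\sigma)$ can now be seen as a direct consequence of the Golden–Thompson inequality applied with $A = \log\sigma$ and $B = H$.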

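It remains to show that $D_{\mathbb{M}}(\rho\|\sigma) = D(\rho\|\sigma)$ implies $[\rho,\sigma] = 0$. Let $H^\star$ be any maximizer of the variational problem for $D_{\mathbb{M}}(\rho\|\sigma)$. Observe now that the equality necessitates

$$\operatorname{tr}\exp(\log\sigma + H^\star) = \operatorname{tr}\big(\sigma\exp(H^\star)\big),$$

which holds if and only if $[\sigma, H^\star] = 0$ by the equality condition in the Golden–Thompson inequality. Now define the function

$$f(H) := \operatorname{tr}(\rho H) - \log\operatorname{tr}\big(\sigma\exp(H)\big),$$

and since $H^\star$ maximizes $f$, we must have for all Hermitian $\Delta$,

$$0 = \frac{d}{dt}f(H^\star + t\Delta)\Big|_{t=0} = \operatorname{tr}(\rho\Delta) - \frac{\operatorname{tr}\big(\sigma\exp(H^\star)\Delta\big)}{\operatorname{tr}\big(\sigma\exp(H^\star)\big)}.$$

To evaluate the second summand of this Fréchet derivative we used that $\sigma$ and $H^\star$ commute. Since this holds for all $\Delta$, we must in fact have $\rho = \sigma\exp(H^\star)/\operatorname{tr}\big(\sigma\exp(H^\star)\big)$, which means that $[\rho,\sigma] = 0$, as desired.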

In some sense this result tells us that some quantum correlations, as measured by the relative entropy, do not survive the measurement process. This fact appears in quantum information theory in various different guises, for example in the form of locking classical correlations in quantum states [11]. (We also point to [38] for the use of measured relative entropy measures in quantum information theory.) Moreover, since Umegaki’s relative entropy is the smallest quantum generalization of the Kullback-Leibler divergence that is both additive and satisfies data-processing (see, e.g., [47]), the same conclusion can be drawn for any sensible quantum relative entropy. (An example being the quantum relative entropy introduced by Belavkin and Staszewski [4].)

### 3.2 Sandwiched Rényi Relative Entropy

Next we consider a family of quantum Rényi relative entropies [33] that are commonly called sandwiched Rényi relative entropies and have found operational significance since they determine the strong converse exponent in asymmetric binary quantum hypothesis testing [31]. They are of particular interest here because they are, for $\alpha \ge \frac12$, the smallest quantum generalization of the Rényi divergence that is both additive and satisfies data-processing [47]. (Examples for other sensible quantum generalizations are the quantum Rényi relative entropy first studied by Petz [34] and the quantum divergences introduced by Matsumoto [30].)

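For $\alpha \in (0,1)\cup(1,\infty)$, $\rho \in \mathcal{S}$ and $\sigma \in \mathcal{P}$, the sandwiched Rényi relative entropy of order $\alpha$ is defined as

$$\widetilde{D}_\alpha(\rho\|\sigma) := \frac{1}{\alpha-1}\log\operatorname{tr}\Big(\big(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)^{\alpha}\Big),$$

where the same considerations about finiteness as for the measured Rényi relative entropy apply. We also consider the limits $\alpha \to 1$ and $\alpha \to \infty$ of the above expression, for which we have [33]

$$\widetilde{D}_1(\rho\|\sigma) = D(\rho\|\sigma) \qquad\text{and}\qquad \widetilde{D}_\infty(\rho\|\sigma) = D_{\max}(\rho\|\sigma) := \inf\{\lambda\in\mathbb{R} : \rho \le e^{\lambda}\sigma\},$$

respectively. We recall the following variational expression by Frank and Lieb [17], valid for $\sigma > 0$ and $\alpha \in [\frac12,1)$:

$$\operatorname{tr}\Big(\big(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)^{\alpha}\Big) = \inf_{\omega>0}\ \alpha\operatorname{tr}(\rho\omega) + (1-\alpha)\operatorname{tr}\Big(\big(\sigma^{\frac{\alpha-1}{2\alpha}}\omega\,\sigma^{\frac{\alpha-1}{2\alpha}}\big)^{\frac{\alpha}{\alpha-1}}\Big),$$

and the same identity holds for $\alpha > 1$ with the infimum replaced by a supremum. Alternatively, we can also write

$$\operatorname{tr}\Big(\big(\sigma^{\frac{1-\alpha}{2\alpha}}\rho\,\sigma^{\frac{1-\alpha}{2\alpha}}\big)^{\alpha}\Big) = \inf_{\omega>0}\ \operatorname{tr}(\rho\omega)^{\alpha}\operatorname{tr}\Big(\big(\sigma^{\frac{\alpha-1}{2\alpha}}\omega\,\sigma^{\frac{\alpha-1}{2\alpha}}\big)^{\frac{\alpha}{\alpha-1}}\Big)^{1-\alpha}$$

(again with a supremum for $\alpha > 1$), where we have used the same arguments as in the proof of the second equality in the variational expression for $D_{\mathbb{P},\alpha}$. By the data-processing inequality for the sandwiched Rényi relative entropy [33] we always have, for $\alpha \ge \frac12$,

$$D_{\mathbb{M},\alpha}(\rho\|\sigma) \le \widetilde{D}_\alpha(\rho\|\sigma).$$

In the following we give an alternative proof of this fact and show that, for $\rho > 0$ and $\sigma > 0$,

$$D_{\mathbb{M},\alpha}(\rho\|\sigma) < \widetilde{D}_\alpha(\rho\|\sigma) \qquad\text{whenever } [\rho,\sigma] \neq 0 \text{ and } \alpha \in (\tfrac12,\infty).$$

In contrast, at the boundaries $\alpha = \frac12$ and $\alpha = \infty$ it is known that $D_{\mathbb{M},\alpha}(\rho\|\sigma) = \widetilde{D}_\alpha(\rho\|\sigma)$ [24]. (We refer to [31] for a detailed discussion.)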
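A minimal NumPy sketch of $\widetilde{D}_\alpha$ for full-rank $\sigma$, assuming the matrix powers are computed through an eigendecomposition (helper names are ours). For commuting inputs it reduces to the classical Rényi divergence of the eigenvalues, which the example checks for $\alpha = \frac12$ and $\alpha = 2$.

```python
import numpy as np


def herm_pow(a, t):
    """a**t for Hermitian positive semidefinite a; tiny negative eigenvalues from
    rounding are clipped to zero. For t < 0 this assumes a is positive definite."""
    w, v = np.linalg.eigh(a)
    return (v * np.maximum(w, 0.0) ** t) @ v.conj().T


def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Rényi relative entropy (nats) for full-rank sigma, alpha in (0,1) or (1,inf)."""
    s = herm_pow(sigma, (1 - alpha) / (2 * alpha))
    q_tilde = float(np.real(np.trace(herm_pow(s @ rho @ s, alpha))))
    return float(np.log(q_tilde) / (alpha - 1))


rho = np.diag([0.9, 0.1])
sigma = np.diag([0.3, 0.7])  # commutes with rho: reduces to the classical Rényi divergence
for alpha in (0.5, 2.0):
    classical = np.log(np.sum(np.diag(rho) ** alpha * np.diag(sigma) ** (1 - alpha))) / (alpha - 1)
    print(alpha, sandwiched_renyi(rho, sigma, alpha), classical)
```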

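The argument is similar to the proof for Umegaki’s relative entropy in Section 3.1, but with the Golden–Thompson inequality replaced by the Araki–Lieb–Thirring inequality [28]. It states that for positive semidefinite $A$, $B$ and $r \ge 1$ we have

$$\operatorname{tr}\Big(\big(B^{\frac12}AB^{\frac12}\big)^{r}\Big) \le \operatorname{tr}\Big(B^{\frac r2}A^{r}B^{\frac r2}\Big),$$

while the reverse inequality holds for $r \in (0,1]$, with equality if and only if $[A,B] = 0$, except for $r = 1$, as shown in [20].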
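The direction of the Araki–Lieb–Thirring inequality for $r \ge 1$ and its reversal for $r \in (0,1]$ can be checked on random positive semidefinite matrices. This is a hedged numerical sketch with NumPy and helper names of our own.

```python
import numpy as np


def herm_pow(a, t):
    """a**t for Hermitian positive semidefinite a via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.maximum(w, 0.0) ** t) @ v.conj().T


def random_psd(dim, rng):
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return g @ g.conj().T


def alt_sides(a, b, r):
    """Return tr((B^{1/2} A B^{1/2})^r) and tr(B^{r/2} A^r B^{r/2})."""
    bh = herm_pow(b, 0.5)
    lhs = float(np.real(np.trace(herm_pow(bh @ a @ bh, r))))
    br = herm_pow(b, r / 2)
    rhs = float(np.real(np.trace(br @ herm_pow(a, r) @ br)))
    return lhs, rhs


rng = np.random.default_rng(5)
a, b = random_psd(3, rng), random_psd(3, rng)
for r in (0.5, 1.0, 2.0):
    print(r, *alt_sides(a, b, r))  # r >= 1: lhs <= rhs; r in (0, 1]: lhs >= rhs
```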

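We give the proof for $\alpha \in (\frac12,1)$ and note that the argument for $\alpha > 1$ is analogous. From the variational characterization of $D_{\mathbb{P},\alpha} = D_{\mathbb{M},\alpha}$ (after the transformation $\omega \mapsto \omega^{\frac{\alpha}{\alpha-1}}$) and from the Frank–Lieb formula we have

$$Q_{\mathbb{M},\alpha}(\rho\|\sigma) := e^{(\alpha-1)D_{\mathbb{M},\alpha}(\rho\|\sigma)} = \min_{\omega>0}\ \alpha\operatorname{tr}(\rho\omega) + (1-\alpha)\operatorname{tr}\big(\sigma\omega^{\frac{\alpha}{\alpha-1}}\big),$$

$$\widetilde{Q}_\alpha(\rho\|\sigma) := e^{(\alpha-1)\widetilde{D}_\alpha(\rho\|\sigma)} = \min_{\omega>0}\ \alpha\operatorname{tr}(\rho\omega) + (1-\alpha)\operatorname{tr}\Big(\big(\sigma^{\frac{\alpha-1}{2\alpha}}\omega\,\sigma^{\frac{\alpha-1}{2\alpha}}\big)^{\frac{\alpha}{\alpha-1}}\Big),$$

where the existence of the minima relies on the fact that both operators have full support. (Note also that these two expressions are in fact equivalent for $\alpha = \frac12$.) Since $r := \frac{\alpha}{1-\alpha} > 1$, the inequality then follows immediately by the Araki–Lieb–Thirring inequality with $A = \omega^{-1}$ and $B = \sigma^{\frac1r}$:

$$\operatorname{tr}\Big(\big(\sigma^{\frac{\alpha-1}{2\alpha}}\omega\,\sigma^{\frac{\alpha-1}{2\alpha}}\big)^{\frac{\alpha}{\alpha-1}}\Big) = \operatorname{tr}\Big(\big(\sigma^{\frac{1}{2r}}\omega^{-1}\sigma^{\frac{1}{2r}}\big)^{r}\Big) \le \operatorname{tr}\big(\sigma\,\omega^{-r}\big) = \operatorname{tr}\big(\sigma\omega^{\frac{\alpha}{\alpha-1}}\big).$$

Hence $\widetilde{Q}_\alpha(\rho\|\sigma) \le Q_{\mathbb{M},\alpha}(\rho\|\sigma)$, and since $\alpha - 1 < 0$ this means $D_{\mathbb{M},\alpha}(\rho\|\sigma) \le \widetilde{D}_\alpha(\rho\|\sigma)$.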

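Furthermore, if $[\rho,\sigma] = 0$ we have equality. To show that $D_{\mathbb{M},\alpha}(\rho\|\sigma) = \widetilde{D}_\alpha(\rho\|\sigma)$ implies $[\rho,\sigma] = 0$, we define the function

$$f(\omega) := \alpha\operatorname{tr}(\rho\omega) + (1-\alpha)\operatorname{tr}\big(\sigma\omega^{\frac{\alpha}{\alpha-1}}\big).$$

For any minimizer $\omega^\star$ of $f$, that is, of the variational problem for the measured Rényi relative entropy, we have

$$0 = \frac{d}{dt}f(\omega^\star + t\Delta)\Big|_{t=0} = \alpha\operatorname{tr}(\rho\Delta) - \alpha\operatorname{tr}\big(\sigma(\omega^\star)^{\frac{1}{\alpha-1}}\Delta\big)$$

for all Hermitian $\Delta$. To evaluate the second summand of this Fréchet derivative we used that $\sigma$ and $\omega^\star$ commute, which holds by the equality condition for Araki–Lieb–Thirring. We thus conclude that $\rho = \sigma(\omega^\star)^{\frac{1}{\alpha-1}}$, which implies that $[\rho,\sigma] = 0$.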

## 4 Violation of Data-Processing for α < 1/2

As a further application of the variational characterization of measured Rényi relative entropy, we can show that the data-processing inequality for the sandwiched Rényi relative entropy fails for $\alpha < \frac12$. (Numerical evidence had already suggested that data-processing does not hold in this regime [32].)

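More precisely, let $\alpha \in (0,\frac12)$ and let $\rho > 0$ and $\sigma > 0$ with $[\rho,\sigma] \neq 0$. Then $\widetilde{D}_\alpha(\rho\|\sigma) < D_{\mathbb{M},\alpha}(\rho\|\sigma)$. In particular, there exists a rank-$1$ measurement that achieves $D_{\mathbb{M},\alpha}(\rho\|\sigma) > \widetilde{D}_\alpha(\rho\|\sigma)$ and thus violates the data-processing inequality.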

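First note that $\rho > 0$ and $\sigma > 0$ imply that the two states are not orthogonal, and thus both quantities are finite. For $\alpha \in (0,\frac12)$ the variational formulas for the measured and the sandwiched Rényi relative entropy used in Section 3.2 still hold. However, in contrast to the proof for $\alpha \in (\frac12,1)$, we now have $r = \frac{\alpha}{1-\alpha} < 1$. Hence, we find by the (reversed) Araki–Lieb–Thirring inequality that

$$\operatorname{tr}\Big(\big(\sigma^{\frac{\alpha-1}{2\alpha}}\omega\,\sigma^{\frac{\alpha-1}{2\alpha}}\big)^{\frac{\alpha}{\alpha-1}}\Big) \ge \operatorname{tr}\big(\sigma\omega^{\frac{\alpha}{\alpha-1}}\big) \qquad\text{for all } \omega > 0,$$

so that, comparing the two variational expressions term by term and recalling $\alpha - 1 < 0$, we obtain $\widetilde{D}_\alpha(\rho\|\sigma) \le D_{\mathbb{M},\alpha}(\rho\|\sigma)$. As in the proof for $\alpha \in (\frac12,1)$, we have equality if and only if $[\rho,\sigma] = 0$. This implies the claim, since $D_{\mathbb{M},\alpha} = D_{\mathbb{P},\alpha}$ is achieved by a rank-$1$ projective measurement.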
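The violation can be reproduced numerically for a concrete pair of non-commuting qubit states. In this sketch (NumPy, with the states and $\alpha = \frac14$ chosen by us and hypothetical helper names), the grid search over real rank-$1$ measurements only lower-bounds $D_{\mathbb{M},\alpha}(\rho\|\sigma)$, yet it already exceeds $\widetilde{D}_\alpha(\rho\|\sigma)$: the classical divergence after measuring is larger than the quantum one, contrary to data processing.

```python
import numpy as np


def herm_pow(a, t):
    """a**t for Hermitian positive semidefinite a via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.maximum(w, 0.0) ** t) @ v.conj().T


def sandwiched_renyi(rho, sigma, alpha):
    """Sandwiched Rényi relative entropy (nats) for full-rank sigma."""
    s = herm_pow(sigma, (1 - alpha) / (2 * alpha))
    return float(np.log(np.real(np.trace(herm_pow(s @ rho @ s, alpha)))) / (alpha - 1))


def measured_renyi_real_qubit(rho, sigma, alpha, num=2001):
    """Hypothetical helper: best classical Rényi divergence over a grid of real rank-1
    qubit measurements; each grid point is a valid measurement, so this
    lower-bounds D_{M,alpha}(rho||sigma) for full-rank rho and sigma."""
    best = -np.inf
    for theta in np.linspace(0.0, np.pi, num):
        c, s = np.cos(theta), np.sin(theta)
        b = np.array([[c, -s], [s, c]])
        p = np.array([b[:, i] @ rho @ b[:, i] for i in range(2)])
        q = np.array([b[:, i] @ sigma @ b[:, i] for i in range(2)])
        best = max(best, float(np.log(np.sum(p ** alpha * q ** (1 - alpha))) / (alpha - 1)))
    return best


alpha = 0.25
rho = np.diag([0.9, 0.1])
sigma = np.array([[0.5, 0.4], [0.4, 0.5]])  # does not commute with rho
d_meas = measured_renyi_real_qubit(rho, sigma, alpha)
d_sand = sandwiched_renyi(rho, sigma, alpha)
print(d_meas, d_sand)  # d_meas > d_sand: measuring increased the divergence
```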

## 5 Exploiting Variational Formulas

### 5.1 Some Optimization Problems in Quantum Information

The variational characterizations of Umegaki’s relative entropy, of the sandwiched Rényi relative entropy, and of their measured counterparts (Sections 1 and 2) can be used to derive properties of various entropic quantities that appear in quantum information theory. We are interested in operational quantities of the form

$$\mathbb{D}(\rho\|\mathcal{C}) := \inf_{\sigma\in\mathcal{C}} \mathbb{D}(\rho\|\sigma),$$

where $\mathbb{D}$ stands for any relative entropy, measured relative entropy, or Rényi variant thereof, and $\mathcal{C}$ denotes some convex, compact set of states. For Umegaki’s relative entropy $D$, prominent examples for $\mathcal{C}$ include the set of

• separable states, giving rise to the relative entropy of entanglement [53].

• positive partial transpose states, leading to the Rains bound on entanglement distillation [39].

• non-distillable states, leading to bounds on entanglement distillation [52].

• quantum Markov states, leading to insights about the robustness properties of these states [22].

• locally recoverable states, leading to bounds on the quantum conditional mutual information [16].

• $k$-extendible states, leading to bounds on squashed entanglement [27].

Other examples are conditional Rényi entropies which are defined by optimizing the sandwiched Rényi relative entropy over a convex set of product states with a fixed marginal, see, e.g., [48].

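A central question is what properties of the underlying relative entropy $\mathbb{D}$ translate to properties of the induced measure $\mathbb{D}(\cdot\|\mathcal{C})$. For example, all the relative entropies discussed in this paper are superadditive on tensor product states in the sense that

$$\mathbb{D}(\rho_1\otimes\rho_2\|\sigma_1\otimes\sigma_2) \ge \mathbb{D}(\rho_1\|\sigma_1) + \mathbb{D}(\rho_2\|\sigma_2).$$

We might then ask if we also have

$$\mathbb{D}(\rho_1\otimes\rho_2\|\mathcal{C}_{12}) \ge \mathbb{D}(\rho_1\|\mathcal{C}_1) + \mathbb{D}(\rho_2\|\mathcal{C}_2).$$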

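To study questions like this we propose to make use of variational characterizations of the form

$$\mathbb{D}(\rho\|\mathcal{C}) = \inf_{\sigma\in\mathcal{C}}\sup_{\omega>0} F(\rho,\sigma,\omega) = \sup_{\omega>0}\inf_{\sigma\in\mathcal{C}} F(\rho,\sigma,\omega),$$

where we made use of Sion’s minimax theorem [42] for the last equality. We note that the conditions of the minimax theorem are often fulfilled. The minimization over $\sigma$ then typically simplifies and is a convex or even semidefinite optimization. (As an example, for the measured relative entropies the objective function becomes linear in $\sigma$.) We can then use strong duality of convex optimization to rewrite this minimization as a maximization problem [7]:

$$\inf_{\sigma\in\mathcal{C}} F(\rho,\sigma,\omega) = \sup_{\lambda\in\Lambda} G(\rho,\omega,\lambda),$$

which, in contrast to the definition of $\mathbb{D}(\rho\|\mathcal{C})$, only involves maximizations. This often gives useful insights about $\mathbb{D}(\rho\|\mathcal{C})$. As an example, let us come back to the question of superadditivity raised above. We want to argue that the following two conditions on $G$ and $\Lambda$ imply superadditivity. First, we need that the function $G$ is superadditive itself, i.e. we require that

$$G(\rho_1\otimes\rho_2,\omega_1\otimes\omega_2,\lambda_1\otimes\lambda_2) \ge G(\rho_1,\omega_1,\lambda_1) + G(\rho_2,\omega_2,\lambda_2).$$

And second, we require that the sets of dual variables are closed under tensor products in the sense that $\lambda_1 \in \Lambda_1$ and $\lambda_2 \in \Lambda_2$ imply that $\lambda_1\otimes\lambda_2 \in \Lambda_{12}$. Using these two properties, we deduce that

$$\mathbb{D}(\rho_1\otimes\rho_2\|\mathcal{C}_{12}) \ge G(\rho_1\otimes\rho_2,\omega_1\otimes\omega_2,\lambda_1\otimes\lambda_2) \ge G(\rho_1,\omega_1,\lambda_1) + G(\rho_2,\omega_2,\lambda_2)$$

for any $\omega_1, \omega_2 > 0$ and any $\lambda_1 \in \Lambda_1$, $\lambda_2 \in \Lambda_2$. Hence, the inequalities also hold true if we maximize over these variables, implying superadditivity.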

### 5.2Relative Entropy of Recovery

In the following we denote multipartite quantum systems using capital letters, e.g., $A$, $B$, $C$. The set of density operators on $A$ and $B$ is then denoted $\mathcal{S}(AB)$, for example. We also use these letters as subscripts to indicate which systems operators act on.

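As a concrete application, we study the additivity properties of the relative entropy of recovery, defined as [41]

$$D(A;C|B)_\rho := \inf_{\mathcal{R}_{B\to BC}} D\big(\rho_{ABC}\,\big\|\,(\mathcal{I}_A\otimes\mathcal{R}_{B\to BC})(\rho_{AB})\big),$$

where the infimum is taken over all trace-preserving completely positive maps $\mathcal{R}_{B\to BC}$. (We restrict to $\rho_{ABC}$ such that the quantity is surely finite and the infimum becomes a minimum.) One motivation for studying the additivity properties of the relative entropy of recovery is the study of lower bounds on the quantum conditional mutual information and strengthenings of the data-processing inequality for the relative entropy [27]. In particular, [8] derives a lower bound on the conditional mutual information $I(A:C|B)_\rho$ in terms of a regularized quantity evaluated on $n$ copies of $\rho_{ABC}$, where the systems are understood as $A^n = A_1\ldots A_n$, and analogously for $B^n$ and $C^n$. To obtain a lower bound that does not involve limits, the authors of [8] use a de Finetti-type theorem to show that

$$I(A:C|B)_\rho \ge D_{\mathbb{M}}(A;C|B)_\rho,$$

with the measured relative entropy of recovery defined as [8] (see also [41])

$$D_{\mathbb{M}}(A;C|B)_\rho := \inf_{\mathcal{R}_{B\to BC}} D_{\mathbb{M}}\big(\rho_{ABC}\,\big\|\,(\mathcal{I}_A\otimes\mathcal{R}_{B\to BC})(\rho_{AB})\big).$$

(We restrict to $\rho_{ABC}$ such that the quantity is surely finite and the infimum becomes a minimum.) This gives an interpretation for the conditional mutual information in terms of recoverability.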

The measured relative entropy is superadditive, and if this property were to translate to $D_{\mathbb{M}}(A;C|B)_\rho$, we would get an alternative proof for the step from the regularized bound to the single-letter bound. Using the variational expression for the measured relative entropy together with strong duality for semidefinite programming, we find the following alternative characterization for $D_{\mathbb{M}}(A;C|B)_\rho$.

We write