Metric character of the quantum Jensen-Shannon divergence
In a recent paper, the generalization of the Jensen-Shannon divergence (JSD) in the context of quantum theory has been studied (Phys. Rev. A 72, 052310 (2005)). This distance between quantum states has been shown to verify several of the properties required for a good distinguishability measure. Here we investigate the metric character of this distance. More precisely we show, formally for pure states and by means of a numerical procedure for mixed states, that its square root verifies the triangle inequality.
PACS numbers: 03.67.-a, 03.67.Mn, 03.65.-w
I Introduction
Fundamental physical theories are formulated in terms of an abstract space. This is the case of Relativity Theory, Quantum Mechanics (QM), Yang-Mills like theories, and every proposal for Unified Field Theory. On each abstract space different structures can be defined. For example topological, differentiable, affine and metric structures are ubiquitous in space-time models. A prescription for measuring just how close two points of the concomitant space are is what we mean here by a metric structure. A more precise distinction between a distance and a metric will be given below.
In principle each one of the above-mentioned structures can be defined in an independent way. In Special Relativity the space-time is the standard manifold provided with the (fixed, non-dynamical) Minkowskian metric. In General Relativity, instead, the space-time is a differentiable four-dimensional manifold where the metric is given by the matter-energy distribution (through Einstein's field equations). In both cases the metric is compatible with Lorentz covariance. It is worth mentioning here (as known since the pioneering works of Gauss and Riemann) that the metric defines every geometrical property of a differentiable manifold.
In QM the corresponding abstract space is a (finite or infinite dimensional) Hilbert space $\mathcal{H}$. In its mathematical formalism the states of a physical system are represented by operators (density operators) acting on $\mathcal{H}$. More precisely, the states of the system are represented by the elements of $\mathcal{B}(\mathcal{H})_1^+$, that is, the set of positive trace-one operators on $\mathcal{H}$. The notion of a state as a unit vector of $\mathcal{H}$ refers to the extremal elements of $\mathcal{B}(\mathcal{H})_1^+$ ($\rho$ is extremal if and only if it is idempotent, $\rho^2=\rho$). In this case $\rho$ is of the form $\rho=|\psi\rangle\langle\psi|$ for some unit vector $|\psi\rangle\in\mathcal{H}$, and is called a pure state.
In the case of a Hilbert space, the basic underlying structure is that of a vector space provided with an inner product $\langle\cdot|\cdot\rangle$ between elements of $\mathcal{H}$. From this inner product several ways of measuring “proximity” between two elements of $\mathcal{H}$ can be defined. For example, the Wootters distance
$$d_W(|\psi\rangle,|\varphi\rangle)=\arccos\left(|\langle\psi|\varphi\rangle|\right) \tag{1}$$
is a very important one. On the one hand, (1) represents the angle between the (pure) states $|\psi\rangle$ and $|\varphi\rangle$; on the other, it is related to the statistical fluctuations in the outcomes of measurements within the QM formalism [1]. Finally, (1) is invariant under unitary evolution. Therefore, we can think of (1) as a very natural distance between pure states in QM, in some sense imposed by the quantum theory itself. A generalization of this distance to mixed states has been studied by Braunstein and Caves [2].
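As an illustration of the two properties just mentioned, the angle (1) and its unitary invariance can be checked numerically. The following is a minimal sketch in Python with NumPy; the function names (`wootters_distance`, `random_state`, `random_unitary`) are hypothetical helpers introduced here for the example and do not come from the cited works.

```python
import numpy as np

def wootters_distance(psi, phi):
    """Angle between two pure states, arccos(|<psi|phi>|), as in Eq. (1)."""
    psi = np.asarray(psi, dtype=complex)
    phi = np.asarray(phi, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    phi = phi / np.linalg.norm(phi)
    overlap = abs(np.vdot(psi, phi))
    # Guard against rounding that pushes the overlap slightly above 1.
    return float(np.arccos(min(overlap, 1.0)))

def random_state(n, rng):
    """Random normalized vector (illustrative; no particular measure is claimed)."""
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    return v / np.linalg.norm(v)

def random_unitary(n, rng):
    """Some unitary matrix, obtained from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return q

rng = np.random.default_rng(0)
psi, phi, chi = (random_state(3, rng) for _ in range(3))
U = random_unitary(3, rng)

d = wootters_distance(psi, phi)
d_rotated = wootters_distance(U @ psi, U @ phi)  # same value up to rounding
```

Here `d` and `d_rotated` agree up to rounding, and orthogonal states are separated by the maximal angle $\pi/2$.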
Before going on, let us remind the reader of the formal definition of a distance. Let $\mathcal{X}$ be an abstract set. A function
$$d:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$$
is a distance defined over the set $\mathcal{X}$ if, for every $x,y\in\mathcal{X}$, it satisfies the following properties: (i) $d(x,y)>0$ for $x\neq y$; (ii) $d(x,x)=0$; (iii) $d(x,y)=d(y,x)$.
If, for every $x,y,z\in\mathcal{X}$, the function $d$ also verifies the triangle inequality
$$d(x,y)+d(y,z)\ge d(x,z), \tag{3}$$
it is said that $d$ is a metric for the space $\mathcal{X}$. Incidentally, we mention that the function given by (1) is a metric. However, only a few among all distances between quantum states historically introduced in the literature verify condition (3).
The definition of distance between mixed quantum states is a topic of permanent interest. This interest has lately been rekindled on account of problems emerging in quantum information theory (QIT) [3-7]. In introducing distances between quantum states, different roads have been traversed. We have already mentioned the case of the Wootters distance and its generalization, presented in [2]. Recently, a rather interesting approach has been advanced by Lee et al. in reference [5]. There these authors characterize the degree of closeness of two states with regard to the information that can be attained for each of them from a complete set of mutually complementary measurements plus an invariance criterion. The resulting distance measure is equivalent to the Hilbert-Schmidt metric. Let us recall that this metric emerges from the primitive structure of the Hilbert space. Indeed, an inner product between bounded operators $A$ and $B$ acting over the Hilbert space $\mathcal{H}$ can be defined in the fashion
$$\langle A|B\rangle=\mathrm{Tr}\left(A^{\dagger}B\right).$$
The Hilbert-Schmidt norm of the operator $A$ is given by $\|A\|_{HS}=\sqrt{\langle A|A\rangle}$ and from this, the Hilbert-Schmidt metric between two operators $A$ and $B$ is defined as
$$d_{HS}(A,B)=\|A-B\|_{HS}.$$
Another way of dealing with the problem of introducing distances between quantum states is to generalize the notions of distance defined in the space of classical probability distributions. This is the case of the relative entropy, which is a generalization of the information-theoretic Kullback-Leibler divergence. The relative entropy of an operator $\rho$ with respect to an operator $\sigma$, both belonging to $\mathcal{B}(\mathcal{H})_1^+$, is
$$S(\rho\|\sigma)=\mathrm{Tr}\left[\rho\left(\log\rho-\log\sigma\right)\right],$$
where $\log$ stands for the logarithm in base two. The relative entropy is not a distance (and obviously is not a metric either) because it is not symmetric and does not verify the triangle inequality (3). Worse, it may even be unbounded. In particular, the relative entropy is well defined only when the support of $\sigma$ is equal to or larger than that of $\rho$ [3] (the support of an operator is the subspace spanned by the eigenvectors of the operator with nonzero eigenvalues). This is a strong restriction which is violated in some physically relevant situations, as for example when $\sigma$ is a pure reference state.
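The support restriction and the lack of symmetry can be made concrete with a small numerical example. The sketch below is only an illustration in Python with NumPy; the function name `relative_entropy` and the tolerance `tol` are assumptions of this example, and the divergence is reported as `inf` whenever the support condition fails.

```python
import numpy as np

def relative_entropy(rho, sigma, tol=1e-12):
    """Tr[rho (log2 rho - log2 sigma)]; returns inf if supp(rho) is not inside supp(sigma)."""
    r, U = np.linalg.eigh(rho)
    s, V = np.linalg.eigh(sigma)
    # weights[i, j] = |<u_i|v_j>|^2 between eigenvectors of rho and sigma
    weights = np.abs(U.conj().T @ V) ** 2
    total = 0.0
    for i, ri in enumerate(r):
        if ri <= tol:
            continue  # 0 log 0 = 0 by convention
        total += ri * np.log2(ri)
        for j, sj in enumerate(s):
            if weights[i, j] <= tol:
                continue
            if sj <= tol:
                return np.inf  # rho has weight outside the support of sigma
            total -= ri * weights[i, j] * np.log2(sj)
    return float(total)

mixed = np.eye(2) / 2          # maximally mixed qubit state
pure = np.diag([1.0, 0.0])     # pure reference state

s_mixed_pure = relative_entropy(mixed, pure)  # support condition violated
s_pure_mixed = relative_entropy(pure, mixed)  # finite, so the quantity is not symmetric
```

With a pure reference state in the second argument the quantity diverges, while exchanging the arguments gives a finite value.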
To overcome such problems we have recently investigated an alternative to the relative entropy [8] that emerges as a natural extension of a symmetrized version of the Kullback-Leibler divergence to the realm of quantum theory. In the classical context this quantity is known as the Jensen-Shannon divergence (JSD) and was introduced by C. Rao [9] and, independently, by J. Lin [10]. It has been applied to a variety of problems arising in statistics and physics [11-15]. Its most significant properties include its boundedness and its metric character [17]. In reference [12] it is shown that the JSD can be taken as a unifying distance between probability distributions.
In our previous study of the quantum JSD we showed that it verifies all the properties required for a good measure of distinguishability between quantum states. In this paper we investigate the metric property of the quantum JSD (QJSD), which could be regarded as essential for checking the convergence of iterative algorithms in quantum computation [16].
The structure of this paper is as follows: the next section is devoted to the formal definition of the classical JSD and of the QJSD. In Section III we investigate the metric character of the QJSD. We first consider the case of pure states and then investigate the metric properties for arbitrary mixed states by recourse to numerical simulations in Hilbert spaces of different dimensions. Finally, some conclusions are drawn in Section IV.
II Classical Jensen-Shannon divergence and its quantum extension
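The classical JSD between two (discrete) probability distributions $P=(p_1,\dots,p_N)$ and $Q=(q_1,\dots,q_N)$ is defined as
$$JS(P,Q)=\frac{1}{2}\left[S\left(P,\frac{P+Q}{2}\right)+S\left(Q,\frac{P+Q}{2}\right)\right], \tag{6}$$
where $S(P,Q)=\sum_i p_i\log(p_i/q_i)$ is the Kullback-Leibler divergence. $JS(P,Q)$ can also be expressed in the form
$$JS(P,Q)=H_S\left(\frac{P+Q}{2}\right)-\frac{1}{2}H_S(P)-\frac{1}{2}H_S(Q),$$
where $H_S(P)=-\sum_i p_i\log p_i$ is the Shannon entropy. The classical JSD exhibits several interesting properties. Among them we recall the following ones:
$JS(P,Q)$ is symmetric and always well defined;
it is bounded, $0\le JS(P,Q)\le 1$;
and, as was already stated, its square root, $d_{JS}(P,Q)=\sqrt{JS(P,Q)}$, verifies the triangle inequality Eq. (3) (but $JS(P,Q)$ does not).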
A proof of this last fact can be found in references [17,18]. Alternatively, it can be proved by using some results of harmonic analysis due to I. Schoenberg [19]. The basic fact that makes Schoenberg’s theorem applicable to the classical JSD is that it is a negative definite kernel, that is, for every finite collection of real numbers $(\zeta_i)_{i\le N}$ and every corresponding collection of probability distributions $(P_i)_{i\le N}$, the implication
$$\sum_{i=1}^{N}\zeta_i=0\;\Longrightarrow\;\sum_{i,j=1}^{N}\zeta_i\zeta_j\,JS(P_i,P_j)\le 0 \tag{9}$$
holds.
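The properties of the classical JSD recalled above can be explored numerically. The following Python sketch (NumPy only) is an illustration under our own naming; `shannon`, `jsd`, `triangle_gap` and `negative_definite_sum` are hypothetical helpers, not functions from the cited references. It evaluates the JSD in its entropic form, exhibits one triplet for which the JSD itself violates the triangle inequality while its square root does not, and evaluates the quadratic form of the negative definiteness condition.

```python
import numpy as np

def shannon(p):
    """Shannon entropy in bits, with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def jsd(p, q):
    """Classical JSD in its entropic form."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return shannon((p + q) / 2) - 0.5 * shannon(p) - 0.5 * shannon(q)

def sqrt_jsd(p, q):
    return float(np.sqrt(max(jsd(p, q), 0.0)))

def triangle_gap(dist, a, b, c):
    """dist(a,c) + dist(c,b) - dist(a,b); negative means the triangle inequality fails."""
    return dist(a, c) + dist(c, b) - dist(a, b)

def negative_definite_sum(dists, zeta):
    """Quadratic form sum_ij zeta_i zeta_j JS(P_i, P_j) after shifting zeta to sum to zero."""
    zeta = np.asarray(zeta, dtype=float)
    zeta = zeta - zeta.mean()
    js = np.array([[jsd(p, q) for q in dists] for p in dists])
    return float(zeta @ js @ zeta)

p, q, m = [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]
gap_js = triangle_gap(jsd, p, q, m)         # negative: JSD alone is not a metric
gap_sqrt = triangle_gap(sqrt_jsd, p, q, m)  # positive: its square root passes here
```

For this triplet the intermediate distribution `m` lies "between" `p` and `q`, which is why the unrooted JSD fails the inequality.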
The classical JSD can be used to distinguish two probability distributions and therefore can be used as well to do so for two quantum states described by their density operators, say, $\rho$ and $\sigma$. Indeed, let us suppose we choose a positive operator valued measure (POVM), $\{E_i\}_{i=1}^{N}$, that generates two probability distributions via
$$p_i=\mathrm{Tr}(\rho E_i),\qquad q_i=\mathrm{Tr}(\sigma E_i),$$
for $i=1,\dots,N$. Then we can use the JSD (6) to distinguish between these two distributions. In this procedure we have the freedom of choosing the POVM which most clearly distinguishes $\rho$ from $\sigma$, that is, which makes the value of $JS(P,Q)$ the largest. This reasoning motivates us to introduce the quantity
$$JS_1(\rho,\sigma)=\sup_{\{E_i\}}JS(P,Q),$$
where the supremum is taken over all POVMs. Physically, $JS_1(\rho,\sigma)$ gives the best discrimination between the states $\rho$ and $\sigma$ that we can achieve by means of measurements.
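By mimicking the extension of the Kullback-Leibler divergence to the realm of quantum theory, we define the QJSD as [8]
$$JS(\rho\|\sigma)=\frac{1}{2}\left[S\left(\rho\,\Big\|\,\frac{\rho+\sigma}{2}\right)+S\left(\sigma\,\Big\|\,\frac{\rho+\sigma}{2}\right)\right], \tag{11}$$
which can be recast in terms of the von Neumann entropy $H_N(\rho)=-\mathrm{Tr}(\rho\log\rho)$ in the fashion
$$JS(\rho\|\sigma)=H_N\left(\frac{\rho+\sigma}{2}\right)-\frac{1}{2}H_N(\rho)-\frac{1}{2}H_N(\sigma). \tag{12}$$
This quantity is always well defined, symmetric, positive definite and bounded ($0\le JS(\rho\|\sigma)\le 1$). By using the corresponding properties of the relative entropy [21] and expression (11) it can be shown that, for arbitrary $\rho$ and $\sigma$, the following inequality
$$JS_1(\rho,\sigma)\le JS(\rho\|\sigma) \tag{13}$$
is valid, where $JS_1(\rho,\sigma)$ denotes the supremum of the classical JSD over all POVMs. The equality is satisfied if and only if $\rho$ and $\sigma$ commute, that is, the upper bound in (13) is, in general, not attainable for any POVM.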
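To make the entropic form (12) and the inequality (13) tangible, the following Python sketch (NumPy only) evaluates the QJSD for two qubit states and compares it with the classical JSD obtained from one fixed projective measurement. It is a minimal illustration: the helper names and the choice of measurement basis are assumptions of this example, and a single measurement gives only a lower estimate of the supremum over POVMs.

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, dtype=float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def von_neumann_entropy(rho, tol=1e-12):
    """H_N(rho) = -Tr(rho log2 rho), computed from the eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    return shannon(ev[ev > tol])

def qjsd(rho, sigma):
    """QJSD in the entropic form (12); tiny negative rounding errors are clipped."""
    mix = (rho + sigma) / 2
    val = (von_neumann_entropy(mix)
           - 0.5 * von_neumann_entropy(rho)
           - 0.5 * von_neumann_entropy(sigma))
    return max(val, 0.0)

def measured_jsd(rho, sigma, basis):
    """Classical JSD of the outcome distributions of a projective measurement
    onto the columns of `basis` (one particular POVM)."""
    p = np.clip(np.real(np.diag(basis.conj().T @ rho @ basis)), 0.0, None)
    q = np.clip(np.real(np.diag(basis.conj().T @ sigma @ basis)), 0.0, None)
    return shannon((p + q) / 2) - 0.5 * shannon(p) - 0.5 * shannon(q)

ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(ket0, ket0)      # |0><0|
sigma = np.outer(plus, plus)    # |+><+|, does not commute with rho

quantum_value = qjsd(rho, sigma)
classical_value = measured_jsd(rho, sigma, np.eye(2))  # stays below quantum_value
```

For commuting states, measuring in the common eigenbasis reproduces the QJSD exactly, in agreement with the equality condition stated above.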
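To conclude this section we give the explicit expression for the QJSD in terms of the eigenvalues and eigenvectors of the operators involved in its expression:
$$JS(\rho\|\sigma)=\frac{1}{2}\left[\sum_{k,i}|\langle t_k|r_i\rangle|^{2}\,r_i\log\frac{2r_i}{\lambda_k}+\sum_{k,j}|\langle t_k|s_j\rangle|^{2}\,s_j\log\frac{2s_j}{\lambda_k}\right], \tag{14}$$
where $\rho=\sum_i r_i|r_i\rangle\langle r_i|$, $\sigma=\sum_j s_j|s_j\rangle\langle s_j|$, and $\rho+\sigma=\sum_k\lambda_k|t_k\rangle\langle t_k|$ are the spectral decompositions of the operators involved, so that $\lambda_k=\sum_i r_i|\langle t_k|r_i\rangle|^{2}+\sum_j s_j|\langle t_k|s_j\rangle|^{2}$.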
III The metric character of the quantum JSD
In this section we investigate the putative metric character of the QJSD, that is, we try to ascertain whether the square root of the QJSD,
$$d_{QJS}(\rho,\sigma)=\sqrt{JS(\rho\|\sigma)}, \tag{15}$$
verifies the triangle inequality. The other three properties of a metric are obviously verified by (15). A formal proof of property (3) for $d_{QJS}$ has until now eluded us. Unfortunately there is no analog of Schoenberg’s theorem when operators are involved. Moreover, there is no direct way of verifying condition (9) for expression (14). No extension to the case of the QJSD of the proof given in [17] has been possible. Incidentally, it should be observed that, if the upper bound in (13) could be attained for some POVM, the proof of the triangle inequality for (15) would be obvious (because the square root of the classical JSD verifies it).
The results to be presented here correspond to a separate analysis of the metric condition for (15) in two cases: when (15) is restricted to pure states and when it acts on the complete set $\mathcal{B}(\mathcal{H})_1^+$. In the first instance we were able to give a formal proof of inequality (3); in the second one, we checked it by means of a numerical algorithm.
III.0.1 Pure states
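For a pure state the von Neumann entropy $H_N$ vanishes. Then, for two pure states,
$$\rho=|\psi\rangle\langle\psi|,\qquad\sigma=|\varphi\rangle\langle\varphi|,$$
the QJSD (12) becomes
$$JS(\rho\|\sigma)=H_N\left(\frac{|\psi\rangle\langle\psi|+|\varphi\rangle\langle\varphi|}{2}\right). \tag{17}$$
After some algebra, we can rewrite (17) in terms of the inner product $\langle\psi|\varphi\rangle$:
$$JS(\rho\|\sigma)=-\frac{1-|\langle\psi|\varphi\rangle|}{2}\log\frac{1-|\langle\psi|\varphi\rangle|}{2}-\frac{1+|\langle\psi|\varphi\rangle|}{2}\log\frac{1+|\langle\psi|\varphi\rangle|}{2}. \tag{18}$$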
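The algebra behind this step can be sketched as follows. The symbols $\bar\rho$, $x$ and $\lambda_\pm$ are introduced here only for the illustration: the average of the two projectors acts nontrivially on the (at most) two-dimensional subspace spanned by the two state vectors, so its two possibly nonzero eigenvalues follow from its trace and the trace of its square.

```latex
\begin{align*}
\bar\rho &= \tfrac{1}{2}\left(|\psi\rangle\langle\psi| + |\varphi\rangle\langle\varphi|\right),
  \qquad x = |\langle\psi|\varphi\rangle|,\\
\mathrm{Tr}\,\bar\rho &= 1,
  \qquad \mathrm{Tr}\,\bar\rho^{2} = \tfrac{1}{4}\left(1 + 1 + 2x^{2}\right) = \tfrac{1}{2}\left(1 + x^{2}\right),\\
\lambda_{+} + \lambda_{-} &= 1,
  \qquad \lambda_{+}^{2} + \lambda_{-}^{2} = \tfrac{1}{2}\left(1 + x^{2}\right)
  \;\Longrightarrow\; \lambda_{\pm} = \tfrac{1}{2}\left(1 \pm x\right),\\
H_N(\bar\rho) &= -\lambda_{+}\log\lambda_{+} - \lambda_{-}\log\lambda_{-}.
\end{align*}
```

The last line is the binary entropy of the pair $\lambda_\pm$, which equals 1 for orthogonal states ($x=0$) and vanishes for coincident states ($x=1$).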
The entropy of the average can be interpreted in the light of quantum information theory. Indeed, let us suppose that Alice has a source of pure qubit signal states $|\psi\rangle$ and $|\varphi\rangle$. Each emission is chosen to be $|\psi\rangle$ or $|\varphi\rangle$ with equal prior probability one half. Then the density matrix of the source is $\rho=\frac{1}{2}\left(|\psi\rangle\langle\psi|+|\varphi\rangle\langle\varphi|\right)$. Alice may communicate the sequence of states to Bob by transmitting one qubit per emitted state. But according to the quantum source coding theorem, (17) gives the lowest number of qubits per state that Alice needs to communicate the quantum information (with arbitrarily high fidelity) [22].
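Let us take two fixed arbitrary pure states $|\psi\rangle$ and $|\varphi\rangle$ and an arbitrary third one, $|\chi\rangle$. Denote the absolute value of the inner products $\langle\psi|\varphi\rangle$, $\langle\psi|\chi\rangle$ and $\langle\varphi|\chi\rangle$ with $x$, $y$ and $z$, respectively, and introduce then the function
$$\Phi(t)=\sqrt{-\frac{1-t}{2}\log\frac{1-t}{2}-\frac{1+t}{2}\log\frac{1+t}{2}},\qquad 0\le t\le 1.$$
In terms of these variables the triangle inequality for (15) reads:
$$\Phi(x)\le\Phi(y)+\Phi(z). \tag{19}$$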
We can decompose the vector $|\chi\rangle$ into i) a part belonging to the plane determined by $|\psi\rangle$ and $|\varphi\rangle$ and ii) another part perpendicular to that plane:
with and . Then
As a function of and , for fixed, is a concave function on the circles and (in the sense that its second derivative is negative) and it vanishes for and . This guarantees that inequality (19) is satisfied for arbitrary and .
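As a complement to the formal argument, inequality (19) can be probed numerically for randomly drawn pure states. The Python sketch below (NumPy only) is an illustration under our own assumptions: the function names are hypothetical, the random vectors are normalized complex Gaussian vectors, and the pure-state distance is evaluated through the closed form (18) as a function of the overlap.

```python
import numpy as np

def sqrt_entropy_of_average(overlap):
    """Square root of Eq. (18) as a function of the overlap |<psi|phi>| in [0, 1]."""
    x = min(max(float(overlap), 0.0), 1.0)
    probs = np.array([(1 - x) / 2, (1 + x) / 2])
    probs = probs[probs > 0]  # 0 log 0 = 0
    return float(np.sqrt(max(-np.sum(probs * np.log2(probs)), 0.0)))

def pure_state_distance(psi, phi):
    return sqrt_entropy_of_average(abs(np.vdot(psi, phi)))

def random_pure_state(dim, rng):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def min_triangle_gap(dim, trials, seed=0):
    """Smallest value of d(psi,chi) + d(chi,phi) - d(psi,phi) over random triplets."""
    rng = np.random.default_rng(seed)
    smallest = np.inf
    for _ in range(trials):
        psi, phi, chi = (random_pure_state(dim, rng) for _ in range(3))
        gap = (pure_state_distance(psi, chi)
               + pure_state_distance(chi, phi)
               - pure_state_distance(psi, phi))
        smallest = min(smallest, gap)
    return smallest
```

A call such as `min_triangle_gap(2, 2000)` returns the smallest gap found; a negative value (beyond rounding) would signal a violation of (19).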
III.0.2 Arbitrary mixed states
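Here we attempt a numerical verification of the triangle inequality for the distance (15) when arbitrary mixed states are involved. As a first approach, we numerically evaluate the inequality (3) by generating random states in an $N$-dimensional Hilbert space. The space $\mathcal{S}$ of all (pure and mixed) such states can be regarded as a product space of the form [24]:
$$\mathcal{S}=\mathcal{P}\times\Delta,$$
where $\mathcal{P}$ stands for the family of all complete sets of orthonormal projectors $\{\hat P_i\}_{i=1}^{N}$, $\sum_i\hat P_i=I$ ($I$ being the identity matrix), and $\Delta$ is the set of all real $N$-tuples of the form $\{\lambda_1,\dots,\lambda_N\}$, with $0\le\lambda_i\le 1$ and $\sum_i\lambda_i=1$. Any state in $\mathcal{S}$ is of the form
$$\rho=\sum_{i=1}^{N}\lambda_i\hat P_i.$$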
In exploring $\mathcal{S}$ exhaustively we need to introduce an appropriate measure $\mu$ on this space. Such a measure is required to compute volumes within $\mathcal{S}$, as well as to determine what is to be understood by a uniform distribution of states on $\mathcal{S}$. The measure that we adopt here is taken from the work of Zyczkowski et al. [25,26].
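An arbitrary (pure or mixed) state $\rho$ of a quantum system described by an $N$-dimensional Hilbert space can always be expressed as a product of the form:
$$\rho=U\,D[\{\lambda_i\}]\,U^{\dagger}. \tag{20}$$
Here $U$ is an $N\times N$ unitary matrix and $D[\{\lambda_i\}]$ is an $N\times N$ diagonal matrix whose diagonal elements are, precisely, the eigenvalues $\{\lambda_1,\dots,\lambda_N\}$, with $0\le\lambda_i\le 1$ and $\sum_i\lambda_i=1$. The group of unitary matrices $U(N)$ is endowed with a unique, uniform measure, known as the Haar measure, $\nu$ [27]. On the other hand, the $(N-1)$-simplex $\Delta$, consisting of all the real $N$-tuples $\{\lambda_1,\dots,\lambda_N\}$ appearing in (20), is a subset of an $(N-1)$-dimensional hyperplane of $\mathbb{R}^{N}$. Consequently, the standard normalized Lebesgue measure $\mathcal{L}_{N-1}$ on $\mathbb{R}^{N-1}$ provides a measure for $\Delta$. The aforementioned measures on $U(N)$ and $\Delta$ lead to a measure $\mu$ on the set $\mathcal{S}$ of all the states of our quantum system [25-27], namely,
$$\mu=\nu\times\mathcal{L}_{N-1}. \tag{21}$$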
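In our numerical computations we randomly generate mixed states according to the measure (21). In order to assess, for these randomly generated states, how the triangle inequality (3) is satisfied, we define, for a triplet of states $\rho$, $\sigma$ and $\tau$, the auxiliary quantity
$$\delta(\rho,\sigma,\tau)=d_{QJS}(\rho,\tau)+d_{QJS}(\tau,\sigma)-d_{QJS}(\rho,\sigma), \tag{22}$$
where $d_{QJS}$ is the square root (15) of the QJSD, and evaluate it for a large enough number of simulated states. This procedure is repeated for different dimensions of the Hilbert space.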
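The sampling procedure just described can be sketched in Python with NumPy as follows. This is a minimal illustration, not the code used for the reported computations: the function names are hypothetical, the Haar-distributed unitary is drawn with the standard QR construction on a complex Gaussian matrix (with the phases of the diagonal of R corrected), and the eigenvalues are drawn from a flat Dirichlet distribution, which is uniform on the simplex. The ordering of the three states inside the auxiliary quantity is likewise an assumption of this example.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed unitary via QR of a complex Gaussian matrix with phase fix."""
    z = (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # rescales column j of q by phases[j]

def random_state(n, rng):
    """rho = U D U^dagger with Haar U and eigenvalues uniform on the simplex."""
    lam = rng.dirichlet(np.ones(n))
    u = haar_unitary(n, rng)
    return (u * lam) @ u.conj().T

def entropy(rho, tol=1e-12):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]
    return float(-np.sum(ev * np.log2(ev)))

def d_qjs(rho, sigma):
    """Square root of the QJSD; rounding-level negative values are clipped to zero."""
    js = entropy((rho + sigma) / 2) - 0.5 * entropy(rho) - 0.5 * entropy(sigma)
    return float(np.sqrt(max(js, 0.0)))

def delta(rho, sigma, tau):
    """Triangle-inequality gap for one ordering of the triplet."""
    return d_qjs(rho, tau) + d_qjs(tau, sigma) - d_qjs(rho, sigma)

def smallest_delta(n, triplets, seed=0):
    """Smallest gap found among `triplets` random triplets in dimension n."""
    rng = np.random.default_rng(seed)
    return min(delta(*(random_state(n, rng) for _ in range(3)))
               for _ in range(triplets))
```

Collecting the values of `delta` over many triplets, instead of only their minimum, gives the histogram whose low tail is of interest here.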
We investigate the positivity of the auxiliary quantity $\delta$ of Eq. (22), upon which the metric character of the square root of the QJSD is based, by constructing the probability distributions $P(\delta)$ for the values of $\delta$. The corresponding histograms, for different dimensions of the Hilbert space, are depicted in Fig. 1. As we are mainly interested in the positivity of $\delta$, we just plot the tails of the corresponding distributions $P(\delta)$, selecting only the portion at small values of $\delta$. Such a choice allows us to portray in sufficient detail the region of the distribution where a violation of the inequality (3) can be detected.
The probability for the particular value $\delta=0$ actually represents the probability of finding a triplet of density matrices for which $\delta\le 0$. No such triplet of states has been found, which entails that the probabilities for violating the triangle inequality vanish for all the distinct Hilbert-space dimensions we have considered here. Actually, the probability for low values of $\delta$ becomes significantly smaller as the dimension of the Hilbert space under study increases (the PDFs for Hilbert-space dimensions higher than those reported here have also been computed).
The total number of randomly generated states was rather large in order to obtain a sufficiently large number of points belonging to the tail regions. These points then fall within the zone of low probabilities. The fact that no triplet of states violating inequality (3) has been encountered can be regarded as numerical evidence for the metric character of the square root of the QJSD. The distributions in Fig. 1 clearly depend on the measure (21) used to compute them. Higher probabilities for low values of the auxiliary quantity (22) can actually be obtained if one restricts the computation of the histograms to states with a high degree of mixedness, although it must be noted that such probabilities still diminish as the dimension of the associated Hilbert space grows.
To avoid a statistical dependence on the measure (21) we propose an alternative numerical approach, namely a numerical minimization of the auxiliary quantity (22). Any quantum mixed state is completely determined by a finite number of parameters, which depends on the dimension of the Hilbert space. To determine the minimum possible value of (22), one needs to find the optimal values for such parameters. To this end we use a simulated annealing algorithm in which the parameters are iteratively modified until convergence to the optimal values is reached.
After running this algorithm for different Hilbert-space dimensions and for different triplets of initial states, one always detects convergence to the same solution: the minimum value of the auxiliary quantity (22) is $0$.
The optimal situation is reached when the states involved are equal. In our numerical search these states are always found to coincide with the maximally mixed state for the Hilbert-space dimension considered in each case. It is actually not enough to minimize Eq. (22), because we wish it to be a minimum for any of the three different ways to order the three states. If we minimize the average of those three possible orderings, the minimum is also $0$, and it is obtained when the three states become the maximally mixed state.
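The minimization strategy can be illustrated with a toy simulated annealing loop in Python (NumPy only). Everything specific in this sketch is an assumption made for the example and not a description of the algorithm actually employed: the states are parametrized as $\rho=AA^{\dagger}/\mathrm{Tr}(AA^{\dagger})$ with an unconstrained complex matrix $A$, the proposal is a Gaussian perturbation of all parameters, the cooling schedule is linear, and the ordering of the states inside the gap is one arbitrary choice.

```python
import numpy as np

def state_from_params(a):
    """Density matrix A A^dagger / Tr(A A^dagger); a[0], a[1] hold Re(A), Im(A)."""
    m = a[0] + 1j * a[1]
    rho = m @ m.conj().T
    return rho / np.trace(rho).real

def entropy(rho, tol=1e-12):
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > tol]
    return float(-np.sum(ev * np.log2(ev)))

def d_qjs(rho, sigma):
    js = entropy((rho + sigma) / 2) - 0.5 * entropy(rho) - 0.5 * entropy(sigma)
    return float(np.sqrt(max(js, 0.0)))

def delta(params):
    """Triangle-inequality gap for the three states encoded in `params`."""
    rho, sigma, tau = (state_from_params(a) for a in params)
    return d_qjs(rho, tau) + d_qjs(tau, sigma) - d_qjs(rho, sigma)

def anneal(n=2, steps=2000, step_size=0.1, t0=0.1, seed=0):
    """Metropolis-style annealing; returns (initial gap, smallest gap visited)."""
    rng = np.random.default_rng(seed)
    params = rng.normal(size=(3, 2, n, n))  # three states, real and imaginary parts
    current = initial = delta(params)
    best = current
    for k in range(steps):
        temperature = t0 * (1.0 - k / steps) + 1e-9  # linear cooling (illustrative)
        trial = params + step_size * rng.normal(size=params.shape)
        value = delta(trial)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if value <= current or rng.random() < np.exp((current - value) / temperature):
            params, current = trial, value
            best = min(best, current)
    return initial, best
```

In such a run the smallest gap visited should never fall below zero beyond rounding error, which is the behaviour the minimization is meant to test.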
This last method, although it does not provide us with a formal proof of the metric character of the square root of the QJSD for mixed states, does yield clear and strong evidence for the validity of the conjecture advanced in the initial part of this paper, which constitutes the leitmotif of this work.
IV Conclusions
The main purpose of this work was to investigate the metric property of the QJSD. We showed that the square root of the QJSD verifies the triangle inequality, formally for pure states and numerically for mixed states, which gives this distance the character of a metric. Although for mixed states the claim is supported only by numerical evidence, we believe that the cases analyzed here are sufficiently representative to render credible the claim that the metric properties are verified in general for the square root of the QJSD.
A second item deserves to be pointed out, which emerges from the following two facts:
On the one hand, we have shown that, when restricted to pure states, the square root of the entropy of the average is a true metric.
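On the other hand, a classical result from Uhlmann [4] asserts that the fidelity of states $\rho$ and $\sigma$,
$$F(\rho,\sigma)=\left[\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}\right]^{2},$$
can be expressed in the form
$$F(\rho,\sigma)=\max|\langle\psi|\varphi\rangle|^{2}, \tag{24}$$
where the maximization is over all purifications $|\psi\rangle$ of $\rho$ and all purifications $|\varphi\rangle$ of $\sigma$ [28].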
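These two facts motivate us to introduce an alternative metric for arbitrary mixed states. Given two arbitrary mixed states $\rho$ and $\sigma$ we can define
$$d_{P}(\rho,\sigma)=\min\sqrt{H_N\left(\frac{|\psi\rangle\langle\psi|+|\varphi\rangle\langle\varphi|}{2}\right)}, \tag{25}$$
where the minimum is taken over all purifications $|\psi\rangle$ of $\rho$ and all purifications $|\varphi\rangle$ of $\sigma$. In (25) we must look for the minimum, not for the maximum as in (24), due to the decreasing nature of the entropy of the average, eq. (18), as a function of $|\langle\psi|\varphi\rangle|$.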
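Since the entropy of the average (18) decreases with the overlap, the minimum in (25) is reached at the largest overlap between purifications, which by Uhlmann's result (24) is given by the fidelity. Under that reading, (25) can be evaluated without an explicit search over purifications. The Python sketch below (NumPy only) is an illustration of this shortcut; the function names are hypothetical, and `root_fidelity` returns $\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\sqrt{\rho}}$, that is, the maximal overlap itself rather than its square.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.conj().T

def root_fidelity(rho, sigma):
    """Tr sqrt( sqrt(rho) sigma sqrt(rho) ): the maximal overlap between purifications."""
    s = psd_sqrt(rho)
    w = np.linalg.eigvalsh(s @ sigma @ s)
    return float(np.sum(np.sqrt(np.clip(w, 0.0, None))))

def entropy_of_average(overlap):
    """Eq. (18) as a function of the overlap, clipped to [0, 1] against rounding."""
    x = min(max(float(overlap), 0.0), 1.0)
    p = np.array([(1 - x) / 2, (1 + x) / 2])
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def min_purification_metric(rho, sigma):
    """Eq. (25) evaluated at the maximal overlap allowed by Eq. (24)."""
    return float(np.sqrt(max(entropy_of_average(root_fidelity(rho, sigma)), 0.0)))

ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
rho = np.outer(ket0, ket0)
sigma = np.outer(plus, plus)
d_pure = min_purification_metric(rho, sigma)  # reduces to the pure-state value of (15)
```

For pure states the purifications are the states themselves, so the quantity coincides with the square root of the entropy of the average, as expected.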
Obviously the basic properties required for a good distinguishability measure are inherited by (25) from those verified by the QJSD. Additionally, several interesting questions arise from this proposal. For example, what relations exist between (25) and (11); or, in general, how (25) relates to other quantum distances. A more detailed study of the properties of this quantity will be presented elsewhere.
Acknowledgements. This work was partially supported by the MEC grant FIS2005-02796 (Spain) and FEDER (EU) and by CONICET (Argentine Agency). AB and APM acknowledge support from MEC through FPU grant AP-2004-2962 and contract SB-2006-0165. PWL wants to thank SECyT-UNC (Argentina) and CONICET for financial support.
- (1) W.K. Wootters, Phys. Rev. D 23, 357 (1981).
- (2) S.L. Braunstein and C.M. Caves, Phys. Rev. Lett. 72, 3439 (1994).
- (3) G. Lindblad, Commun. Math. Phys. 33, 305 (1973).
- (4) R. Jozsa, J. Mod. Optics 41, 2315 (1994).
- (5) J. Lee, M.S. Kim and C. Brukner, Phys. Rev. Lett. 91, 087902 (2003).
- (6) S. Luo and Q. Zhang, Phys. Rev. A 69, 032106 (2004).
- (7) D. Markham, J. Miszczak, Z. Puchala and K. Zyczkowski, arXiv:0711.4286v1 (2007).
- (8) A.P. Majtey, P.W. Lamberti, and D.P. Prato, Phys. Rev. A 72, 052310 (2005).
- (9) C. Rao, IMS-Lect. Notes 10, 217 (1987).
- (10) J. Lin, IEEE Trans. Inf. Theory 37, 145 (1991).
- (11) I. Grosse, P. Bernaola-Galvan, P. Carpena, R. Roman-Roldan, J. Oliver and H.E. Stanley, Phys. Rev. E 65, 041905 (2002).
- (12) A. Majtey, P.W. Lamberti, M.T. Martin and A. Plastino, Eur. Phys. J. D 32, 413 (2005).
- (13) M. Pereyra, P.W. Lamberti and O.A. Rosso, Physica A 379, 122 (2007).
- (14) O. A. Rosso, H. A. Larrondo, M. T. Martin, A. Plastino, and M. A. Fuentes, Phys. Rev. Lett. 99, 154102 (2007).
- (15) G.E. Crooks, Phys. Rev. Lett. 99, 100602 (2007).
- (16) A. Galindo and M. A. Martín-Delgado, Rev. Mod. Phys. 74, 347 (2002).
- (17) D. Endres and J. Schindelin, IEEE Trans. Inf. Theory 49, 1858 (2003).
- (18) F. Österreicher and I. Vajda, Ann. Inst. Stat. Math. 55, 639 (2003).
- (19) I.J. Schoenberg, Trans. Am. Math. Soc. 44, 3 (1938).
- (20) B. Fuglede and F. Topsoe, Proceedings of the International Symposium on Information Theory, ISIT 2004, p. 31. Published by IEEE (2004).
- (21) M.J. Donald, Commun. Math. Phys. 105, 13 (1986).
- (22) B. Schumacher, Phys. Rev. A 51, 2738 (1995).
- (23) I. Bengtsson and K. Zyczkowski, Geometry of Quantum States: An Introduction to Quantum Entanglement (Cambridge University Press, Cambridge, 2006).
- (24) J. Batle, M. Casas, A. R. Plastino, and A. Plastino, Phys. Lett. A 353, 161 (2006).
- (25) K. Zyczkowski, P. Horodecki, A. Sanpera and M. Lewenstein, Phys. Rev. A 58, 883 (1998).
- (26) K. Zyczkowski, Phys. Rev. A 60, 3496 (1999).
- (27) M. Pozniak, K. Zyczkowski and M. Kus, J. Phys. A: Math. Gen. 31, 1059 (1998).
- (28) M. Nielsen and I. Chuang, Quantum Computation and Quantum Information (Cambridge University Press, Cambridge, England, 2000).