The quantum Chernoff bound as a measure of distinguishability between density matrices: application to qubit and Gaussian states
Hypothesis testing is a fundamental issue in statistical inference and has been a crucial element in the development of information sciences. The Chernoff bound gives the minimal Bayesian error probability when discriminating two hypotheses given a large number of observations. Recently the combined work of Audenaert et al. [Phys. Rev. Lett. 98, 160501] and Nussbaum and Szkola [quant-ph/0607216] has proved the quantum analog of this bound, which applies when the hypotheses correspond to two quantum states. Based on the quantum Chernoff bound, we define a physically meaningful distinguishability measure and its corresponding metric in the space of states; the latter is shown to coincide with the Wigner-Yanase metric. Along the same lines, we define a second, more easily implementable, distinguishability measure based on the error probability of discrimination when the same local measurement is performed on every copy. We study some general properties of these measures, including the probability distribution of density matrices, defined via the volume element induced by the metric, and illustrate their use in the paradigmatic cases of qubits and Gaussian infinite-dimensional states.
About fifty years ago Herman Chernoff proved his famous bound, which characterizes the asymptotic behavior of the minimal probability of error when discriminating two hypotheses given a large number of observations Chernoff (1952). Its quantum analog was recently conjectured Ogawa and Hayashi (2004) and finally proven by combining the results of two recent publications Audenaert et al. (2007a); Nussbaum and Szkola (2006). In this quantum setting one is confronted with the problem of determining the minimum error probability in identifying which of two possible known states has been prepared, when $N$ identical copies of it are given. Hereafter we will refer to this minimum simply as the error probability. This problem is widely known as quantum state discrimination [Footnote 1: See J. A. Bergou and Hillery (2004) and Chefles (2000) for two reviews on the recent and the more historical developments of this field, respectively.]. Its difficulty (but also its appeal) lies in the fact that quantum mechanics only allows for perfect discrimination of such states when they are orthogonal. This has both fundamental and practical implications that lie at the heart of quantum mechanics and its applications.
Over the past fifty years the classical Chernoff bound (as well as hypothesis testing in general) has proved to be extremely useful in all branches of science. Likewise, one would expect its quantum version to be far more than a mere academic issue. The characterization and control of quantum devices is a necessary requirement for quantum computation and communication, and quantum hypothesis testing is especially well suited for assessing the performance of these tasks. Particularly important examples in which state discrimination plays an essential role are quantum cryptography Gisin et al. (2002), the classical capacity of quantum channels Hayashi and Nagaoka (2003), and even quantum algorithms Bacon et al. (2005). Equally important are several recent theorems concerning different quantum extensions of hypothesis testing: the quantum Stein’s lemma, proved some years ago Hiai and Petz (1991); Ogawa and Nagaoka (2000), and the quantum Hoeffding bound, recently established in Nagaoka (2006); Hayashi (2006a); Audenaert et al. (2007b).
In this paper we study the classical and the quantum Chernoff bounds in connection with measures of distinguishability for quantum states, putting special emphasis on the qubit and Gaussian cases. We start by reviewing classical and quantum hypothesis testing and the corresponding Chernoff bounds in Sec. II and Sec. III, respectively (the latter includes the aforementioned recent results by Nussbaum and Szkola (2006) and Audenaert et al. (2007a)). In Sec. IV we discuss the notion of a distinguishability measure for quantum states. We briefly motivate an important instance of such a notion based on classical statistical measures, namely the quantum fidelity, and then move to a fully operational alternative, based on the asymptotic rate exponent of the error probability in symmetric quantum hypothesis testing: the quantum Chernoff measure [Footnote 2: By ‘operational’ we mean ‘defined through a specific procedure or task’, as opposed to ‘purely mathematical’.]. We also discuss a similar distinguishability measure derived from the same rate exponent when the decision is based on identical single-copy (local) measurements, instead of the collective measurements on the copies assumed in the derivation of the quantum Chernoff bound. In Sec. V we study the metrics induced by the previously defined measures of distinguishability and give explicit expressions for general $d$-dimensional systems. We also give the probability distribution of the eigenvalues of a density matrix based on the quantum Chernoff metric (induced by the corresponding distinguishability measure). We find that the metric based on local measurements is discontinuous and has to be defined piecewise: on the set of pure states, where it agrees with the Fubini-Study metric, and, separately, on the set of strictly mixed states, where it agrees with one-half the Bures-Uhlmann metric. 
The quantum Chernoff metric, in contrast, is continuous and smoothly interpolates between the Fubini-Study and one-half the Bures-Uhlmann metrics. In Sec. VI we concentrate on the particular case of two-level systems and study in some depth the differences between the quantum Chernoff measure and metric and those based on identical local measurements. In Sec. VII we give explicit expressions of the quantum Chernoff measure and its corresponding induced metric for general Gaussian states. Finally, we state our conclusions in Sec. VIII.
II Classical hypothesis testing: Chernoff bound
One of the most fundamental problems in statistical decision theory is that of choosing between two possible explanations or models, which we will refer to as hypotheses $H_0$ and $H_1$, where the decision is based on a set of data collected from measurements or observations. For example, a medical team has to decide whether a patient is healthy (hypothesis $H_0$) or has a certain disease (hypothesis $H_1$) in view of the results of some clinical test. Often, $H_0$ is called the working hypothesis or null hypothesis, while $H_1$ is called the alternative hypothesis. In general these two hypotheses do not have to be treated on an equal footing, since wrongly accepting or rejecting one of them might have very different consequences. These two types of errors, i.e., the rejection of a true null hypothesis and the acceptance of a false null hypothesis, are called type I and type II errors, respectively, and their corresponding probabilities will be denoted by $\alpha$ and $\beta$ throughout the paper. In our example, failure to diagnose the disease is a type II error, whereas it is a type I error to wrongly conclude that the healthy patient has the disease. Of course it would be desirable to minimize the two types of errors at the same time. However, this is typically not possible, since reducing the errors of one type entails increasing those of the other type. Hence, a common way to proceed is to minimize the errors of one type while keeping those of the other type bounded by a constant (which may depend on the number of observations). Another (Bayesian-like) approach consists in minimizing a linear combination of the two error probabilities, $P_e=\pi_0\alpha+\pi_1\beta$, where $\pi_0$ and $\pi_1$ can be interpreted as the a priori probabilities that we assign to the occurrence of each hypothesis. In this paper we consider this latter approach, which is known as symmetric hypothesis testing.
For the sake of simplicity, we assume to start with that $\pi_0=\pi_1=1/2$, and we deal with tests that have only two possible outcomes, $x\in\{0,1\}$. This is, for example, the situation that corresponds to the identification of a biased coin that can be (with equal probability) of one of two types: $0$ or $1$ (corresponding to hypothesis $H_0$ or $H_1$, respectively). If it is of type $0$, the probabilities of obtaining head and tail are $p_0(0)=p$ and $p_0(1)=1-p$, respectively, while if it is of type $1$ we write $p_1(0)=q$ and $p_1(1)=1-q$. The test consists in tossing the coin, which has two possible outcomes: either head ($x=0$) or tail ($x=1$).
If we can toss the coin only once (single observation), it is easy to convince oneself that the minimum (average) probability of error is attained when we accept the hypothesis (decide that the tossed coin is of the type) for which the observed outcome occurs with the largest probability. Therefore,
$$P_e=\frac12\sum_x\min\{p_0(x),p_1(x)\}\le\frac12\,Q_{CC},\qquad Q_{CC}\equiv\min_{0\le s\le1}\sum_x p_0^s(x)\,p_1^{1-s}(x),\tag{1}$$
[Footnote 3: In this formula, as well as in most of the formulas involving minimization throughout the paper, one should properly write $\inf$ instead of $\min$, since the minimum may not exist if $p_0$ and $p_1$ ($\rho_0$ and $\rho_1$ in the quantum case) are degenerate and have different supports. This is so because in this case the continuity of the argument of $\min$ in all these equations is guaranteed only in the open interval $0<s<1$, and (end-point) singularities may occur at $s=0,1$. We will overlook this mathematical subtlety in the main text to simplify the exposition.]
where we have used the inequality $\min\{a,b\}\le a^sb^{1-s}$, valid for $a,b\ge0$ and $0\le s\le1$. The subscript CC stands for classical Chernoff. This expression also holds for tests with more than two outcomes; we just need to extend the sum over $x$ to the entire range of possible outcomes. In what follows, we leave the range of $x$ unspecified whenever an expression is valid for an arbitrary number of outcomes.
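As an illustration, here is a minimal sketch (assuming NumPy; `bayes_error`, `chernoff_Q` and `product_dist` are our own hypothetical helper names) that compares the exact single-observation error $\frac12\sum_x\min\{p_0(x),p_1(x)\}$ with $\frac12 Q_{CC}$ for the two biased coins, and then repeats the comparison for $N$ independent tosses by building product distributions. The minimum over $s$ is taken on a uniform grid, which can only overestimate $Q_{CC}$ and therefore keeps the bound valid.

```python
import numpy as np


def bayes_error(p0, p1):
    """Exact minimal error for equal priors: (1/2) * sum_x min(p0(x), p1(x))."""
    return 0.5 * float(np.minimum(p0, p1).sum())


def chernoff_Q(p0, p1, grid=10001):
    """Q_CC = min over 0<=s<=1 of sum_x p0(x)^s p1(x)^(1-s), on a uniform grid of s.

    Assumes full-support distributions, so the sum is continuous on [0, 1].
    A grid minimum can only overestimate Q_CC, so bounds built from it stay valid.
    """
    s = np.linspace(0.0, 1.0, grid)[:, None]      # one row per value of s
    return float((p0**s * p1**(1.0 - s)).sum(axis=1).min())


def product_dist(p, N):
    """Joint distribution of N independent observations, flattened to a vector."""
    out = np.array([1.0])
    for _ in range(N):
        out = np.kron(out, p)
    return out


p0 = np.array([0.7, 0.3])    # coin of type 0: P(head), P(tail)
p1 = np.array([0.4, 0.6])    # coin of type 1
Q = chernoff_Q(p0, p1)
for N in range(1, 7):
    pe = bayes_error(product_dist(p0, N), product_dist(p1, N))
    print(N, pe, 0.5 * Q**N)  # exact error never exceeds the Chernoff bound
```

Because the $N$-fold sum factorizes, $\sum_{\mathbf x}p_0^s(\mathbf x)p_1^{1-s}(\mathbf x)=\bigl(\sum_x p_0^s(x)p_1^{1-s}(x)\bigr)^N$, the same grid value of $Q_{CC}$ serves for every $N$.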
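Next, let us assume we can toss the coin $N$ times. The set of possible outcomes (the sample space) is the $N$-fold Cartesian product of $\{0,1\}$ (or of the general single-observation sample space). The two probability distributions of these outcomes, $p_0^{(N)}$ and $p_1^{(N)}$, are given by the product of the corresponding single-observation distributions, $p_i^{(N)}(\mathbf{x})=\prod_{k=1}^{N}p_i(x_k)$, where now $\mathbf{x}=(x_1,\dots,x_N)$, and one immediately obtains Cover and Thomas (1991)
$$P_{e,N}\le\frac12\,Q_{CC}^N.\tag{2}$$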
This is the Chernoff bound Chernoff (1952). It is especially important because it can be proved to give the exact asymptotic rate exponent of the error probability, that is,
$$\xi_{CC}\equiv-\lim_{N\to\infty}\frac1N\log P_{e,N}=-\log Q_{CC}.\tag{3}$$
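The so-called Chernoff information, or Chernoff distance, $\xi_{CC}$, can also be written in terms of the Kullback–Leibler divergence $D(p\|q)=\sum_x p(x)\log[p(x)/q(x)]$ Cover and Thomas (1991):
$$\xi_{CC}=D(p_{\lambda^*}\|p_0)=D(p_{\lambda^*}\|p_1),\tag{4}$$
where
$$p_\lambda(x)=\frac{p_0^\lambda(x)\,p_1^{1-\lambda}(x)}{\sum_y p_0^\lambda(y)\,p_1^{1-\lambda}(y)},\qquad0\le\lambda\le1,$$
is a family of probability distributions known as the Hellinger arc, which interpolates between $p_1$ ($\lambda=0$) and $p_0$ ($\lambda=1$), and $\lambda^*$ is the value of $\lambda$ at which the second equality in (4) holds. In other words, $p_{\lambda^*}$ is the point of the arc that is equidistant from $p_0$ and $p_1$ (in terms of the Kullback–Leibler divergence). It can be shown that $\lambda^*$ is also the value of $s$ that minimizes the right-hand side of (3).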
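The equidistance property of the Hellinger arc is easy to check numerically. The sketch below (assuming NumPy; all helper names are hypothetical) locates the minimizing $s^*$ of $\sum_x p_0^s(x)p_1^{1-s}(x)$ by golden-section search, which is adequate because this sum of exponentials in $s$ is convex, builds the tilted distribution $p_{\lambda}\propto p_0^{\lambda}p_1^{1-\lambda}$ at $\lambda=s^*$, and compares both Kullback–Leibler divergences with $-\log Q_{CC}$. Natural logarithms are used, so everything is in nats.

```python
import numpy as np


def kl(p, q):
    """Kullback-Leibler divergence D(p||q) in nats (full-support distributions)."""
    return float(np.sum(p * np.log(p / q)))


def golden_min(f, a=0.0, b=1.0, tol=1e-10):
    """Golden-section search for the minimizer of a convex (unimodal) f on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def chernoff_information(p0, p1):
    """xi_CC = -log min_s sum_x p0^s p1^(1-s); also returns the minimizer s*."""
    f = lambda s: float(np.sum(p0**s * p1**(1.0 - s)))
    s_star = golden_min(f)
    return -np.log(f(s_star)), s_star


def hellinger_arc(p0, p1, lam):
    """Tilted distribution p_lambda(x), proportional to p0(x)^lam * p1(x)^(1 - lam)."""
    w = p0**lam * p1**(1.0 - lam)
    return w / w.sum()


p0 = np.array([0.5, 0.3, 0.2])
p1 = np.array([0.1, 0.3, 0.6])
xi, s_star = chernoff_information(p0, p1)
p_star = hellinger_arc(p0, p1, s_star)       # the point lambda* = s* on the arc
print(xi, kl(p_star, p0), kl(p_star, p1))    # the three numbers agree
```

The agreement follows from the stationarity condition $\sum_x p_{s^*}(x)\log[p_0(x)/p_1(x)]=0$ at the minimizer, which makes both divergences equal to $-\log\sum_x p_0^{s^*}p_1^{1-s^*}$.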
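For the case of measurements with two outcomes, such as the example of the coins discussed above, one can give a closed expression for the Chernoff distance, which we denote in this binary case as $\xi_b$. Assuming, without loss of generality, that $p>q$,
$$\xi_b=\theta\log\frac{\theta}{p}+(1-\theta)\log\frac{1-\theta}{1-p},\qquad\theta=\frac{\log\frac{1-q}{1-p}}{\log\frac{p(1-q)}{q(1-p)}}.$$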
The parameter $\theta$ has a very straightforward interpretation. If $k$ is the number of heads (of $0$’s) after $N$ trials, which according to the distribution $p_0$ occurs with probability
$$P_0(k)=\binom{N}{k}p^k(1-p)^{N-k}$$
[according to the distribution $p_1$ it occurs with probability $P_1(k)$, defined the same way but with $p$ replaced by $q$], then $\theta$ is the fraction of heads above which one must decide in favor of $H_0$. That is, if $k/N>\theta$ one accepts hypothesis $H_0$, while if $k/N<\theta$ one accepts $H_1$. Asymptotically, the contribution to the error probability is dominated by situations where $k/N\simeq\theta$, i.e., by events that occur with the same probability under both hypotheses (see Fig. 1). Since $P_{e,N}=\frac12\sum_k\min\{P_0(k),P_1(k)\}$, the probability of such events is clearly a lower bound to the probability of error. It is straightforward to check that $-\lim_{N\to\infty}\frac1N\log P_0(\theta N)$ [or equivalently the same limit with $P_1$] coincides with the upper bound given by the Chernoff distance $\xi_b$. This proves that the Chernoff bound is indeed attainable.
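The closed expression and the threshold interpretation can be checked with a few lines of Python. In this sketch (assuming NumPy and the standard-library `math.lgamma`; function names are ours), `binary_chernoff` implements $\xi_b=\theta\log(\theta/p)+(1-\theta)\log[(1-\theta)/(1-p)]$ with $\theta=\log\frac{1-q}{1-p}\big/\log\frac{p(1-q)}{q(1-p)}$ for $p>q$, `chernoff_numeric` minimizes over $s$ on a fine grid, and `exact_rate` evaluates $-\frac1N\log P_{e,N}$ for the optimal test, $P_{e,N}=\frac12\sum_k\min\{P_0(k),P_1(k)\}$, working in log space to avoid underflow. The values $p=0.7$ and $q=0.4$ are arbitrary.

```python
import numpy as np
from math import lgamma


def binary_chernoff(p, q):
    """Closed form for two coins with head probabilities p > q: returns (xi_b, theta)."""
    theta = np.log((1 - q) / (1 - p)) / np.log(p * (1 - q) / (q * (1 - p)))
    xi_b = theta * np.log(theta / p) + (1 - theta) * np.log((1 - theta) / (1 - p))
    return xi_b, theta


def chernoff_numeric(p, q, grid=200001):
    """-log min_s [p^s q^(1-s) + (1-p)^s (1-q)^(1-s)], minimized on a fine grid of s."""
    s = np.linspace(0.0, 1.0, grid)
    return -np.log(np.min(p**s * q**(1 - s) + (1 - p)**s * (1 - q)**(1 - s)))


def exact_rate(p, q, N):
    """-(1/N) log P_e,N for the optimal test on N tosses, computed in log space."""
    k = np.arange(N + 1)
    logc = np.array([lgamma(N + 1) - lgamma(j + 1) - lgamma(N - j + 1) for j in k])
    l0 = logc + k * np.log(p) + (N - k) * np.log(1 - p)   # log P_0(k)
    l1 = logc + k * np.log(q) + (N - k) * np.log(1 - q)   # log P_1(k)
    lmin = np.minimum(l0, l1)     # the optimal test errs on the less likely hypothesis
    m = lmin.max()
    log_pe = np.log(0.5) + m + np.log(np.exp(lmin - m).sum())
    return -log_pe / N


p, q = 0.7, 0.4
xi_b, theta = binary_chernoff(p, q)
print(theta, xi_b, chernoff_numeric(p, q), exact_rate(p, q, 2000))
```

At finite $N$ the exact rate lies slightly above $\xi_b$ (by terms of order $\log N/N$), consistently with the Chernoff bound being an upper bound on $P_{e,N}$.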
III Quantum hypothesis testing: The quantum Chernoff bound
We now tackle discrimination (symmetric hypothesis testing) in a quantum scenario. We consider two sources, $S_0$ and $S_1$, that produce states described respectively by the density matrices $\rho_0$ and $\rho_1$ acting on a Hilbert space $\mathcal{H}$. We are given $N$ copies of a state with the promise that they have been produced either by the source $S_0$ (with prior probability $\pi_0$) or by the source $S_1$ (with prior probability $\pi_1=1-\pi_0$). Accordingly, we can formulate two hypotheses ($H_0$ and $H_1$) about the identity ($S_0$ or $S_1$, respectively) of the source that has produced these copies. We wish to find a protocol to determine, with minimal error probability, which hypothesis better explains the nature of the copies. No matter how complicated this protocol might be, its output must be classical: we have to settle for one of the two hypotheses. Therefore the protocol develops in two stages. First, to obtain information about the states we must necessarily make a (quantum) measurement, which, in contrast to the classical world, is an inherently random and destructive process. Second, one has to provide a classical algorithm that processes the measurement outcomes (classical data) and produces the best answer ($H_0$ or $H_1$). Quantum mechanics allows for a convenient description of this two-step process by assigning to each answer, $H_0$ and $H_1$, a single POVM (positive operator valued measure) element, $E_0$ and $E_1$ respectively ($E_i\ge0$ acts on $\mathcal{H}^{\otimes N}$; $E_0+E_1=\mathbb{1}$). The probability that this POVM measurement gives the answer $H_i$, conditioned on the copies having been produced by $S_j$, is $\mathrm{tr}(E_i\rho_j^{\otimes N})$.
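The problem thus reduces to finding the set of operators $\{E_0,E_1\}$ that minimizes the mean probability of error, $P_{e,N}=\pi_0\,\mathrm{tr}(E_1\rho_0^{\otimes N})+\pi_1\,\mathrm{tr}(E_0\rho_1^{\otimes N})$. For the simplest case of a single copy ($N=1$) and two equiprobable hypotheses ($\pi_0=\pi_1=1/2$) it is Helstrom (1976)
$$P_e=\frac12\left[\mathrm{tr}(E_1\rho_0)+\mathrm{tr}(E_0\rho_1)\right].\tag{9}$$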
Since $E_1=\mathbb{1}-E_0$, we can introduce the Helstrom matrix $\Gamma=\rho_0-\rho_1$, as is common in quantum state discrimination, and write
$$P_e=\frac12\left[1-\mathrm{tr}(E_0\Gamma)\right],$$
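which only needs to be optimized with respect to $E_0$. We note that $\Gamma$ has some negative eigenvalues, as $\mathrm{tr}\,\Gamma=0$ (unless $\rho_0=\rho_1$). Since $0\le E_0\le\mathbb{1}$, $\mathrm{tr}(E_0\Gamma)$ is maximized, and hence the error probability is minimized, when $E_0$ is the projector onto the subspace of positive eigenvalues of $\Gamma$. We will denote this projector by $\Pi_+$ and define the positive part of $\Gamma$ as $\Gamma_+=\Pi_+\Gamma$. Taking into account that $\Gamma$ is traceless, so that $\mathrm{tr}\,\Gamma_+=\frac12\mathrm{tr}|\Gamma|$, we obtain
$$P_e=\frac12\left(1-\mathrm{tr}\,\Gamma_+\right)=\frac12\left(1-\frac12\,\mathrm{tr}|\Gamma|\right),$$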
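where the matrix $|\Gamma|$ (absolute value of $\Gamma$) is defined to be $|\Gamma|=\sqrt{\Gamma^\dagger\Gamma}$. We arrive at the final result Helstrom (1976),
$$P_e=\frac12\left(1-\frac12\|\rho_0-\rho_1\|_1\right),$$
where $\|A\|_1=\mathrm{tr}|A|$ denotes the trace norm.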
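The problem of discriminating multiple copies (arbitrary $N$) is thus formally solved by replacing $\rho_i$ by $\rho_i^{\otimes N}$ in the above equations. Indeed, if we do not have any restrictions on the type of measurements performed on the copies, $E_0$ can be any operator on $\mathcal{H}^{\otimes N}$ with $0\le E_0\le\mathbb{1}$, and the mean probability of error is just
$$P_{e,N}=\frac12\left(1-\frac12\left\|\rho_0^{\otimes N}-\rho_1^{\otimes N}\right\|_1\right).\tag{13}$$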
However, the computation of the trace norm of the Helstrom matrix in (13) is demanding, since the dimension of $\rho_i^{\otimes N}$ grows as $(\dim\mathcal{H})^N$, and, moreover, this equation provides little information about the large-$N$ behavior of the error probability, which is what the Chernoff bound is about.
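To make the cost explicit, here is a direct, deliberately naive Python implementation of the Helstrom formula for $N$ copies (assuming NumPy; the helper names are hypothetical). The tensor powers are formed explicitly, so the matrix dimension is $(\dim\mathcal H)^N$. For two pure states the result can be compared with the closed form $\frac12\bigl(1-\sqrt{1-|\langle\psi|\phi\rangle|^{2N}}\bigr)$, and `error_with_projector` checks that measuring the projector onto the positive part of $\Gamma=\rho_0-\rho_1$ attains the optimum.

```python
import numpy as np
from functools import reduce


def trace_norm(A):
    """||A||_1 for Hermitian A: sum of absolute eigenvalues."""
    return float(np.abs(np.linalg.eigvalsh(A)).sum())


def helstrom_error(rho0, rho1, N=1):
    """P_e,N = (1 - ||rho0^{(x)N} - rho1^{(x)N}||_1 / 2) / 2 for equal priors.

    The tensor powers are built explicitly, so the cost grows like (dim H)^(3N).
    """
    r0 = reduce(np.kron, [rho0] * N)
    r1 = reduce(np.kron, [rho1] * N)
    return 0.5 * (1.0 - 0.5 * trace_norm(r0 - r1))


def error_with_projector(rho0, rho1):
    """Single-copy error of the measurement E0 = projector onto the positive part of Gamma."""
    w, v = np.linalg.eigh(rho0 - rho1)
    pos = v[:, w > 0]
    E0 = pos @ pos.conj().T
    return 0.5 * (1.0 - np.trace(E0 @ (rho0 - rho1)).real)


def ket_to_rho(psi):
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj())


t = 0.3
rho0 = ket_to_rho([1.0, 0.0])
rho1 = ket_to_rho([np.cos(t), np.sin(t)])
for N in range(1, 6):
    print(N, helstrom_error(rho0, rho1, N))
```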
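The quantum version of the Chernoff (upper) bound was presented very recently in Audenaert et al. (2007a). There it is shown that
$$P_{e,N}\le\frac12\,Q_{QC}^N,\qquad Q_{QC}\equiv\min_{0\le s\le1}\mathrm{tr}\left(\rho_0^s\rho_1^{1-s}\right)\tag{14}$$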
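(the subscript QC stands for quantum Chernoff), which holds for arbitrary density matrices and any $N$. Moreover, this bound can be computed very efficiently, since it only involves single-copy operators and a one-dimensional minimization over $s$.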
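One reason the bound is cheap to evaluate is that, writing $\rho_0=\sum_i\lambda_i|a_i\rangle\langle a_i|$ and $\rho_1=\sum_j\mu_j|b_j\rangle\langle b_j|$, one has $\mathrm{tr}\,\rho_0^s\rho_1^{1-s}=\sum_{ij}\lambda_i^s\mu_j^{1-s}|\langle a_i|b_j\rangle|^2$, so a single diagonalization of each state suffices and only a scalar function of $s$ has to be minimized. The Python sketch below (assuming NumPy; `quantum_chernoff`, `golden_min` and the other names are hypothetical) uses golden-section search, justified by the convexity in $s$ mentioned in the comments on the bound, and cross-checks the result against the direct matrix-power definition.

```python
import numpy as np


def golden_min(f, a=0.0, b=1.0, tol=1e-10):
    """Golden-section search for the minimizer of a convex function on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def quantum_chernoff(rho0, rho1, cutoff=1e-12):
    """Return (Q_QC, s*) with Q_QC = min over s in [0, 1] of tr(rho0^s rho1^(1-s)).

    Uses tr(rho0^s rho1^(1-s)) = sum_ij lam_i^s mu_j^(1-s) |<a_i|b_j>|^2, so each
    state is diagonalized once. Eigenvalues below `cutoff` are treated as exact
    zeros (0**s = 0 for 0 < s < 1).
    """
    lam, a = np.linalg.eigh(rho0)
    mu, b = np.linalg.eigh(rho1)
    lam = np.where(lam > cutoff, lam, 0.0)
    mu = np.where(mu > cutoff, mu, 0.0)
    overlaps = np.abs(a.conj().T @ b) ** 2        # |<a_i|b_j>|^2
    f = lambda s: float(lam**s @ overlaps @ mu**(1.0 - s))
    s_star = golden_min(f)
    return f(s_star), s_star


def random_state(d, rng):
    """Random full-rank density matrix G G^dagger / tr(G G^dagger)."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def mpow(rho, s):
    """rho**s for a Hermitian positive definite matrix (used only as a cross-check)."""
    w, v = np.linalg.eigh(rho)
    return (v * w**s) @ v.conj().T


rng = np.random.default_rng(1)
rho0, rho1 = random_state(3, rng), random_state(3, rng)
Q, s_star = quantum_chernoff(rho0, rho1)
direct = np.trace(mpow(rho0, s_star) @ mpow(rho1, 1.0 - s_star)).real
print(Q, s_star, direct)
```

For commuting (diagonal) states the same function returns the classical $Q_{CC}$ of the two spectra, in line with the reduction to the classical bound noted in the text.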
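Theorem: Let $A$ and $B$ be two positive operators; then, for all $0\le s\le1$,
$$\frac12\,\mathrm{tr}\left(A+B-|A-B|\right)\le\mathrm{tr}\left(A^sB^{1-s}\right).$$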
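The proof of this theorem involves advanced methods in matrix algebra and we refer the interested reader to Audenaert et al. (2007a). Applied to $A=\rho_0^{\otimes N}/2$ and $B=\rho_1^{\otimes N}/2$, for which the left-hand side equals $P_{e,N}$ in (13) and the right-hand side equals $\frac12(\mathrm{tr}\,\rho_0^s\rho_1^{1-s})^N$, it yields (14). Instead of reproducing the general proof, here we will give a simple proof of the inequality (14) in which, instead of minimizing over $s$, the particular value $s=1/2$ is chosen.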
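Although a numerical test is no substitute for the proof, the inequality is easy to probe. The sketch below (Python with NumPy; names are ours, and random test matrices are an arbitrary choice) evaluates $\mathrm{tr}\,A^sB^{1-s}-\frac12\mathrm{tr}(A+B-|A-B|)$ for random positive matrices on a grid of $s$ values. It then checks a natural measurement for the $s=1/2$ argument: $E_0=\Pi_+$, the projector onto the positive part of $\sqrt{\rho_0}-\sqrt{\rho_1}$, should give an error lying between the Helstrom optimum and $\frac12\mathrm{tr}\sqrt{\rho_0}\sqrt{\rho_1}$.

```python
import numpy as np


def mpow(A, s):
    """A**s for a Hermitian positive semidefinite matrix (tiny negative eigenvalues clipped)."""
    w, v = np.linalg.eigh(A)
    return (v * np.clip(w, 0.0, None) ** s) @ v.conj().T


def trace_norm(X):
    """Trace norm of a Hermitian matrix."""
    return float(np.abs(np.linalg.eigvalsh(X)).sum())


def random_positive(d, rng, scale=1.0):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return scale * (g @ g.conj().T)


rng = np.random.default_rng(2024)
worst_gap = np.inf   # smallest value of tr(A^s B^(1-s)) - (1/2) tr(A + B - |A - B|)
for _ in range(200):
    A = random_positive(4, rng)
    B = random_positive(4, rng, scale=rng.uniform(0.2, 5.0))
    lhs = 0.5 * (np.trace(A + B).real - trace_norm(A - B))
    for s in np.linspace(0.0, 1.0, 21):
        rhs = np.trace(mpow(A, s) @ mpow(B, 1.0 - s)).real
        worst_gap = min(worst_gap, rhs - lhs)
print("smallest gap found:", worst_gap)   # should be nonnegative up to rounding

# s = 1/2 argument: measure the projector onto the positive part of sqrt(rho0) - sqrt(rho1)
rho0 = random_positive(3, rng); rho0 /= np.trace(rho0).real
rho1 = random_positive(3, rng); rho1 /= np.trace(rho1).real
s0, s1 = mpow(rho0, 0.5), mpow(rho1, 0.5)
w, v = np.linalg.eigh(s0 - s1)
P = v[:, w > 0] @ v[:, w > 0].conj().T
pe_choice = 0.5 * (np.trace(rho0 @ (np.eye(3) - P)) + np.trace(rho1 @ P)).real
pe_opt = 0.5 * (1.0 - 0.5 * trace_norm(rho0 - rho1))
bound = 0.5 * np.trace(s0 @ s1).real
print(pe_opt, pe_choice, bound)
```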
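We first notice that one obtains an upper bound to $P_e$ by picking any particular positive operator $E_0$ (and, accordingly, $E_1=\mathbb{1}-E_0$) in (9). A convenient choice is $E_0=\Pi_+$ (and thus $E_1=\mathbb{1}-\Pi_+$), where, as above, $\Pi_+$ stands for the projector onto the subspace spanned by the eigenstates of $\sqrt{\rho_0}-\sqrt{\rho_1}$ with positive eigenvalue. After the following series of inequalities we arrive at the desired result Hayashi (2006b):
$$P_e\le\frac12\left[\mathrm{tr}\,\rho_0(\mathbb{1}-\Pi_+)+\mathrm{tr}\,\rho_1\Pi_+\right]=\frac12\Big\{\mathrm{tr}\sqrt{\rho_0}\sqrt{\rho_1}+\mathrm{tr}\big[\sqrt{\rho_0}(\sqrt{\rho_0}-\sqrt{\rho_1})(\mathbb{1}-\Pi_+)\big]+\mathrm{tr}\big[\sqrt{\rho_1}(\sqrt{\rho_1}-\sqrt{\rho_0})\Pi_+\big]\Big\}\le\frac12\,\mathrm{tr}\sqrt{\rho_0}\sqrt{\rho_1},$$
where in the second inequality we have used
$$\mathrm{tr}\big[\sqrt{\rho_0}(\sqrt{\rho_0}-\sqrt{\rho_1})(\mathbb{1}-\Pi_+)\big]\le0,\qquad\mathrm{tr}\big[\sqrt{\rho_1}(\sqrt{\rho_1}-\sqrt{\rho_0})\Pi_+\big]\le0,$$
which hold because $(\sqrt{\rho_0}-\sqrt{\rho_1})(\mathbb{1}-\Pi_+)$ and $(\sqrt{\rho_1}-\sqrt{\rho_0})\Pi_+$ are negative semidefinite, and the trace of the product of a positive and a negative semidefinite operator is nonpositive. The $N$-copy version follows by replacing $\rho_i\to\rho_i^{\otimes N}$, since $\mathrm{tr}\big(\sqrt{\rho_0}\sqrt{\rho_1}\big)^{\otimes N}=\big(\mathrm{tr}\sqrt{\rho_0}\sqrt{\rho_1}\big)^N$.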
The general proof (for all ) follows the same steps but taking if and if . In this case, the inequality analogous to the second one in (III) requires the two additional non-obvious relations
These inequalities follow immediately from the following non-trivial lemma, which constitutes the core of the proof Audenaert et al. (2007a):
Lemma: Let $A$ and $B$ be two positive operators; then, for all $0\le s\le1$,
Before proceeding with the asymptotic limit, several comments about (14) are in order. (i) The exponential fall-off of the probability of error when a number $N$ of copies is available follows immediately from (14), since $Q_{QC}\le1$:
$$P_{e,N}\le\frac12\,e^{-N\xi_{QC}},\qquad\xi_{QC}\equiv-\log Q_{QC}.$$
Remarkably enough, this rate exponent, which we may call quantum Chernoff information because of its analogy with $\xi_{CC}$, is asymptotically attainable, as follows from the results of Nussbaum and Szkola (2006). This is the quantum extension of the classical result (3) and was first conjectured by Ogawa and Hayashi in Ogawa and Hayashi (2004). (ii) If the two matrices $\rho_0$ and $\rho_1$ commute, the bound reduces to the classical Chernoff bound (1), where the two probability distributions are given by the spectra of the two density matrices. (iii) The function $f(s)=\mathrm{tr}\,\rho_0^s\rho_1^{1-s}$ (whose minimum gives the best bound) is a convex function of $s$ in $[0,1]$, which means that a stationary point will automatically be the global minimum (see Audenaert et al. (2007a) for a proof). This is a very useful fact when computing the quantum Chernoff bound (14). (iv) $Q_{QC}$ is jointly concave in $(\rho_0,\rho_1)$, unitarily invariant, and non-decreasing under trace-preserving quantum operations Audenaert et al. (2007a). (v) The quantum Chernoff bound is tighter than the bound given by the quantum fidelity
$$F(\rho_0,\rho_1)=\left(\mathrm{tr}\,|\sqrt{\rho_0}\sqrt{\rho_1}|\right)^2,\tag{21}$$
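which is the most widely used quantum distinguishability measure (see next section). This follows from the following set of inequalities:
$$Q_{QC}\le\mathrm{tr}\,\rho_0^{1/2}\rho_1^{1/2}\le\mathrm{tr}\,|\rho_0^{1/2}\rho_1^{1/2}|=\sqrt{F(\rho_0,\rho_1)}.\tag{22}$$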
In fact, the fidelity also provides a lower bound to the probability of error Fuchs and van de Graaf (1999):
$$\frac12\left(1-\sqrt{1-F^N(\rho_0,\rho_1)}\right)\le P_{e,N}.$$
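(vi) The quantum Chernoff bound can be easily extended to the case where the two states $\rho_0$ and $\rho_1$ (sources) are not equiprobable; applying the theorem above to $A=\pi_0\rho_0^{\otimes N}$ and $B=\pi_1\rho_1^{\otimes N}$ gives
$$P_{e,N}\le\min_{0\le s\le1}\pi_0^s\pi_1^{1-s}\left(\mathrm{tr}\,\rho_0^s\rho_1^{1-s}\right)^N.$$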
(vii) The permutation invariance of the $N$-copy density matrices, $\rho_i^{\otimes N}$, guarantees that the optimal collective measurement can be implemented efficiently (with a polynomial-size circuit known as the quantum Schur transform) Bacon et al. (2006), and hence that the minimum probability of error is achievable with reasonable resources.
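Properties (v) and (vi) can be checked in the same spirit. The following sketch (Python with NumPy; names hypothetical) takes the fidelity in the squared convention $F=(\mathrm{tr}|\sqrt{\rho_0}\sqrt{\rho_1}|)^2$, which is an assumption of this example, and verifies the chain $Q_{QC}\le\mathrm{tr}\,\rho_0^{1/2}\rho_1^{1/2}\le\sqrt F$, the two-sided bound $\frac12(1-\sqrt{1-F^N})\le P_{e,N}\le\frac12Q_{QC}^N$, and the unequal-prior bound $P_{e,N}\le\min_s\pi_0^s\pi_1^{1-s}(\mathrm{tr}\,\rho_0^s\rho_1^{1-s})^N$ against the exact error $\frac12\bigl(1-\|\pi_0\rho_0^{\otimes N}-\pi_1\rho_1^{\otimes N}\|_1\bigr)$. Minima over $s$ are taken on a grid that contains $s=1/2$; a grid can only overestimate a minimum, so the upper bounds remain valid.

```python
import numpy as np
from functools import reduce


def mpow(A, s):
    """A**s for a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(A)
    return (v * np.clip(w, 0.0, None) ** s) @ v.conj().T


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def trace_norm(X):
    """Trace norm of a Hermitian matrix."""
    return float(np.abs(np.linalg.eigvalsh(X)).sum())


def exact_error(rho0, rho1, N, pi0=0.5):
    """Helstrom error with priors (pi0, 1 - pi0) for N copies."""
    r0, r1 = reduce(np.kron, [rho0] * N), reduce(np.kron, [rho1] * N)
    return 0.5 * (1.0 - trace_norm(pi0 * r0 - (1.0 - pi0) * r1))


rng = np.random.default_rng(3)
rho0, rho1 = random_state(2, rng), random_state(2, rng)

s_grid = np.linspace(0.0, 1.0, 2001)
f_vals = np.array([np.trace(mpow(rho0, s) @ mpow(rho1, 1.0 - s)).real for s in s_grid])
Q = f_vals.min()                                   # grid estimate of Q_QC (from above)

root_prod = mpow(rho0, 0.5) @ mpow(rho1, 0.5)
half = np.trace(root_prod).real                    # tr rho0^(1/2) rho1^(1/2)
sqrt_F = np.linalg.svd(root_prod, compute_uv=False).sum()   # tr|sqrt(rho0) sqrt(rho1)|
F = sqrt_F**2                                      # squared-convention fidelity (assumed)

print(Q, half, sqrt_F)                             # non-decreasing, as in the chain
for N in range(1, 6):
    print(N, 0.5 * (1 - np.sqrt(1 - F**N)), exact_error(rho0, rho1, N), 0.5 * Q**N)
```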
As stated above, for multiple-copy discrimination the error probability decreases exponentially with the number of copies, $P_{e,N}\sim e^{-N\xi}$, as $N$ goes to infinity Cover and Thomas (1991). The error (rate) exponent is defined generically by
$$\xi\equiv-\lim_{N\to\infty}\frac1N\log P_{e,N}\tag{26}$$
and characterizes the asymptotic behavior of the error probability. From (20) we readily see that if the best (joint) measurement is used it coincides with the quantum Chernoff information,
$$\xi=-\log Q_{QC}\equiv\xi_{QC},\tag{27}$$
where the equality holds because of the attainability of (20) discussed above, and we have added the subscript QC. Moreover, this asymptotic value is also attained by the square-root (or “pretty-good”) measurement (see Holevo (1979); Hausladen et al. (1996) for the precise definition). This immediately follows from the known bounds Barnum and Knill (2002); Harrow and Winter (2006) $P_{e,N}\le P_{e,N}^{\rm SRM}\le2P_{e,N}$, where $P_{e,N}^{\rm SRM}$ is the error probability of discrimination when the square-root measurement is used.
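The bounds $P_{e,N}\le P^{\rm SRM}_{e,N}\le2P_{e,N}$, and the fact that the finite-$N$ exponent $-\frac1N\log(2P_{e,N})$ never falls below $\xi_{QC}$, can be illustrated with the sketch below (Python with NumPy; names hypothetical). It assumes equal priors and full-rank states, so that $S=(\rho_0^{\otimes N}+\rho_1^{\otimes N})/2$ is invertible, and uses the square-root measurement $E_i=S^{-1/2}(\rho_i^{\otimes N}/2)S^{-1/2}$.

```python
import numpy as np
from functools import reduce


def mpow(A, s):
    """A**s for a Hermitian positive (semi)definite matrix; negative s needs full rank."""
    w, v = np.linalg.eigh(A)
    if s < 0 and w.min() <= 0:
        raise ValueError("matrix must be positive definite for negative powers")
    return (v * np.clip(w, 0.0, None) ** s) @ v.conj().T


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def trace_norm(X):
    return float(np.abs(np.linalg.eigvalsh(X)).sum())


def optimal_error(r0, r1):
    """Helstrom error for equal priors."""
    return 0.5 * (1.0 - 0.5 * trace_norm(r0 - r1))


def srm_error(r0, r1):
    """Error of the square-root measurement E_i = S^(-1/2) (r_i / 2) S^(-1/2), S = (r0 + r1) / 2."""
    s_inv = mpow(0.5 * (r0 + r1), -0.5)
    E0 = s_inv @ (0.5 * r0) @ s_inv
    E1 = s_inv @ (0.5 * r1) @ s_inv
    return 0.5 * (np.trace(E1 @ r0) + np.trace(E0 @ r1)).real


rng = np.random.default_rng(5)
rho0, rho1 = random_state(2, rng), random_state(2, rng)
s_grid = np.linspace(0.0, 1.0, 2001)
Q = min(np.trace(mpow(rho0, s) @ mpow(rho1, 1.0 - s)).real for s in s_grid)
xi_qc = -np.log(Q)                      # grid estimate, never above the true xi_QC

for N in range(1, 7):
    r0, r1 = reduce(np.kron, [rho0] * N), reduce(np.kron, [rho1] * N)
    pe, pe_srm = optimal_error(r0, r1), srm_error(r0, r1)
    print(N, pe, pe_srm, -np.log(2 * pe) / N, xi_qc)
```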
IV Distinguishability measures
In this section we aim to define a measure of distinguishability between states using the results reviewed in Sec. III. Before doing so we will briefly outline how classical statistical methods can be used to (partially) accomplish this goal. We will then discuss an operational measure of distinguishability based on the error probability in multiple-copy state discrimination, leading to the quantum Chernoff measure. Finally we will define the analogous quantity for local discrimination protocols.
IV.1 Classical statistical approach
The notion of distance between states is a fundamental issue that has been studied for a long time. A straightforward way to define such a distance is to take any suitable norm in the space of states. However, a more physical approach, kick-started by the pioneering work in Wootters (1981), is to relate the inherently probabilistic nature of quantum measurements to classical statistical measures of distinguishability between probability distributions.
In particular, the author of Wootters (1981) uses the notion of statistical distance,
$$d(p_0,p_1)=\arccos\sqrt{F(p_0,p_1)},$$
as a measure of distinguishability between the probability distributions $p_0$ and $p_1$, where
$$F(p_0,p_1)=\Big(\sum_x\sqrt{p_0(x)\,p_1(x)}\Big)^2$$
is the statistical fidelity. Accordingly, he defines a distinguishability measure between quantum states $\rho_0$ and $\rho_1$ by maximizing $d$ [i.e., minimizing $F(p_0,p_1)$] over all possible POVM measurements, characterized by all possible sets of operators $\{E_x\}$, with outcome probabilities given by $p_0(x)=\mathrm{tr}\,E_x\rho_0$ and $p_1(x)=\mathrm{tr}\,E_x\rho_1$. The statistical distance as such makes sense only when the number of samplings of the probability distribution is large. Hence, in the quantum extension of this notion it is implicitly assumed that one performs the same measurement on each of a large number of copies of the state. The optimization over such local repeated measurements leads to one of the most widely used distinguishability measures Fuchs (1996): the (quantum) fidelity $F(\rho_0,\rho_1)$, defined in (21).
The fidelity, or statistical distance, has many desirable properties: (i) it is easily computable; (ii) for pure states it reduces to the standard distance given by the angle between rays in the Hilbert space $\mathcal{H}$; (iii) as mentioned above, it provides bounds to $P_{e,N}$. Nevertheless, a strict physical interpretation is so far unclear, and its definition is based on repeated local measurements, while quantum mechanics allows for much more general ways to access the information contained in the copies, via collective measurements on all of them.
IV.2 Quantum Chernoff distance
A very natural and also operational distinguishability measure is provided by the error probability of discrimination. As a first candidate, one could take this very error probability for a given fixed number of copies $N$. However, the choice of a particular $N$ in such a definition would not only be arbitrary but also problematic, since one can find examples Cover and Thomas (1991) where $P_{e,N}(\rho_0,\rho_1)<P_{e,N}(\rho_0',\rho_1')$, whereas the opposite inequality holds for a different number of copies $N'$. A straightforward way around this problem is to use the asymptotic expressions for $P_{e,N}$ and define the distinguishability measure as the largest rate exponent in (26), i.e., $\xi_{QC}$. We further note that the presence of the logarithm ensures that $\xi_{QC}(\rho_0,\rho_1)=0$ if and only if $\rho_0=\rho_1$, while the minus sign makes distinguishability decrease as discrimination becomes more difficult, i.e., as $Q_{QC}$ increases.
The quantum Chernoff information, $\xi_{QC}$, is therefore a physically meaningful and efficiently computable distinguishability measure. Note that (27) does not, stricto sensu, define a distance, since it does not fulfil the triangle inequality. It does, however, have all of the other properties that one should expect from a reasonable measure. This is in itself a remarkable fact since, as far as measures and metrics are concerned, there is usually a compromise among operational meaning, computability and contractivity Gilchrist et al. (2005). For instance, the distance proposed in Lee et al. (2003), although having an operational definition, is not contractive.
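These properties, and the failure of the triangle inequality, are easy to see numerically. The sketch below (Python with NumPy; helper names hypothetical) reuses the formula $\mathrm{tr}\,\rho_0^s\rho_1^{1-s}=\sum_{ij}\lambda_i^s\mu_j^{1-s}|\langle a_i|b_j\rangle|^2$ and checks that $\xi_{QC}=-\log Q_{QC}$ is symmetric, vanishes for identical states, and does not increase under a depolarizing channel (an example of a trace-preserving quantum operation). For pure states $Q_{QC}=|\langle\psi|\phi\rangle|^2$; taking three real qubit states at angles $0$, $t$ and $2t$, the direct value $-\log\cos^2 2t$ exceeds the sum $-2\log\cos^2 t$ of the two intermediate ones for small $t$.

```python
import numpy as np


def golden_min(f, a=0.0, b=1.0, tol=1e-10):
    """Golden-section search for the minimizer of a convex function on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def quantum_chernoff(rho0, rho1, cutoff=1e-12):
    """Q_QC via the eigendecomposition formula; tiny eigenvalues are set to zero."""
    lam, a = np.linalg.eigh(rho0)
    mu, b = np.linalg.eigh(rho1)
    lam, mu = np.where(lam > cutoff, lam, 0.0), np.where(mu > cutoff, mu, 0.0)
    overlaps = np.abs(a.conj().T @ b) ** 2
    f = lambda s: float(lam**s @ overlaps @ mu**(1.0 - s))
    return f(golden_min(f))


def xi_qc(rho0, rho1):
    return -np.log(quantum_chernoff(rho0, rho1))


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def depolarize(rho, p):
    d = rho.shape[0]
    return (1.0 - p) * rho + p * np.eye(d) / d


rng = np.random.default_rng(11)
rho0, rho1 = random_state(3, rng), random_state(3, rng)
print(xi_qc(rho0, rho1), xi_qc(rho1, rho0))                  # symmetric
print(xi_qc(rho0, rho0))                                      # zero up to rounding
print(xi_qc(depolarize(rho0, 0.3), depolarize(rho1, 0.3)))    # not larger than xi_qc(rho0, rho1)

t = 0.3
kets = [np.array([np.cos(x), np.sin(x)], dtype=complex) for x in (0.0, t, 2 * t)]
pure = [np.outer(k, k.conj()) for k in kets]
direct = xi_qc(pure[0], pure[2])
via_middle = xi_qc(pure[0], pure[1]) + xi_qc(pure[1], pure[2])
print(direct, via_middle)                                     # triangle inequality fails
```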
We point out that another operational distinguishability measure can be obtained in asymmetric hypothesis testing by minimizing the type II error rate while keeping the type I error probability upper-bounded by a fixed value. The optimal error rate in this situation is provided by the quantum Stein’s lemma Hiai and Petz (1991); Ogawa and Nagaoka (2000) and leads to the well-known quantum relative entropy. Despite having an operational meaning, the quantum relative entropy has two obvious drawbacks as a distinguishability measure: it is not symmetric in its arguments, and it diverges whenever the support of its first argument is not contained in that of the second, as happens, e.g., when the second state is pure and different from the first.
IV.3 Classical Chernoff distance: local measurements
In the derivation of the quantum Chernoff bound one optimizes over all possible quantum measurements, in particular over joint measurements on $\rho_i^{\otimes N}$ that act on all the copies coherently. It is of great interest, both theoretically and in practice, to know whether such joint measurements are strictly necessary to attain the bound or whether one can make do with separable ones (which include those that can be implemented with local operations and classical communication, simply known as LOCC measurements). As far as we are aware, the answer to this question is unknown. This question is also relevant in connection with the operational meaning attached to $\xi_{QC}$. In this section we focus on this operational aspect and compute $\xi$ from its definition in (26), assuming that the discrimination protocol to which $P_{e,N}$ refers is constrained to use the same individual measurement, defined by a local POVM $\{E_x\}$, on each of the $N$ available copies. We loosely refer to these protocols as local. Local protocols are relevant from the theoretical point of view since they help to elucidate the role of quantum correlated measurements in asymptotic hypothesis testing. For example, in quantum phase estimation local measurements suffice to achieve the collective bounds Holevo (1979). Here, we will show that these protocols do not, in general, achieve the quantum Chernoff bound. In addition, from a more practical point of view, local protocols are much simpler to implement experimentally, especially when the number of subsystems is large.
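In such a local protocol, after the measurements have been performed we have a sample of $N$ elements drawn from the probability distribution $p_i(x)=\mathrm{tr}\,E_x\rho_i$, $i=0,1$, on the basis of which we have to discriminate between the candidates $p_0$ and $p_1$. In such a scenario the error probability, which we call $P_{e,N}^{LC}$, can be obtained using the classical Chernoff bound (1) applied to the distributions $p_0$ and $p_1$. One can thus define the error exponent (26) and thereby introduce a new operational distinguishability measure based on local discrimination:
$$\xi_{LC}(\rho_0,\rho_1)=\max_{\{E_x\}}\xi_{CC}(p_0,p_1)=-\log\min_{\{E_x\}}\min_{0\le s\le1}\sum_x\left(\mathrm{tr}\,E_x\rho_0\right)^s\left(\mathrm{tr}\,E_x\rho_1\right)^{1-s},$$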
where the subscript LC reminds us that we have made use of local measurements together with the classical Chernoff bound.
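A simple consequence is that no fixed local measurement can beat the collective bound: since $Q_{QC}$ cannot decrease under a measurement (a trace-preserving quantum operation), $\xi_{CC}(p_0,p_1)\le\xi_{QC}(\rho_0,\rho_1)$ for every POVM. The sketch below (Python with NumPy; names hypothetical) samples random projective qubit measurements, computes $\xi_{CC}$ of the resulting outcome statistics, and compares the best sampled value, which is only a lower estimate of $\xi_{LC}$, with $\xi_{QC}$.

```python
import numpy as np


def golden_min(f, a=0.0, b=1.0, tol=1e-10):
    """Golden-section search for the minimizer of a convex function on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def quantum_chernoff(rho0, rho1):
    """Q_QC for full-rank states via the eigendecomposition formula."""
    lam, a = np.linalg.eigh(rho0)
    mu, b = np.linalg.eigh(rho1)
    overlaps = np.abs(a.conj().T @ b) ** 2
    f = lambda s: float(lam**s @ overlaps @ mu**(1.0 - s))
    return f(golden_min(f))


def random_unitary(d, rng):
    """Unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(z)
    return q


def classical_xi(p0, p1):
    """xi_CC = -log min_s sum_x p0^s p1^(1-s) (full-support distributions)."""
    f = lambda s: float(np.sum(p0**s * p1**(1.0 - s)))
    return -np.log(f(golden_min(f)))


def local_xi(rho0, rho1, U):
    """xi_CC of the statistics of the projective measurement onto the columns of U."""
    p0 = np.real(np.diag(U.conj().T @ rho0 @ U))
    p1 = np.real(np.diag(U.conj().T @ rho1 @ U))
    return classical_xi(p0, p1)


rng = np.random.default_rng(8)
rho0, rho1 = random_state(2, rng), random_state(2, rng)
xi_quantum = -np.log(quantum_chernoff(rho0, rho1))
samples = [local_xi(rho0, rho1, random_unitary(2, rng)) for _ in range(500)]
print(max(samples), xi_quantum)
```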
The measure $\xi_{LC}$ is obtained by maximizing the rate exponent over all possible single-copy generalized measurements (just as is done for the fidelity). Unfortunately, there is no simple closed expression for this maximum for general mixed states. However, we do encounter again the relation (22) with the fidelity: since the square root of the statistical fidelity upper bounds $Q_{CC}$ in (1), it also upper bounds the local error probability, $P_{e,N}^{LC}\le\frac12F^{N/2}(p_0,p_1)$, and the minimum of $F(p_0,p_1)$ over measurements is $F(\rho_0,\rho_1)$. That is,
$$\xi_{LC}(\rho_0,\rho_1)\ge-\frac12\log F(\rho_0,\rho_1).\tag{34}$$
Since $\xi_{LC}\le\xi_{QC}$, we note that whenever $Q_{QC}=\sqrt{F(\rho_0,\rho_1)}$ the inequality (34) has to be saturated. This, in turn, means that in this situation one can optimally discriminate between $\rho_0$ and $\rho_1$ just by performing a fixed local measurement on each of the copies (no collective measurements are required to attain the quantum Chernoff bound).
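There is still another important situation in which the quantum Chernoff bound is attainable by local measurements: when one of the states (say $\rho_0$) is pure. If this is the case, Eq. (24) holds and $Q_{QC}=\mathrm{tr}\,\rho_0\rho_1$. To prove that $\xi_{LC}=\xi_{QC}$, let us consider the two-outcome measurement defined by $E_0=\rho_0$, $E_1=\mathbb{1}-\rho_0$. Note that $\mathrm{tr}\,E_0\rho_0=1$ and $\mathrm{tr}\,E_1\rho_0=0$. After performing this measurement on each of the $N$ copies the protocol proceeds as follows: we accept $H_0$ if all of the outcomes are $0$; otherwise we accept $H_1$. One may refer to this classical data processing as unanimity vote Acin et al. (2005). The error probability can be easily computed by noticing that no error occurs unless we get the outcome $0$ all $N$ times [since $\mathrm{tr}\,E_1\rho_0=0$]. Therefore,
$$P_{e,N}^{LC}=\frac12\left(\mathrm{tr}\,E_0\rho_1\right)^N=\frac12\left(\mathrm{tr}\,\rho_0\rho_1\right)^N,\tag{35}$$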
where the last equality holds because $\rho_0$ is assumed to be a pure state. From this equation it follows immediately that $\xi_{LC}\ge-\log\mathrm{tr}\,\rho_0\rho_1=\xi_{QC}$; since $\xi_{LC}\le\xi_{QC}$ always holds, $\xi_{LC}=\xi_{QC}$, and the quantum Chernoff bound is attainable by local measurements. It also follows from the first equality in (35) that this result corresponds to taking the limit $s\to0$ in (1).
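The unanimity-vote argument can be reproduced numerically. In the sketch below (Python with NumPy; names hypothetical), $\rho_0=|\psi\rangle\langle\psi|$ is a random pure state and $\rho_1$ a random full-rank state. The code checks that $Q_{QC}=\mathrm{tr}\,\rho_0\rho_1$, with the minimizing $s$ pushed to the end point $s\to0$, and that the vote error $\frac12(\mathrm{tr}\,\rho_0\rho_1)^N$, which therefore equals $\frac12Q_{QC}^N$, is never below the collective (Helstrom) optimum.

```python
import numpy as np
from functools import reduce


def golden_min(f, a=0.0, b=1.0, tol=1e-10):
    """Golden-section search for the minimizer of a convex function on [a, b]."""
    g = (np.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return 0.5 * (a + b)


def quantum_chernoff(rho0, rho1, cutoff=1e-12):
    """Return (Q_QC, s*); eigenvalues below `cutoff` are treated as exact zeros."""
    lam, a = np.linalg.eigh(rho0)
    mu, b = np.linalg.eigh(rho1)
    lam, mu = np.where(lam > cutoff, lam, 0.0), np.where(mu > cutoff, mu, 0.0)
    overlaps = np.abs(a.conj().T @ b) ** 2
    f = lambda s: float(lam**s @ overlaps @ mu**(1.0 - s))
    s_star = golden_min(f)
    return f(s_star), s_star


def random_state(d, rng):
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real


def helstrom(rho0, rho1, N):
    r0, r1 = reduce(np.kron, [rho0] * N), reduce(np.kron, [rho1] * N)
    return 0.5 * (1.0 - 0.5 * np.abs(np.linalg.eigvalsh(r0 - r1)).sum())


rng = np.random.default_rng(4)
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)
rho0 = np.outer(psi, psi.conj())               # pure state
rho1 = random_state(3, rng)                    # full-rank mixed state

overlap = np.real(psi.conj() @ rho1 @ psi)     # tr(rho0 rho1) = tr(E0 rho1) with E0 = rho0
Q, s_star = quantum_chernoff(rho0, rho1)
print(overlap, Q, s_star)

for N in range(1, 5):
    pe_vote = 0.5 * overlap**N                 # accept H0 only if all N outcomes are 0
    print(N, helstrom(rho0, rho1, N), pe_vote)
```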
The set of states of a quantum system, like that of classical probability distributions on a given sample space [Footnote 4: For the sake of clarity, in this section we assume a finite sample space, but the results also hold for general probability measures over continuous spaces.], can be endowed with a metric structure Bengtsson and Życzkowski (2006), and thus thought of as a Riemannian manifold. This enables us to relate geometrical concepts (e.g., distance, volume, curvature, parallel transport) to physical ones (e.g., state discrimination and estimation, geometrical phases). Among their novel applications in quantum information, metrics have recently been used to characterize quantum phase transitions Zanardi et al. (2007).
The first step towards this geometric approach to quantum states is to define the line element $ds$, or (infinitesimal) distance, between two neighboring “points” $\rho$ and $\rho+d\rho$. All local properties follow from this definition. More precisely, they follow from the metric, i.e., from the set of coefficients $g_{\mu\nu}$ of $ds^2$ when written as a quadratic form, $ds^2=\sum_{\mu\nu}g_{\mu\nu}\,d\theta^\mu d\theta^\nu$, in the differentials of the coordinates (parameters) $\theta^\mu$ that specify the quantum states. There is, however, no unique choice of $ds$ unless some monotonicity conditions are invoked.
For classical probability distributions, $p=\{p_x\}$, a line element is singled out (up to a proportionality factor) by imposing that it be non-increasing under stochastic maps. It is the well-known Fisher metric (in what follows the terms metric and line element will be used interchangeably):
In contrast to the classical case, the monotonicity condition under completely positive (quantum stochastic) maps does not define a metric uniquely, which explains why a substantial body of research on quantum metrics has emerged over the last years. Among the main developments, Petz Petz (1996) has characterized the family of quantum contractive metrics by establishing a correspondence with operator-monotone functions.
An alternative, more physical approach is to define a line element from a suitable distinguishability measure between infinitesimally close states. A remarkable example is given in Braunstein and Caves (1994). In this seminal paper Braunstein and Caves consider a one-parameter family of states $\rho(\theta)$ and map the problem of distinguishability to that of estimating the parameter $\theta$ optimally. They define a line element, $ds$, as $d\theta$ expressed in the appropriate units of statistical deviation (roughly speaking, $d\theta$ divided by the minimal error in the estimation of $\theta$). By making use of classical statistical methods (Cramér-Rao bound) they find
where $\mathcal{F}(\theta)=\sum_x[\partial_\theta p_x(\theta)]^2/p_x(\theta)$ (the so-called Fisher information), with $p_x(\theta)=\mathrm{tr}\,E_x\rho(\theta)$, and the maximization is over all possible POVM measurements $\{E_x\}$ on a single copy of $\rho(\theta)$. They also succeed in giving a closed expression for this maximum and show that their metric coincides, up to a factor, with that induced by the Bures-Uhlmann distance Bures (1969); Uhlmann (1976)
More precisely, they show that , where
[see also (69) below] and a series expansion to second order in $d\rho$ is understood on the right-hand side of this equation. We note in passing that for commuting states, i.e., classical probability distributions, the Bures-Uhlmann line element coincides with the Fisher metric (36). A quantum metric with such normalization is said to be Fisher adjusted.
Although one can obtain a finite distance for arbitrary states $\rho_0$ and $\rho_1$ by integrating $ds$ along geodesics, it is important to notice that the operational meaning of the Braunstein and Caves metric is lost in the process.
In the spirit of Braunstein and Caves’ physical approach to metrics, we next consider the distinguishability measures $\xi_{QC}$ and $\xi_{LC}$, discussed in Section IV, for infinitesimally close states and derive line elements with the same operational meaning, which we call $ds_{QC}$ and $ds_{LC}$, respectively. For $ds_{QC}$ we also give the volume element and the prior probability distribution, whereas those corresponding to the metric $ds_{LC}$ can easily be found in the literature since, as will be shown, $ds^2_{LC}$ is proportional to the widely studied Bures metric $ds^2_B$ on the set of strictly mixed states.
Before we start we would like to point out that one could also consider line elements induced by other quantities, such as the quantum relative entropy, which, as we saw above, also has a clear operational interpretation. The quantum relative entropy induces the so-called Kubo-Mori metric Petz (2002), which has the drawback of being singular for pure states.
V.1 Quantum Chernoff metric
For neighboring density matrices $\rho$ and $\rho+d\rho$ (i.e., those whose independent matrix elements differ by an infinitesimal amount) the distinguishability measure defines a metric, as in (39). For the quantum Chernoff measure, $\xi_{QC}$, this metric can be computed from Eq. (27) Audenaert et al. (2006):
$$ds^2_{QC}=1-Q_{QC}(\rho,\rho+d\rho)=1-\mathrm{tr}\,\sqrt{\rho}\,\sqrt{\rho+d\rho}+\cdots,$$
where the dots stand for higher-order terms in $d\rho$ that do not contribute to $ds^2_{QC}$, and we have also used that, to this order, the minimum over $s$ is attained at $s=1/2$. We now recall the integral representation
and its derivative,
These representations hold for and can be straightforwardly extended to positive matrices. In particular, using (41) and the convergent sequence
which also holds for matrices provided , one can write, up to second order in $d\rho$,
where . Inserting this expansion in (40) one finds
The first term in the integrand vanishes, as can be seen by using (42) and $\mathrm{tr}\,d\rho=0$, while the second term can be computed in the eigenbasis of $\rho$, $\rho|i\rangle=\lambda_i|i\rangle$:
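where in the second equality we have taken into account that , which enabled us to symmetrize the expression in parentheses that multiplies  in the sum (this symmetrization gives the factor ). The quantum Chernoff metric can finally be written as
$$ds^2_{QC}=\frac12\sum_{i,j}\frac{|\langle i|d\rho|j\rangle|^2}{\left(\sqrt{\lambda_i}+\sqrt{\lambda_j}\right)^2}.\tag{47}$$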
The quantum Chernoff metric belongs to the family of contractive quantum metrics, as it should, since by construction the probability of error cannot be improved by pre-processing the states. In fact, the quantum Chernoff metric coincides with a member of this family that has been explicitly written down by Petz in Petz and Sudár (1999), and with the so-called Wigner-Yanase metric, which has recently been studied in depth in Gibilisco and Isola (2003). In particular, the geodesic distance, the geodesic path, and the scalar curvature of the quantum Chernoff metric can be read off from their Eqs. (5.1-5.3).
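The explicit form of $ds^2_{QC}$ can be checked against its definition at small but finite separation. The sketch below (Python with NumPy; names hypothetical) takes a well-conditioned random state, a small traceless Hermitian perturbation $d\rho=\epsilon X$, and compares $1-\mathrm{tr}\sqrt{\rho}\sqrt{\rho+d\rho}$ (the $s=1/2$ value of the Chernoff quantity, which to second order in $d\rho$ is the minimum over $s$) with $\frac12\sum_{ij}|\langle i|d\rho|j\rangle|^2/(\sqrt{\lambda_i}+\sqrt{\lambda_j})^2$. It also checks the commuting case, where only the Fisher-like term $\frac18\sum_i d\lambda_i^2/\lambda_i$ survives. Agreement is expected up to corrections of relative order $\epsilon$.

```python
import numpy as np


def msqrt(A):
    """Square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(A)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


rng = np.random.default_rng(6)
d = 3
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = g @ g.conj().T
rho = 0.7 * rho / np.trace(rho).real + 0.3 * np.eye(d) / d   # all eigenvalues >= 0.1

h = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
X = h + h.conj().T
X -= np.trace(X).real * np.eye(d) / d        # traceless Hermitian direction
eps = 1e-5
drho = eps * X

# s = 1/2 value of the Chernoff quantity at small separation
exact = 1.0 - np.trace(msqrt(rho) @ msqrt(rho + drho)).real

# metric formula in the eigenbasis of rho
lam, V = np.linalg.eigh(rho)
D = V.conj().T @ drho @ V                    # D[i, j] = <i| drho |j>
sq = np.sqrt(lam)
metric = 0.5 * np.sum(np.abs(D) ** 2 / (sq[:, None] + sq[None, :]) ** 2)

# commuting special case: only the diagonal (Fisher-like) term survives
dlam = eps * np.array([1.0, -0.4, -0.6])
exact_diag = 1.0 - np.sum(sq * np.sqrt(lam + dlam))
fisher_part = 0.125 * np.sum(dlam ** 2 / lam)

print(exact, metric)
print(exact_diag, fisher_part)
```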
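By separating diagonal from off-diagonal terms, the metric in (47) can also be written as
$$ds^2_{QC}=\frac18\sum_i\frac{(d\rho_{ii})^2}{\lambda_i}+\sum_{i<j}\frac{|d\rho_{ij}|^2}{\left(\sqrt{\lambda_i}+\sqrt{\lambda_j}\right)^2},\tag{48}$$
where $d\rho_{ij}=\langle i|d\rho|j\rangle$.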
Next, we wish to identify the degrees of freedom in the off-diagonal terms. We will see that they correspond to infinitesimal unitary transformations acting on $\rho$ (which leave its eigenvalues unchanged). This is most conveniently done by parameterizing $\rho$ by its eigenvalues and eigenvectors, namely by $\{\lambda_i\}$ and the components of the eigenvectors $\{|i\rangle\}$ onto a given canonical basis:
(naturally, it also holds that ). A neighboring density matrix is thus parameterized by and . We further note that , where is antihermitian, . It is actually the infinitesimal generator along the direction in parameter space that takes into . It follows that . The matrix elements of can be expressed as
and those of as
where we have used (49) in going from the first to the second line [the very same matrix elements of can also be written as in the eigenbasis of ]. Substituting these relations back into (48) we obtain
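The same expression can also be derived by differentiating $\sqrt{\rho}=U\sqrt{\Lambda}\,U^\dagger$ and using $ds^2_{QC}=1-\mathrm{tr}\sqrt{\rho}\sqrt{\rho+d\rho}=\frac12\mathrm{tr}\,(d\sqrt{\rho})^2$ (which holds because $\mathrm{tr}\,\rho=\mathrm{tr}(\rho+d\rho)=1$),
where $\Lambda$ is diagonal in the canonical basis and has the spectrum of $\rho$, and $U$ is the unitary matrix that diagonalizes $\rho$.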
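Both routes can be verified numerically. For unit-trace states one has the exact identity $1-\mathrm{tr}\sqrt{\rho}\sqrt{\sigma}=\frac12\mathrm{tr}(\sqrt\rho-\sqrt\sigma)^2$, so $ds^2_{QC}=\frac12\mathrm{tr}(d\sqrt\rho)^2$. Differentiating $\sqrt\rho=U\sqrt\Lambda U^\dagger$ with $A=U^\dagger dU$ then gives $ds^2_{QC}=\frac18\sum_i d\lambda_i^2/\lambda_i+\frac12\sum_{i\ne j}(\sqrt{\lambda_i}-\sqrt{\lambda_j})^2|A_{ij}|^2$. This last form is our own short derivation, stated here as an assumption rather than as the paper's equation. The sketch below (Python with NumPy; names hypothetical) perturbs the spectrum by $\epsilon\,d\lambda$ and the eigenbasis by $U\to Ue^{\epsilon K}$, with $K$ anti-Hermitian, and compares the three quantities.

```python
import numpy as np


def msqrt(A):
    """Square root of a Hermitian positive semidefinite matrix."""
    w, v = np.linalg.eigh(A)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def expm_antihermitian(K):
    """exp(K) for anti-Hermitian K, using the Hermitian matrix H = -iK (so K = iH)."""
    w, v = np.linalg.eigh(-1j * K)
    return (v * np.exp(1j * w)) @ v.conj().T


rng = np.random.default_rng(9)
d = 3
z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(z)                        # unitary diagonalizing rho
lam = np.array([0.5, 0.3, 0.2])               # spectrum of rho
dlam = np.array([0.4, -0.1, -0.3])            # traceless change of the spectrum
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
K = g - g.conj().T                            # anti-Hermitian generator
eps = 1e-5

rho = U @ np.diag(lam) @ U.conj().T
U2 = U @ expm_antihermitian(eps * K)          # U^dagger dU = eps*K to first order
rho2 = U2 @ np.diag(lam + eps * dlam) @ U2.conj().T

lhs = 1.0 - np.trace(msqrt(rho) @ msqrt(rho2)).real
diff = msqrt(rho) - msqrt(rho2)
half_sq = 0.5 * np.trace(diff @ diff).real    # equals lhs exactly for unit-trace states

A = eps * K
sq = np.sqrt(lam)
diag_part = 0.125 * np.sum((eps * dlam) ** 2 / lam)
off_part = 0.5 * np.sum((sq[:, None] - sq[None, :]) ** 2 * np.abs(A) ** 2)  # i = j terms vanish
print(lhs, half_sq, diag_part + off_part)
```

The off-diagonal weights $(\sqrt{\lambda_i}-\sqrt{\lambda_j})^2$ make explicit that unitary rotations within a degenerate eigenspace do not change the state and hence cost no distance.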
Eq. (52) displays the metric in a very suggestive form. Any density matrix can be parameterized by its eigenvalues and the unitary matrix that diagonalizes it. Eq. (52) expresses the infinitesimal distance between two such matrices in terms of these very parameters. The first term is immediately recognized as the (Fisher) metric on the $(d-1)$-dimensional simplex of eigenvalues of $\rho$, which is assumed to be a $d\times d$ matrix throughout the rest of this section (note that $\sum_i\lambda_i=1$, which implies $\sum_i d\lambda_i=0$). Thus, stricto sensu, it should be expressed in terms of a set of independent eigenvalues. If we choose this set to be $\{\lambda_1,\dots,\lambda_{d-1}\}$, the first term in (52) becomes