Second-order asymptotics for source coding, dense coding and pure-state entanglement conversions
We introduce two variants of the information spectrum relative entropy defined by Tomamichel and Hayashi [TH13] which have the particular advantage of satisfying the data-processing inequality, i.e. monotonicity under quantum operations. This property allows us to obtain one-shot bounds for various information-processing tasks in terms of these quantities. Moreover, these relative entropies have a second order asymptotic expansion, which in turn yields tight second order asymptotics for optimal rates of these tasks in the i.i.d. setting. The tasks studied in this paper are fixed-length quantum source coding, noisy dense coding, entanglement concentration, pure-state entanglement dilution, and transmission of information through a classical-quantum channel. In the last case, we retrieve the second order asymptotics obtained by Tomamichel and Tan [TT13]. Our results also yield the known second order asymptotics of fixed-length classical source coding derived by Hayashi [Hay08]. The second order asymptotics of entanglement concentration and dilution provide a refinement of the inefficiency of these protocols, a quantity which, in the case of entanglement dilution, was studied by Harrow and Lo [HL04]. We show how the discrepancy between the optimal rates of these two processes in the second order implies the irreversibility of entanglement concentration established by Kumagai and Hayashi [KH13]. In addition, the spectral divergence rates of the Information Spectrum Approach (ISA) can be retrieved from our relative entropies in the asymptotic limit. This enables us to directly obtain the more general results of the ISA from our one-shot bounds.
Optimal rates of information-processing tasks such as storage and transmission of information, and manipulation of entanglement, are of fundamental importance in Information Theory. These rates were originally evaluated in the so-called asymptotic i.i.d. (independent and identically distributed) setting, in which it is assumed that the underlying resources (sources, channels or entangled states) employed in the tasks are available for asymptotically many uses, and that there are no correlations between their successive uses. The rates in this scenario are given in terms of entropic quantities obtainable from the relative entropy. It is, however, unrealistic to assume the availability of infinitely many copies of the required resources. In practice, we have finite resources and hence a fundamental problem of both theoretical and practical interest is to determine how quickly the behaviour of a finite system approaches that of its asymptotic limit.
The first step in this direction, from the standpoint of Information Theory, is to determine the second order asymptotics of optimal rates (the precise meaning of the phrase “second order asymptotics” is elucidated in the following paragraph). Interest in this was initiated in the classical realm by Strassen [Str62], who evaluated the second order asymptotics for hypothesis testing and channel coding. In the last decade there has been a renewal of interest in the evaluation of second order asymptotics for other classical information theoretic tasks (see e.g. [Hay08, Hay09, KH13a] and references therein) and, more recently, even in third-order asymptotics [KV13a]. Moreover, the recent papers by Tomamichel and Hayashi [TH13] and Li [Li14] have introduced the study of second order asymptotics in Quantum Information Theory as well. The achievability parts of the second order asymptotics for the tasks studied in [TH13, Li14] were later also obtained by Beigi and Gohari [BG13] via the collision relative entropy.
Let us explain what exactly is meant by the phrase “second order asymptotics”. Consider the familiar task of fixed-length quantum source coding. Schumacher [Sch95] proved that for a memoryless quantum information source characterized by a state $\rho$, the optimal rate of reliable data compression is given by its von Neumann entropy, $S(\rho)$. The criterion of reliability is that the error incurred in the compression-decompression scheme vanishes in the asymptotic limit, $n\to\infty$, where $n$ denotes the number of copies (or uses) of the source. Let us now consider instead a finite number ($n$) of copies of the source and require that the error incurred in compressing its state is at most $\varepsilon$ (for some $\varepsilon\in(0,1)$). Suppose $\log M^{n,\varepsilon}$ denotes the compression length, i.e. the minimum of the logarithm of the dimension of the compressed Hilbert space in this case. This quantity can be expanded as follows:
$$\log M^{n,\varepsilon} = a\,n + b\,\sqrt{n} + O(\log n). \qquad (1)$$
Here, the coefficient $a$ of the leading term constitutes the first order asymptotics of the compression length. As expected, it turns out to be the optimal rate in the asymptotic limit, i.e. the von Neumann entropy of the source. The second order asymptotics is given by the coefficient $b$, and is a function of both the error threshold $\varepsilon$ and the state $\rho$. Determining the second order asymptotics comprises the evaluation of the coefficient $b$.
Theorem 5.8(i) of Section 5 gives an explicit expression for the second order coefficient $b$ in the case of fixed-length (visible) quantum source coding. In Figure 1 we plot the resulting compression rate against the number of copies $n$ for a memoryless quantum information source which emits two mutually non-orthogonal pure states with equal probability, for three different values of the error threshold $\varepsilon$. This exhibits how the rate of data compression converges to its asymptotically optimal value.
In this paper we study the second order asymptotics of various information-processing tasks: fixed-length quantum source coding, noisy dense coding, entanglement concentration, pure-state entanglement dilution, and transmission of information through a classical-quantum channel. For each task we consider the $n$-copy case, where $n$ denotes the number of copies of the relevant resource (source, entangled state or channel) employed in the protocol, $\varepsilon$ denotes the error threshold, and $\log M^{n,\varepsilon}$ denotes the characteristic quantity of the protocol (e.g. the minimum compression length in the case of source coding). We arrive at an expression of the form (1) for $\log M^{n,\varepsilon}$, which in this case is a quantity from which the optimal asymptotic rate $R$ of the protocol is obtained through the relation
$$R = \lim_{\varepsilon\to 0}\,\lim_{n\to\infty}\frac{1}{n}\log M^{n,\varepsilon}.$$
For each of the tasks studied, the first order coefficient $a$ turns out to be the entropic quantity characterizing $R$. Moreover, the second order coefficient $b$ is proportional to $\Phi^{-1}(\varepsilon)$, where $\Phi^{-1}$ denotes the inverse of the cumulative distribution function of the standard normal distribution, and the constant of proportionality depends on the underlying resource (the latter is often called the dispersion or information variance). This form of $b$ is a feature of all results on second order asymptotics and stems from the Berry-Esseen theorem [Fel71], a refinement of the central limit theorem which takes into account the rate of convergence of a distribution to a standard normal distribution.
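To make the shape of such an expansion concrete, the following Python sketch evaluates a Gaussian approximation to the compression rate of a memoryless qubit source. It is an illustration under stated assumptions, not a restatement of Theorem 5.8: the second order coefficient is assumed to have the form $b = -\sqrt{V}\,\Phi^{-1}(\varepsilon)$, with $V$ the variance of $-\log\rho$ in the state $\rho$, the $O(\log n)$ remainder is dropped, and the source is assumed to emit the hypothetical states $|0\rangle$ and $|+\rangle$ with equal probability. All function names are illustrative.

```python
import numpy as np
from statistics import NormalDist


def entropy_and_variance(rho):
    """Return (S, V): the von Neumann entropy of rho and the variance of -log2(rho)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # restrict to the support of rho
    surprisal = -np.log2(evals)
    S = float(np.sum(evals * surprisal))
    V = float(np.sum(evals * surprisal**2) - S**2)
    return S, V


def approx_rate(rho, n, eps):
    """Gaussian approximation a + b / sqrt(n) to the compression rate.

    ASSUMPTION: a = S(rho) and b = -sqrt(V) * Phi^{-1}(eps); the O(log n)
    remainder of the expansion is ignored.
    """
    S, V = entropy_and_variance(rho)
    b = -np.sqrt(V) * NormalDist().inv_cdf(eps)
    return S + b / np.sqrt(n)


# Hypothetical source: |0> and |+> emitted with probability 1/2 each.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2.0)
rho = 0.5 * np.outer(ket0, ket0) + 0.5 * np.outer(ketp, ketp)

for eps in (0.01, 0.05, 0.1):
    rates = [approx_rate(rho, n, eps) for n in (10, 100, 1000, 10000)]
    print(eps, [round(r, 4) for r in rates])
```

For error thresholds below 1/2 the quantile $\Phi^{-1}(\varepsilon)$ is negative, so the approximate rate approaches the entropy from above as $n$ grows.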
Two mathematical quantities play key roles in the derivation of our results. These are variants of the information spectrum relative entropy defined by Tomamichel and Hayashi [TH13], but have the particular advantage of satisfying the data-processing inequality. For a state $\rho$ and a positive semi-definite operator $\sigma$, we denote them as $\underline{D}_s^{\varepsilon}(\rho\|\sigma)$ and $\overline{D}_s^{\varepsilon}(\rho\|\sigma)$ (where $\varepsilon\in(0,1)$) and refer to them simply as information spectrum relative entropies. (The information spectrum relative entropy defined in [TH13] is denoted by a separate symbol to avoid confusion.) These notations and nomenclatures stem from the fact that for any arbitrary sequence of states $\hat\rho=\{\rho_n\}_{n\in\mathbb{N}}$ and positive semi-definite operators $\hat\sigma=\{\sigma_n\}_{n\in\mathbb{N}}$ such that $\rho_n$ and $\sigma_n$ are defined on the Hilbert space $\mathcal{H}^{\otimes n}$ for each $n$, the following relations hold:
$$\lim_{\varepsilon\to 0}\,\liminf_{n\to\infty}\frac{1}{n}\,\underline{D}_s^{\varepsilon}(\rho_n\|\sigma_n) = \underline{D}(\hat\rho\|\hat\sigma), \qquad \lim_{\varepsilon\to 0}\,\limsup_{n\to\infty}\frac{1}{n}\,\overline{D}_s^{\varepsilon}(\rho_n\|\sigma_n) = \overline{D}(\hat\rho\|\hat\sigma), \qquad (2)$$
where $\underline{D}(\hat\rho\|\hat\sigma)$ and $\overline{D}(\hat\rho\|\hat\sigma)$ are the inf- and sup-spectral divergence rates of the so-called quantum Information Spectrum Approach (ISA) (see Definition 4.11). The ISA provides a unifying mathematical framework for obtaining asymptotic rate formulae for various different tasks in information theory. The power of the approach lies in the fact that it does not rely on any particular structure or property of the resources used in the tasks. It was introduced by Han and Verdú [HV93] in classical Information Theory, and generalized to the quantum setting by Hayashi, Nagaoka and Ogawa [Hay06, HN03, NH07, ON00].
The information spectrum relative entropies, $\underline{D}_s^{\varepsilon}$ and $\overline{D}_s^{\varepsilon}$, can also be related to other relative entropies which arise in one-shot information theory (see [Ren05, KRS09, Dat09, Tom12, DH11, MW12] and references therein), e.g. the hypothesis testing relative entropy $D_H^{\varepsilon}$ [WR12] and the smooth max-relative entropy $D_{\max}^{\varepsilon}$ [Dat09]. In fact, one can prove that all these relative entropies are equivalent in the sense that upper and lower bounds to any one of them can be obtained in terms of any one of the others, modulo terms which depend only on the (smoothing) parameter $\varepsilon$. These equivalences prove very useful. In particular, the bounds on the information spectrum relative entropies in terms of $D_H^{\varepsilon}$ directly yield second order asymptotic expansions for $\underline{D}_s^{\varepsilon}$ and $\overline{D}_s^{\varepsilon}$ via the second order asymptotic expansion for $D_H^{\varepsilon}$, which has been derived by Tomamichel and Hayashi in [TH13].
In addition, as in the case of the usual relative entropy, one can derive other entropic quantities, namely, entropy, conditional entropy and mutual information, from the information spectrum relative entropies. For each of the information-processing tasks considered in this paper, we obtain one-shot bounds in terms of quantities derived from the information spectrum relative entropies. The second order expansions of these quantities then directly yield tight second order asymptotics for optimal rates of the corresponding tasks in the i.i.d. setting.
Finally, the relations (2) enable us to directly obtain the more general results of the ISA, for each of the tasks considered, from our one-shot bounds. For example, our bounds for one-shot fixed-length source coding yield the optimal data compression limit for a general (i.e. not necessarily memoryless) quantum information source.
2 Overview of results
Here we summarize our main contributions and give pointers to the relevant theorems.
We prove the data-processing inequalities and other properties of the information spectrum relative entropies in Proposition 4.3. We also show in Proposition 4.7 that the information spectrum relative entropies are equivalent to the hypothesis testing relative entropy and the smooth max-relative entropy, in the sense that these quantities are bounded by each other.
Using their equivalences with the hypothesis testing relative entropy, and the second order asymptotic expansion for the latter [TH13], we obtain second order asymptotic expansions for the information spectrum relative entropies in Proposition 4.9.
In Proposition 4.12 we prove that the information spectrum relative entropies reduce to the spectral divergence rates of the ISA in the asymptotic limit, when the parameter $\varepsilon$ is taken to zero.
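We obtain one-shot bounds, in terms of quantities derived from the information spectrum relative entropies, for the following information-processing tasks: fixed-length quantum source coding, noisy dense coding, entanglement concentration, pure-state entanglement dilution, and transmission of information through a classical-quantum channel. In each case the parameter $\varepsilon$ plays the role of the error threshold allowed in the protocol.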
In particular, Theorem 5.8(i) gives the second order expansion for fixed-length visible quantum source coding. We also obtain asymptotic upper and lower bounds on the minimal compression length in the blind setting, stated in Theorem 5.8(ii). The distinction between visible and blind source coding is elaborated in Section 5.1.1.
Even though the leading order terms for the optimal rates for entanglement concentration and dilution are identical (and given by the entropy of entanglement), there is a difference in their second order terms. Explicit evaluation of these terms leads to a refinement of the inefficiency of these protocols. In the case of entanglement dilution, the latter quantity (studied by Harrow and Lo [HL04]) was introduced as a measure of the amount of entanglement wasted (or lost) in the dilution process. More precisely, in [HL04] it was proved that the number of ebits needed to create $n$ copies of a desired bipartite pure state with entropy of entanglement $S$ equals $nS$ up to a correction of order $\sqrt{n}$. We prove that the number of ebits can, in fact, be expressed in the form $nS + b\sqrt{n} + O(\log n)$, and we evaluate the coefficient $b$ explicitly.
We also show how the irreversibility of entanglement concentration, established by Kumagai and Hayashi [KH13], can be proved using the discrepancy between the asymptotic expansions for distillable entanglement and entanglement cost in the second order (i.e. at order $\sqrt{n}$).
Finally, in Proposition 6.2, we recover the known expressions for optimal rates in the case of arbitrary resources as obtained by the Information Spectrum Approach.
The paper is organized as follows. In the next section, we introduce necessary notation and definitions. The rest of the paper proceeds in the order of the results mentioned above. We end with a conclusion that summarizes our results and points to open questions for future research.
3 Mathematical preliminaries
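For a Hilbert space $\mathcal{H}$, let $\mathcal{B}(\mathcal{H})$ denote the algebra of linear operators acting on $\mathcal{H}$, and let $\mathcal{P}(\mathcal{H})$ denote the set of positive semi-definite operators on $\mathcal{H}$. Further, let $\mathcal{D}(\mathcal{H})$ and $\mathcal{D}_{\leq}(\mathcal{H})$ denote the set of states (density matrices) and subnormalized states on $\mathcal{H}$ respectively. For a state $\rho$, the von Neumann entropy is defined as $S(\rho) := -\mathrm{Tr}(\rho\log\rho)$. Here and henceforth, all logarithms are taken to base $2$. Unless otherwise stated, we assume all Hilbert spaces to be finite-dimensional. Let $\mathbb{1}$ denote the identity operator on $\mathcal{H}$, and $\mathrm{id}$ the identity map on operators on $\mathcal{H}$. For a pure state $|\psi\rangle$, we denote the corresponding projector by $\psi := |\psi\rangle\langle\psi|$. For a completely positive, trace-preserving (CPTP) map $\Lambda:\mathcal{B}(\mathcal{H}_A)\to\mathcal{B}(\mathcal{H}_B)$, we also use the shorthand notation $\Lambda: A\to B$.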
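For self-adjoint operators $A, B$, let $\{A\geq B\}$ denote the spectral projection of the difference operator $A-B$ corresponding to the interval $[0,\infty)$; the spectral projections $\{A>B\}$, $\{A\leq B\}$ and $\{A<B\}$ are defined in a similar way. We also define $(A-B)_+ := \{A\geq B\}(A-B)$, where $X_+ := \{X\geq 0\}X$ denotes the positive part of any self-adjoint operator $X$. The following lemmas are used in our proofs: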
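As a numerical illustration of these spectral objects, the sketch below computes a spectral projection and the corresponding positive part from an eigendecomposition. It assumes the conventions that $\{A\geq B\}$ projects onto the eigenspaces of $A-B$ with non-negative eigenvalue and that $(A-B)_+ = \{A\geq B\}(A-B)$; the helper names are illustrative, and eigenvalues that are zero only up to rounding may fall on either side of the threshold.

```python
import numpy as np


def spectral_projection_geq(A, B):
    """Projector onto the eigenspaces of the Hermitian matrix A - B with eigenvalue >= 0."""
    w, v = np.linalg.eigh(A - B)
    keep = v[:, w >= 0]                   # eigenvectors with non-negative eigenvalue
    return keep @ keep.conj().T


def positive_part(A, B):
    """(A - B)_+ under the assumed convention {A >= B}(A - B)."""
    return spectral_projection_geq(A, B) @ (A - B)


def trace_positive_part(A, B):
    """Tr (A - B)_+, i.e. the sum of the non-negative eigenvalues of A - B."""
    return float(np.sum(np.clip(np.linalg.eigvalsh(A - B), 0.0, None)))


A = np.diag([0.7, 0.3])
B = np.diag([0.5, 0.5])
print(spectral_projection_geq(A, B))
print(positive_part(A, B))
print(trace_positive_part(A, B))
```

The trace of the positive part can be read off from the eigenvalues alone, which is the form used when only the scalar $\mathrm{Tr}(A-B)_+$ is needed.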
[ON00] Let and be an arbitrary operator, then
and the same assertion holds for .
[BD06] Let and be a CPTP map. Then
[BD06] Let and . Then for any ,
For the convenience of the reader, we recall the definitions of the most important distance measures and state their relations [TCR10]:
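Let $\rho,\sigma$ be subnormalized states on $\mathcal{H}$, then:
The generalized fidelity of $\rho$ and $\sigma$ is defined by
$$F(\rho,\sigma) := \left\|\sqrt{\rho}\sqrt{\sigma}\right\|_1 + \sqrt{(1-\mathrm{Tr}\,\rho)(1-\mathrm{Tr}\,\sigma)}.$$
The purified distance is defined by
$$P(\rho,\sigma) := \sqrt{1-F(\rho,\sigma)^2},$$
and constitutes a metric on the set of subnormalized states, i.e. it satisfies the triangle inequality.
The generalized trace distance is defined by
$$D(\rho,\sigma) := \frac{1}{2}\|\rho-\sigma\|_1 + \frac{1}{2}\left|\mathrm{Tr}\,\rho-\mathrm{Tr}\,\sigma\right|,$$
and constitutes a metric on the set of subnormalized states.
If at least one of the subnormalized states $\rho$ and $\sigma$ is normalized, the generalized fidelity reduces to the usual fidelity, i.e. $F(\rho,\sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$. If both $\rho$ and $\sigma$ are normalized, the generalized trace distance also reduces to the usual trace distance, $D(\rho,\sigma) = \frac{1}{2}\|\rho-\sigma\|_1$.
[Tom12] For subnormalized states $\rho,\sigma$ we have the following bounds:
$$D(\rho,\sigma) \leq P(\rho,\sigma) \leq \sqrt{2D(\rho,\sigma)}.$$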
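The distance measures recalled above can be evaluated numerically. The sketch below assumes the standard definitions from the literature, namely $F(\rho,\sigma)=\|\sqrt\rho\sqrt\sigma\|_1+\sqrt{(1-\mathrm{Tr}\rho)(1-\mathrm{Tr}\sigma)}$, $P=\sqrt{1-F^2}$ and $D=\frac12\|\rho-\sigma\|_1+\frac12|\mathrm{Tr}\rho-\mathrm{Tr}\sigma|$, and checks the bounds $D\leq P\leq\sqrt{2D}$ on a small example. Function names are illustrative.

```python
import numpy as np


def psd_sqrt(A):
    """Square root of a positive semi-definite matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T


def generalized_fidelity(rho, sigma):
    # Trace norm of sqrt(rho) sqrt(sigma), plus the term for missing normalization.
    overlap = np.linalg.norm(psd_sqrt(rho) @ psd_sqrt(sigma), ord="nuc")
    missing_rho = max(0.0, 1.0 - np.trace(rho).real)
    missing_sigma = max(0.0, 1.0 - np.trace(sigma).real)
    return float(overlap + np.sqrt(missing_rho * missing_sigma))


def purified_distance(rho, sigma):
    F = generalized_fidelity(rho, sigma)
    return float(np.sqrt(max(0.0, 1.0 - F**2)))   # clip rounding noise below zero


def generalized_trace_distance(rho, sigma):
    trace_gap = abs(np.trace(rho).real - np.trace(sigma).real)
    return float(0.5 * np.linalg.norm(rho - sigma, ord="nuc") + 0.5 * trace_gap)


rho = np.diag([0.5, 0.5])
sigma = np.diag([0.9, 0.1])
tau = 0.5 * rho                            # a subnormalized state
for a, b in [(rho, sigma), (rho, tau)]:
    D = generalized_trace_distance(a, b)
    P = purified_distance(a, b)
    print(D, P, np.sqrt(2 * D))
```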
The following entropic quantities play a key role in second order asymptotic expansions:
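For a state $\rho$ and a positive semi-definite operator $\sigma$, the quantum relative entropy is defined as
$$D(\rho\|\sigma) := \begin{cases}\mathrm{Tr}\left[\rho(\log\rho-\log\sigma)\right] & \text{if } \operatorname{supp}\rho\subseteq\operatorname{supp}\sigma,\\ +\infty & \text{otherwise.}\end{cases}$$
The quantum information variance is defined as
$$V(\rho\|\sigma) := \mathrm{Tr}\left[\rho(\log\rho-\log\sigma)^2\right] - D(\rho\|\sigma)^2,$$
and we set
$$\mathfrak{s}(\rho\|\sigma) := \sqrt{V(\rho\|\sigma)}.$$
The inverse of the cumulative distribution function (cdf) of a standard normal random variable is defined by
$$\Phi^{-1}(\varepsilon) := \sup\{z\in\mathbb{R} \mid \Phi(z)\leq\varepsilon\},$$
where $\Phi(z) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z} e^{-t^2/2}\,dt$. We frequently make use of the following lemma: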
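For small examples these quantities are straightforward to evaluate. The following sketch assumes the standard definitions $D(\rho\|\sigma)=\mathrm{Tr}[\rho(\log\rho-\log\sigma)]$ and $V(\rho\|\sigma)=\mathrm{Tr}[\rho(\log\rho-\log\sigma)^2]-D(\rho\|\sigma)^2$ with base-2 logarithms, and additionally assumes that the support of $\rho$ is contained in that of $\sigma$ (the infinite case is not handled). The inverse cdf is taken from the Python standard library; function names are illustrative.

```python
import numpy as np
from statistics import NormalDist


def log2_on_support(A, tol=1e-12):
    """Matrix log2 of a positive semi-definite A on its support (zero elsewhere)."""
    w, v = np.linalg.eigh(A)
    logw = np.where(w > tol, np.log2(np.clip(w, tol, None)), 0.0)
    return (v * logw) @ v.conj().T


def relative_entropy(rho, sigma):
    # ASSUMPTION: supp(rho) is contained in supp(sigma), so the value is finite.
    L = log2_on_support(rho) - log2_on_support(sigma)
    return float(np.trace(rho @ L).real)


def information_variance(rho, sigma):
    L = log2_on_support(rho) - log2_on_support(sigma)
    second_moment = np.trace(rho @ L @ L).real
    return float(second_moment - relative_entropy(rho, sigma) ** 2)


def inverse_normal_cdf(eps):
    """Phi^{-1}(eps) for eps in (0, 1)."""
    return NormalDist().inv_cdf(eps)


rho = np.diag([0.5, 0.5])
sigma = np.diag([0.25, 0.75])
print(relative_entropy(rho, sigma), information_variance(rho, sigma))
print(inverse_normal_cdf(0.05))
```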
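Let $\varepsilon\in(0,1)$, then
$$\sqrt{n}\,\Phi^{-1}\!\left(\varepsilon\pm\frac{1}{\sqrt{n}}\right) = \sqrt{n}\,\Phi^{-1}(\varepsilon) \pm \left(\Phi^{-1}\right)'(\xi)$$
for some $\xi$ with $|\xi-\varepsilon|\leq\frac{1}{\sqrt{n}}$.
We make the following general observation: Let $f$ be a continuously differentiable function. Then by Taylor’s theorem we can write
$$\sqrt{n}\,f\!\left(x\pm\frac{1}{\sqrt{n}}\right) = \sqrt{n}\,f(x) \pm f'(\xi) \qquad (4)$$
for some $\xi\in\left[x, x+\frac{1}{\sqrt{n}}\right]$ in the case ‘$+$’ and $\xi\in\left[x-\frac{1}{\sqrt{n}}, x\right]$ in the case ‘$-$’. Applying (4) to $f=\Phi^{-1}$ yields the claim. ∎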
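The role of this lemma is to show that shifting the argument of $\Phi^{-1}$ by $1/\sqrt{n}$ changes $\sqrt{n}\,\Phi^{-1}$ only by a bounded amount. The sketch below checks this numerically, assuming the lemma concerns the difference $\sqrt{n}\,[\Phi^{-1}(\varepsilon\pm 1/\sqrt{n})-\Phi^{-1}(\varepsilon)]$; it uses the identity $(\Phi^{-1})'(\varepsilon)=1/\varphi(\Phi^{-1}(\varepsilon))$, with $\varphi$ the standard normal density. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def shifted_quantile_gap(eps, n, sign=1):
    """sqrt(n) * (Phi^{-1}(eps + sign/sqrt(n)) - Phi^{-1}(eps)).

    Requires 0 < eps + sign/sqrt(n) < 1.
    """
    h = 1.0 / sqrt(n)
    return sqrt(n) * (_STD.inv_cdf(eps + sign * h) - _STD.inv_cdf(eps))


def quantile_derivative(eps):
    """(Phi^{-1})'(eps) = 1 / phi(Phi^{-1}(eps))."""
    return 1.0 / _STD.pdf(_STD.inv_cdf(eps))


eps = 0.1
for n in (10**3, 10**4, 10**6, 10**8):
    print(n, shifted_quantile_gap(eps, n, +1), shifted_quantile_gap(eps, n, -1))
print("derivative at eps:", quantile_derivative(eps))
```

As $n$ grows, the two gaps approach plus and minus the derivative at $\varepsilon$, so the correction stays of order one and is absorbed in lower order terms.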
4 Information spectrum relative entropies
4.1 Definitions and mathematical properties
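The central quantities of this paper are the information spectrum relative entropies $\underline{D}_s^{\varepsilon}(\rho\|\sigma)$ and $\overline{D}_s^{\varepsilon}(\rho\|\sigma)$, which we now define.
For a state $\rho$, a positive semi-definite operator $\sigma$ and $\varepsilon\in(0,1)$, the information spectrum relative entropies are defined as
$$\underline{D}_s^{\varepsilon}(\rho\|\sigma) := \sup\left\{\gamma \mid \mathrm{Tr}\,(\rho-2^{\gamma}\sigma)_+ \geq 1-\varepsilon\right\}, \qquad \overline{D}_s^{\varepsilon}(\rho\|\sigma) := \inf\left\{\gamma \mid \mathrm{Tr}\,(\rho-2^{\gamma}\sigma)_+ \leq \varepsilon\right\}.$$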
These relative entropies are one-shot generalizations of the spectral divergences used in the ISA to quantum information theory. In Section 4.4 these generalizations are discussed in detail.
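Note furthermore that the information spectrum relative entropies as defined in Definition 4.1 are variants of the definition introduced by Tomamichel and Hayashi [TH13], which in the notation of [TH13] reads
$$D_s^{\varepsilon}(\rho\|\sigma) := \sup\left\{R\in\mathbb{R} \mid \mathrm{Tr}\left[\rho\,\{\rho\leq 2^{R}\sigma\}\right] \leq \varepsilon\right\}.$$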
Our definitions have the advantage of satisfying the data processing inequality, as shown in Proposition 4.3(iii). First, we prove the following lemma which implies that the supremum and infimum in Definition 4.1 are attained.
For let be a one-parameter family of Hermitian operators such that is continuous. Then the function is continuous.
For , the function is continuous and monotonically decreasing, and strictly so if .
The supremum and infimum in Definition 4.1 are attained, and unique if .
(i) First, we observe that the trace of the positive part of a Hermitian operator is the sum of its non-negative eigenvalues, the eigenvalues being the roots of the characteristic polynomial. Since the determinant is a continuous function with respect to the operator norm on , the coefficients of the characteristic polynomial of continuously depend on , and by factorizing the polynomial into linear factors we see that this carries over to the roots. Composing the sum over the roots with the continuous maximum function , we obtain that the function is a composition of continuous functions and thus continuous itself.
(ii) Since is continuous with respect to the operator norm, the function is continuous by (i). To prove that is decreasing, let and observe that
where . Hence, Weyl’s Monotonicity Theorem (see e.g. Section III in [Bha13]) implies that
where for a Hermitian operator we write to denote the -th largest eigenvalue of . It then follows from (6) that .
To prove strict monotonicity, let . Without loss of generality, we restrict to the support of (which is possible due to the assumption ), such that has strictly positive eigenvalues. Consequently, , and the inequality in (6) is a strict one for all , from which we obtain that .
(iii) Since and , we infer by (ii) and the Intermediate Value Theorem that the supremum in the definition of as well as the infimum in the definition of are attained. If , then they are moreover unique by the strict monotonicity of . ∎
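Continuity and monotonicity of the trace of the positive part suggest a simple numerical procedure: the optimizing value of $\gamma$ can be located by bisection. The sketch below assumes that Definition 4.1 has the form $\underline{D}_s^{\varepsilon}(\rho\|\sigma)=\sup\{\gamma:\mathrm{Tr}(\rho-2^{\gamma}\sigma)_+\geq 1-\varepsilon\}$ and $\overline{D}_s^{\varepsilon}(\rho\|\sigma)=\inf\{\gamma:\mathrm{Tr}(\rho-2^{\gamma}\sigma)_+\leq\varepsilon\}$, and that the optimum lies in a fixed search window. The function names and the window $[-60, 60]$ are illustrative choices.

```python
import numpy as np


def trace_positive_part(A):
    """Tr (A)_+ : the sum of the non-negative eigenvalues of a Hermitian matrix."""
    return float(np.sum(np.clip(np.linalg.eigvalsh(A), 0.0, None)))


def _bisect(predicate, lo=-60.0, hi=60.0, iters=200):
    """Locate the boundary of a predicate that holds on the left and fails on the right.

    ASSUMPTION: predicate(lo) is true and predicate(hi) is false.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if predicate(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def d_s_lower(rho, sigma, eps):
    # sup{gamma : Tr(rho - 2^gamma sigma)_+ >= 1 - eps}  (assumed form of the definition)
    return _bisect(lambda g: trace_positive_part(rho - 2.0**g * sigma) >= 1.0 - eps)


def d_s_upper(rho, sigma, eps):
    # inf{gamma : Tr(rho - 2^gamma sigma)_+ <= eps}  (assumed form of the definition)
    return _bisect(lambda g: trace_positive_part(rho - 2.0**g * sigma) > eps)


rho = np.array([[0.75, 0.25], [0.25, 0.25]])
sigma = np.eye(2) / 2
for eps in (0.05, 0.25):
    print(eps, d_s_lower(rho, sigma, eps), d_s_upper(rho, sigma, eps))
```

For $\rho=\sigma$ the trace of the positive part equals $\max(1-2^{\gamma},0)$, so under the assumed definitions the two quantities evaluate to $\log\varepsilon$ and $\log(1-\varepsilon)$ respectively, which gives a quick sanity check of the routine.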
We are now ready to record a series of properties of the information spectrum relative entropies:
Let , and . Then the following properties hold:
Data processing inequality: For any CPTP map , we have
Monotonicity in : Let , then
Let with , then
Let , then .
Let and with , then
(i) Let . Then by the definition of we have
since by assumption. Therefore, is feasible for and consequently, from (5) it follows that
which yields the claim.
(ii) Let . Then by the definition of , we have
Hence, is feasible for , and we obtain
Assume now that
for some , i.e. . By the monotonicity of , we have
On the other hand, by definition of . This leads to a contradiction, yielding .
(iii) Let . Then Lemma 3.2 implies that
Hence, is feasible for , and we obtain
Similarly, let . Then
by Lemma 3.2. Hence, is feasible for , and we obtain
(iv) Let , then
Hence, is feasible for , and consequently, .
Similarly, let . Then
and hence, is feasible for , and we obtain .
(v) Let and , then we compute:
where the first inequality follows from and the second inequality follows from Lemma 3.1. Hence, is feasible for and we obtain .
(vi) Let for , then
Conversely, let . Then
which proves the claim.
(vii) Let and . Then
where the first inequality follows from Lemma 3.1 and the second inequality follows from the fact [Tom12] that
by assumption. This proves the claim. ∎
Proposition 4.3(ii) shows that we only need to focus on one of the information spectrum relative entropies (the choice can be made without loss of generality). However, given their close relationship to the quantum inf- and sup-spectral divergence rates (Section 4.4), it is useful to keep both definitions in Definition 4.1.
Children entropies of the information spectrum relative entropies
The quantum relative entropy acts as a parent quantity for other entropic quantities:
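the von Neumann entropy $S(\rho) = -D(\rho\|\mathbb{1})$,
the quantum conditional entropy $H(A|B)_\rho = -D(\rho_{AB}\|\mathbb{1}_A\otimes\rho_B)$,
the quantum mutual information $I(A:B)_\rho = D(\rho_{AB}\|\rho_A\otimes\rho_B)$.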
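The parent role of the relative entropy can be checked numerically. The sketch below assumes the standard identities $S(\rho)=-D(\rho\|\mathbb{1})$, $H(A|B)_\rho=-D(\rho_{AB}\|\mathbb{1}_A\otimes\rho_B)$ and $I(A:B)_\rho=D(\rho_{AB}\|\rho_A\otimes\rho_B)$ with base-2 logarithms, and evaluates them for a maximally entangled two-qubit state. Function names are illustrative.

```python
import numpy as np


def log2_on_support(A, tol=1e-12):
    """Matrix log2 of a positive semi-definite A on its support (zero elsewhere)."""
    w, v = np.linalg.eigh(A)
    logw = np.where(w > tol, np.log2(np.clip(w, tol, None)), 0.0)
    return (v * logw) @ v.conj().T


def relative_entropy(rho, sigma):
    # ASSUMPTION: supp(rho) is contained in supp(sigma).
    return float(np.trace(rho @ (log2_on_support(rho) - log2_on_support(sigma))).real)


def partial_trace(rho_ab, dim_a, dim_b, keep):
    """Reduced state on subsystem 'A' or 'B' of a bipartite density matrix."""
    r = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)
    if keep == "A":
        return np.trace(r, axis1=1, axis2=3)
    return np.trace(r, axis1=0, axis2=2)


def entropy(rho):
    return -relative_entropy(rho, np.eye(rho.shape[0]))


def conditional_entropy(rho_ab, dim_a, dim_b):
    rho_b = partial_trace(rho_ab, dim_a, dim_b, "B")
    return -relative_entropy(rho_ab, np.kron(np.eye(dim_a), rho_b))


def mutual_information(rho_ab, dim_a, dim_b):
    rho_a = partial_trace(rho_ab, dim_a, dim_b, "A")
    rho_b = partial_trace(rho_ab, dim_a, dim_b, "B")
    return relative_entropy(rho_ab, np.kron(rho_a, rho_b))


# Maximally entangled two-qubit state (|00> + |11>)/sqrt(2).
bell = np.zeros(4)
bell[0] = bell[3] = 1.0 / np.sqrt(2.0)
rho_ab = np.outer(bell, bell)
print(entropy(partial_trace(rho_ab, 2, 2, "A")))
print(conditional_entropy(rho_ab, 2, 2))
print(mutual_information(rho_ab, 2, 2))
```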
This motivates us to define the following information spectrum entropies:
Let and be states. Then we define:
the information spectrum entropies
the information spectrum conditional entropies
the information spectrum mutual informations
Note that in Definition 4.4(i) and (ii), the occurrence of the minus sign is the reason for changing the upper bar to a lower bar and vice versa. In Section 5 these quantities arise in one-shot bounds for operational tasks.
The information spectrum conditional entropies satisfy the following interesting property under local operations and classical communication (LOCC) taking pure states to pure states:
Let where denotes any LOCC operation which takes pure states to pure states, and is a bipartite pure state. Then the following inequality holds:
By definition, we have and furthermore,
However, by a result of Lo and Popescu [LP01], the action of the LOCC map on the pure state can be expressed as follows:
where the are unitary operators and are operators such that . Consequently, using the cyclicity of the trace we obtain
Further, for , we have
which can be seen as follows:
where the second identity follows from the unitarity of the operators , and the last identity follows from (8). Hence,
where the inequality follows from Proposition 4.3, (iii). ∎
4.2 Relation to other relative entropies
In this section we prove that the information spectrum relative entropies are equivalent to the hypothesis testing relative entropy and the smooth max-relative entropy, which arise in one-shot information theory. This equivalence is in the sense that upper and lower bounds to any one of them can be obtained in terms of any one of the others, modulo terms which depend only on the (smoothing) parameter $\varepsilon$. Let us first recall the definitions of the hypothesis testing relative entropy and the smooth max-relative entropy:
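For a state $\rho$, a positive semi-definite operator $\sigma$ and $\varepsilon\in(0,1)$, the hypothesis testing relative entropy is defined as
$$D_H^{\varepsilon}(\rho\|\sigma) := -\log\inf\left\{\mathrm{Tr}(Q\sigma) \mid 0\leq Q\leq\mathbb{1},\ \mathrm{Tr}(Q\rho)\geq 1-\varepsilon\right\}.$$
For a state $\rho$, a positive semi-definite operator $\sigma$ and $\varepsilon\in(0,1)$, the smooth max-relative entropy is defined as
$$D_{\max}^{\varepsilon}(\rho\|\sigma) := \min_{\bar\rho\in\mathcal{B}_{\varepsilon}(\rho)}\inf\left\{\gamma \mid \bar\rho\leq 2^{\gamma}\sigma\right\},$$
where $\mathcal{B}_{\varepsilon}(\rho) := \{\bar\rho \text{ subnormalized} \mid P(\bar\rho,\rho)\leq\varepsilon\}$ is the $\varepsilon$-ball with respect to the purified distance $P$.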
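In general, evaluating the hypothesis testing relative entropy requires solving a semidefinite program. When $\rho$ and $\sigma$ commute, however, the problem reduces to a classical one that a greedy, Neyman-Pearson style test solves exactly. The sketch below covers only this commuting case and assumes the definition $D_H^{\varepsilon}(\rho\|\sigma)=-\log\min\{\mathrm{Tr}(Q\sigma): 0\leq Q\leq\mathbb{1},\ \mathrm{Tr}(Q\rho)\geq 1-\varepsilon\}$ with base-2 logarithm (normalization conventions vary in the literature). The inputs `p` and `q` are the eigenvalue lists of $\rho$ and $\sigma$ in a common eigenbasis; the function name is illustrative.

```python
import math


def hypothesis_testing_relative_entropy_classical(p, q, eps):
    """-log2 of min sum(t*q) over tests 0 <= t <= 1 with sum(t*p) >= 1 - eps.

    ASSUMPTION: p is a probability vector, q is non-negative, and the two
    correspond to commuting operators written in a common eigenbasis.
    """
    # Fill the test greedily on outcomes with the smallest cost ratio q/p.
    support = [i for i in range(len(p)) if p[i] > 0]
    support.sort(key=lambda i: q[i] / p[i])
    need = 1.0 - eps          # probability mass under p still to be covered
    beta = 0.0                # accumulated weight under q
    for i in support:
        if need <= 0:
            break
        t = min(1.0, need / p[i])   # fraction of outcome i accepted by the test
        beta += t * q[i]
        need -= t * p[i]
    return math.inf if beta == 0 else -math.log2(beta)


print(hypothesis_testing_relative_entropy_classical([0.5, 0.5], [0.25, 0.75], 0.5))
print(hypothesis_testing_relative_entropy_classical([0.5, 0.5], [0.5, 0.5], 0.25))
```

The greedy choice is optimal because the underlying problem is a fractional knapsack: each outcome contributes coverage $p_i$ at cost $q_i$, so outcomes are accepted in increasing order of $q_i/p_i$.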
We prove the following relations between these relative entropies:
Let , , and with . Then we obtain the following bounds:
(i) To prove the upper bound, let where for some . Since
we see that is feasible for . Hence,
since . By the definition of , we obtain
and hence the upper bound.
Conversely, let be optimal for and set . Then
where the first inequality follows from Lemma 3.1. Hence, is feasible for and we obtain
(ii) We start with the upper bound. Let for some arbitrary and set . We have by definition of , and hence,
Thus, is feasible for . Furthermore,
Conversely, assume that and let be optimal for , such that
For arbitrary we have