Non-Asymptotic Classical Data Compression with Quantum Side Information
Abstract.
In this paper, we analyze classical data compression with quantum side information (also known as the classical-quantum Slepian-Wolf protocol) in the so-called large and moderate deviation regimes. In the non-asymptotic setting, the protocol involves compressing classical sequences of finite length and decoding them with the assistance of quantum side information. In the large deviation regime, the compression rate is fixed, and we obtain bounds on the error exponent function, which characterizes the minimal probability of error as a function of the rate. Devetak and Winter showed that the asymptotic data compression limit for this protocol is given by a conditional entropy. For any protocol with a rate below this quantity, the probability of error converges to one asymptotically, and its speed of convergence is given by the strong converse exponent function. We obtain finite blocklength bounds on this function, and determine its asymptotic value exactly, thus improving on previous results by Tomamichel. In the moderate deviation regime, the compression rate is no longer considered to be fixed: it is allowed to depend on the blocklength $n$, but is assumed to decay slowly to the asymptotic data compression limit. Starting from a rate above this limit, we determine the speed of convergence of the error probability to zero and show that it is given in terms of the conditional information variance. Our results complement earlier results obtained by Tomamichel and Hayashi, in which they analyzed the so-called small deviation regime of this protocol.
1. Introduction
Source coding (or data compression) is the task of compressing information emitted by a source in such a way that it can later be decompressed to yield the original information with high probability. The information source is said to be memoryless if there is no correlation between the successive messages emitted by it. In this case, successive uses of the source are modeled by a sequence of independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots$, each taking values $x$ in a finite alphabet $\mathcal{X}$ with probability $p(x)$. Such a source is equivalently modeled by a single random variable $X$ with probability mass function $p$, with $p(x) := \Pr(X = x)$, and is called a discrete memoryless source (DMS). Let $H(X)$ denote the Shannon entropy of $X$. Shannon’s Source Coding Theorem [1] tells us that if the messages emitted by $n$ copies of the source are compressed into at least $nH(X)$ bits, then they can be recovered with arbitrary accuracy upon decompression, in the asymptotic limit ($n \to \infty$).
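To make these quantities concrete, the following is a minimal numerical sketch (illustrative code, not from the paper; the pmf is an arbitrary example) of the Shannon entropy $H(X)$ and the resulting compressed length of roughly $nH(X)$ bits.

```python
import numpy as np

def shannon_entropy(pmf):
    """Shannon entropy H(X) in bits of a probability mass function."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]  # use the convention 0 * log 0 = 0
    return float(-np.sum(p * np.log2(p)))

# A biased binary source: H(X) is about 0.5 bits per symbol, so n outputs
# compress to roughly n/2 bits rather than the n bits of the raw encoding.
pmf = [0.11, 0.89]
n = 10_000
H = shannon_entropy(pmf)
print(f"H(X) = {H:.3f} bits; {n} symbols compress to ~{n * H:.0f} bits")
```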
One variant of the above task is that of data compression with classical side information (at the decoder), also called Slepian-Wolf coding, which was first studied by Slepian and Wolf [2]. In this scenario, one considers a memoryless source emitting two messages, $x$ and $y$, which can be considered to be the values taken by a pair of correlated random variables $(X, Y)$. The task is once again to optimally compress the sequences $x^n := (x_1, \ldots, x_n)$ emitted by $n$ copies of the source so that they can be recovered with vanishing probability of error in the asymptotic limit. However, at the recovery (or decompression) step, the decoder also has access to the corresponding sequence $y^n := (y_1, \ldots, y_n)$. Since $X$ and $Y$ are correlated, the knowledge of $y^n$ gives information about the sequence $x^n$, and thus assists in the decoding. Slepian and Wolf showed that as long as the sequences $x^n$ are compressed into $nR$ bits with $R > H(X|Y)$, where $H(X|Y)$ is the conditional entropy of $X$ given $Y$, this task can be accomplished with vanishing probability of error in the asymptotic limit [2]. Here, $R$ is called the rate of the protocol and $n$ is called the coding blocklength. In fact, Slepian and Wolf also considered the case in which $y^n$ is also compressed and sent to the decoder at some finite rate; the decoder then attempts to faithfully decode both $x^n$ and $y^n$. This gives rise to an achievable region of rate pairs for which this task is possible. In this work, we do not bound the rate at which $y^n$ is sent; that is, we consider the decoder to receive an uncompressed version of $y^n$.
In this setting, Slepian and Wolf showed that the data compression limit, that is, the minimal rate of asymptotically lossless compression, is given by $H(X|Y)$. Moreover, Oohama and Han [3] established that the data compression limit for Slepian-Wolf coding satisfies the so-called strong converse property. That is, for any attempted compression to $nR$ bits with $R < H(X|Y)$, the probability of error converges to one in the asymptotic limit. This protocol has been extended to countably infinite alphabets and a class of information sources with memory by Cover [4], and to various other settings [5, 6].
The above characterization of the data compression limit in terms of the conditional entropy is only valid in the asymptotic limit. It tells us that there exist data compression protocols for which infinite sequences of outputs of a DMS can be compressed and later recovered with vanishing probability of error, but it gives no control on the probability of error incurred for any finite sequence. However, in practical implementations of the protocol one is obliged to consider finite sequences. Hence, it is important to determine the behavior of the optimal error probability in the non-asymptotic setting (i.e. finite $n$). To do so, we consider the so-called reliability function or error exponent function (see [7, 8] and references therein), which gives the exponential rate of decay of the minimal probability of error achievable by a Slepian-Wolf protocol, at a fixed rate of compression. Alternatively, one can evaluate the minimum compression rate as a function of the coding blocklength, under the constraint that the error probability is below a certain threshold [9, 10, 11].
A quantum generalization of the Slepian-Wolf protocol, which was first introduced by Devetak and Winter [12], is the task of classical data compression with quantum side information. They referred to this task as the Classical-Quantum Slepian-Wolf (CQSW) problem. In this protocol, the correlated pair of random variables $(X, Y)$ is replaced by a classical-quantum (cq) state $\rho_{XB}$. Here $B$ denotes a quantum system which is in the possession of the decoder (say, Bob) and constitutes the quantum side information (QSI), while $X$ is a classical system in the possession of the encoder (say, Alice) and corresponds to a random variable $X$ with probability mass function $p$, with $p(x) := \Pr(X = x)$, as in the classical setting. Such a cq state is described by an ensemble $\{p(x), \rho_B^x\}_{x \in \mathcal{X}}$: with probability $p(x)$ the random variable $X$ takes the value $x$ and Bob’s system is in the state $\rho_B^x$.
In this paper we primarily study the CQSW protocol in the non-asymptotic setting, in which one no longer takes the limit $n \to \infty$. This corresponds to the more realistic scenario in which only a finite number $n$ of copies of the cq state $\rho_{XB}$ are available. First, we focus on the so-called large deviation regime, in which the compression rate is fixed and the probability of error decays exponentially in $n$.
For any protocol with a rate below the data compression limit, the probability of error converges to one asymptotically, and its speed of convergence is given by the strong converse exponent function. We obtain finite blocklength bounds on this function, and determine its asymptotic value exactly, thus improving on previous results by Tomamichel [19]. This value is given in terms of the sandwiched conditional Rényi entropy [20, 21, 22].
The bounds we obtain are expressed in terms of certain entropic exponent functions involving conditional Rényi entropies. To derive these results, we prove and employ properties of these functions. In obtaining the strong converse bounds, we employ variational representations for certain auxiliary exponent functions by making use of those for the so-called log-Euclidean Rényi relative entropies developed in [23]. Our variational representations are analogous to those obtained by Csiszár and Körner in the classical setting [24, 25, 26, 8].
We also study the trade-offs between the rate of compression, the minimal probability of error, and the blocklength $n$. Specifically, we characterize the behaviors of the error probability and the compression rate in the moderate deviation regime. In contrast to the previously discussed results, for which the rate was considered to be fixed, here we allow the rate to change with $n$, approaching the data compression limit slowly (more slowly than $1/\sqrt{n}$) from above. In this case, we show that the probability of error vanishes asymptotically. In addition, we obtain an asymptotic formula describing the minimum compression rate, which converges to the data compression limit when the probability of error decays sub-exponentially in $n$. We summarize the error behaviors of the different regimes in Table 1.
Different Regimes  Concentration Phenomena  Slepian-Wolf Coding
Small Deviation  central limit theorem  second-order expansion [28]
Moderate Deviation  moderate deviation principle  moderate deviation analysis (Sec. 7)
Large Deviation ($R$ above the Slepian-Wolf limit)  large deviation principle  error exponent function (Sec. 5)
Large Deviation ($R$ below the Slepian-Wolf limit)  large deviation principle  strong converse exponent function (Sec. 6)
1.1. Prior Works
Renes and Renner [27] analyzed the protocol in the so-called one-shot setting (which corresponds to the case $n = 1$), for a given threshold ($\varepsilon$, say) on the probability of error. They proved that in this case the classical random variable can be compressed to a number of bits given by a different entropic quantity, the so-called smoothed conditional max-entropy, the smoothing parameter being dependent on $\varepsilon$. They also established that this entropic quantity gives the minimal number of bits, up to small additive quantities involving $\varepsilon$. More precisely, the authors established upper and lower bounds on the minimal number of bits in terms of the smoothed conditional max-entropy. The asymptotic result of Devetak and Winter can be recovered from their results by replacing the cq state $\rho_{XB}$ by its $n$-fold tensor power $\rho_{XB}^{\otimes n}$ in these one-shot bounds, dividing by $n$, and taking the limit $n \to \infty$.
In [28], the authors improved these bounds and established a second-order expansion of the so-called minimum code size $M^*(n, \varepsilon)$:
$\log M^*(n, \varepsilon) = n H(X|B)_\rho - \sqrt{n\, V(X|B)_\rho}\, \Phi^{-1}(\varepsilon) + O(\log n),$
where $V(X|B)_\rho$ is the quantum conditional information variance, and $\Phi$ is the cumulative distribution function of a standard normal distribution.
This paper is organized as follows. We introduce the CQSW protocol in Sec. 2, and state our main results in Sec. 3. The notation and definitions for the entropic quantities and exponent functions are described in Sec. 4. Sec. 5 presents the error exponent analysis for CQSW at rates above the data compression limit (large deviation regime), and we study the optimal success exponent at rates below this limit (strong converse regime) in Sec. 6. In Sec. 7 we discuss the moderate deviation regime. We conclude the paper in Sec. 8 with a discussion.
2. Classical Data Compression with Quantum Side Information (Slepian-Wolf Coding)
Suppose Alice and Bob share multiple (say $n$) identical copies of a classical-quantum (cq) state
(1) $\rho_{XB} := \sum_{x \in \mathcal{X}} p(x)\, |x\rangle\langle x|_X \otimes \rho_B^x,$
where $\mathcal{X}$ is a finite alphabet and $\rho_B^x$ is a quantum state, of a system $B$ with Hilbert space $\mathcal{H}_B$, in Bob’s possession. The letters $x \in \mathcal{X}$ can be considered to be the values taken by a random variable $X$ with probability mass function $p$. One can associate with $X$ a quantum system (which we also refer to as $X$) whose Hilbert space $\mathcal{H}_X$ has an orthonormal basis $\{|x\rangle\}$ labeled by the letters $x \in \mathcal{X}$, i.e. $\mathcal{H}_X := \mathrm{span}\{|x\rangle : x \in \mathcal{X}\}$.
The aim of classical-quantum Slepian-Wolf (CQSW) coding is for Alice to convey the sequences $x^n := (x_1, \ldots, x_n) \in \mathcal{X}^n$ to Bob using as few bits as possible; Bob can employ the corresponding quantum state $\rho_B^{x^n} := \rho_B^{x_1} \otimes \cdots \otimes \rho_B^{x_n}$, which is in his possession and plays the role of quantum side information (QSI), to help decode Alice’s compressed message.
Alice’s encoding (compression) map is given by $\mathcal{E} : \mathcal{X}^n \to \mathcal{M}$, where the alphabet $\mathcal{M}$ satisfies $|\mathcal{M}| \leq |\mathcal{X}|^n$. If Alice’s message was $x^n$, the compressed message that Bob receives is $\mathcal{E}(x^n)$. He applies a decoding map on the pair $(\mathcal{E}(x^n), \rho_B^{x^n})$ in order to infer Alice’s original message. Thus, Bob’s decoding is given by a map $\mathcal{D} : \mathcal{M} \times \mathcal{S}(\mathcal{H}_B^{\otimes n}) \to \mathcal{X}^n$, where $\mathcal{S}(\mathcal{H})$ denotes the set of states on $\mathcal{H}$.
If we fix the first argument as $m \in \mathcal{M}$, the decoding is a map from $\mathcal{S}(\mathcal{H}_B^{\otimes n})$ to $\mathcal{X}^n$, which is given by a positive operator-valued measure (POVM). Thus, we can represent the decoding by a collection of POVMs $\{\Lambda^{(m)}\}_{m \in \mathcal{M}}$, where $\Lambda^{(m)} := \{\Lambda^{(m)}_{x^n}\}_{x^n \in \mathcal{X}^n}$ with $\Lambda^{(m)}_{x^n} \geq 0$ and $\sum_{x^n \in \mathcal{X}^n} \Lambda^{(m)}_{x^n} = \mathbb{1}$, for each $m \in \mathcal{M}$. That is, if Alice sends the message $x^n$, Bob receives $m = \mathcal{E}(x^n)$, and measures the state $\rho_B^{x^n}$ with the POVM $\Lambda^{(m)}$. We depict the protocol in Figure 1.
Given $n \in \mathbb{N}$ and $R > 0$, an encoding-decoding pair $(\mathcal{E}, \mathcal{D})$ of the form described above is said to form an $(n, R)$-code if $|\mathcal{M}| \leq 2^{nR}$ (or, more precisely, $|\mathcal{M}| \leq \lceil 2^{nR} \rceil$). Here, $R$ is called the rate of the code $(\mathcal{E}, \mathcal{D})$. For such a code, the probability of error is given by
(2) $P_e(\mathcal{E}, \mathcal{D}) := \sum_{x^n \in \mathcal{X}^n} p^n(x^n) \left(1 - \mathrm{Tr}\left[\Lambda^{(\mathcal{E}(x^n))}_{x^n}\, \rho_B^{x^n}\right]\right),$
where $p^n(x^n) := \prod_{i=1}^n p(x_i)$ for $x^n \in \mathcal{X}^n$. We can also consider a random encoding which maps $x^n$ to $m$ with some probability $q(m|x^n)$. In this case, the probability of error is given by
(3) $P_e := \sum_{x^n \in \mathcal{X}^n} p^n(x^n) \sum_{m \in \mathcal{M}} q(m|x^n) \left(1 - \mathrm{Tr}\left[\Lambda^{(m)}_{x^n}\, \rho_B^{x^n}\right]\right).$
Alternatively, we can see the random encoding as applying a deterministic encoding $\mathcal{E}_i$ with some probability $\alpha_i$. Then for a code with decoding map $\mathcal{D}$,
(4) $P_e = \sum_i \alpha_i\, P_e(\mathcal{E}_i, \mathcal{D}).$
Thus, the error probability for a random encoding is an average of error probabilities of deterministic encodings. In particular, $P_e \geq \min_i P_e(\mathcal{E}_i, \mathcal{D})$, so the optimal error probability is achieved for a deterministic code.
The optimal (minimal) rate of data compression, evaluated in the asymptotic limit ($n \to \infty$) under the condition that the probability of error vanishes in this limit, is called the data compression limit. Devetak and Winter [12] proved that it is given by the conditional entropy of the state $\rho_{XB}$:
(5) $H(X|B)_\rho := H(\rho_{XB}) - H(\rho_B),$
where $H(\sigma) := -\mathrm{Tr}[\sigma \log \sigma]$ denotes the von Neumann entropy of a state $\sigma$.
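As an illustration of Eq. (5), the following sketch (illustrative code, not from the paper; the ensemble is a hypothetical example) computes $H(X|B)_\rho = H(\rho_{XB}) - H(\rho_B)$ for a two-state qubit ensemble, exploiting the fact that the cq state is block diagonal, so the spectrum of $\rho_{XB}$ is the union of the spectra of the blocks $p(x)\rho_B^x$.

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -Tr[rho log2 rho], in bits."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def conditional_entropy_cq(probs, states):
    """H(X|B) = H(rho_XB) - H(rho_B) for rho_XB = sum_x p(x) |x><x| (x) rho_B^x."""
    # rho_XB is block diagonal, one block p(x) * rho_B^x per letter x,
    # so its eigenvalues are the eigenvalues of all the blocks combined.
    block_evals = np.concatenate(
        [p * np.linalg.eigvalsh(rho) for p, rho in zip(probs, states)])
    block_evals = block_evals[block_evals > 1e-12]
    H_XB = float(-np.sum(block_evals * np.log2(block_evals)))
    rho_B = sum(p * rho for p, rho in zip(probs, states))
    return H_XB - von_neumann_entropy(rho_B)

# Hypothetical ensemble: X uniform on {0, 1}; side information |0> or |+>.
ket0 = np.array([1.0, 0.0])
ketp = np.array([1.0, 1.0]) / np.sqrt(2)
states = [np.outer(ket0, ket0), np.outer(ketp, ketp)]
print(conditional_entropy_cq([0.5, 0.5], states))
```

The side information is non-trivial here: without it $H(X) = 1$ bit, while the overlapping states reduce the compression limit to roughly $0.4$ bits per symbol.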
In this paper, we analyze the Slepian-Wolf protocol primarily in the non-asymptotic scenario (finite $n$). The two key quantities that we focus on are the following. The optimal error probability for a rate $R$ and blocklength $n$ is defined as
(6) $\varepsilon^*(n, R) := \min \left\{ P_e(\mathcal{E}, \mathcal{D}) \,:\, (\mathcal{E}, \mathcal{D}) \text{ is an } (n, R)\text{-code} \right\}.$
Similarly, for any $\varepsilon \in (0, 1)$, we define the optimal rate of compression at an error threshold $\varepsilon$ and blocklength $n$ by
(7) $R^*(n, \varepsilon) := \inf \left\{ R \,:\, \varepsilon^*(n, R) \leq \varepsilon \right\}.$
In particular, we obtain bounds on the finite blocklength error exponent
(8) $-\frac{1}{n} \log \varepsilon^*(n, R),$
and the finite blocklength strong converse exponent
(9) $-\frac{1}{n} \log \left(1 - \varepsilon^*(n, R)\right).$
In terms of $\varepsilon^*(n, R)$, Devetak and Winter’s result can be reformulated as
(10) $\lim_{n \to \infty} \varepsilon^*(n, R) = 0 \quad \text{for all } R > H(X|B)_\rho.$
Hence, $H(X|B)_\rho$ is called the Slepian-Wolf limit. We illustrate this result in Figure 2 below.
3. Main Results
The main contributions of this work consist of a refinement of (10). We derive bounds on the speed of convergence of $\varepsilon^*(n, R)$ to zero for any $R > H(X|B)_\rho$. Further, for $R < H(X|B)_\rho$ we obtain bounds on the finite blocklength strong converse exponent, and determine its exact value in the asymptotic limit. In addition, we analyze the asymptotic behavior of $\varepsilon^*(n, R)$ and $R^*(n, \varepsilon)$ in the so-called moderate deviation regime. These results are given by the following theorems, in each of which $\rho_{XB}$ denotes a cq state of the form given in Eq. (1).
Given a rate $R > H(X|B)_\rho$, there exists a sequence of codes such that the probability of error tends to zero as $n \to \infty$, as shown by (10). In fact, this convergence occurs exponentially quickly in $n$, and the exponent can be bounded from below and from above, as we show in the following two theorems.
theolargeach For any rate $R$ and any blocklength $n$, the finite blocklength error exponent defined in (8) satisfies
(11) 
where
(12) 
and $D_\alpha(\rho \| \sigma) := \frac{1}{\alpha - 1} \log \mathrm{Tr}\left[\rho^\alpha \sigma^{1-\alpha}\right]$ is the (Petz) Rényi divergence defined in Eq. (24).
[Sphere-Packing Bound for Slepian-Wolf Coding]theospSW Let $R > H(X|B)_\rho$. Then, there exist $N_0 \in \mathbb{N}$ and a constant $c > 0$ such that for all $n \geq N_0$, the finite blocklength error exponent defined in (8) satisfies
where
(13) 
On the other hand, for $R < H(X|B)_\rho$, no sequence of codes can achieve a vanishing error probability asymptotically. For this range of rates, we in fact show that the probability of error converges exponentially quickly to one, as shown by the bounds on the finite blocklength strong converse exponent given in the following theorems.
theoSCconversebound For any rate $R$ and blocklength $n$, the finite blocklength strong converse exponent defined in (9) satisfies
(14) 
where
(15) 
with $D_\alpha^*$ being the sandwiched Rényi divergence [20, 21], defined in Eq. (25). The proof of this theorem is given in Section 6.1.
We also obtain an upper bound on the finite blocklength strong converse exponent, which, together with the preceding theorem, shows that the quantity defined in (15) is the strong converse exponent in the asymptotic limit. {restatable}theoSCachievbound For all $R$, the finite blocklength strong converse exponent defined in (9) satisfies
(16) 
for all $n$ and any $\delta > 0$, where we denote by $O(\cdot)$ any term bounded by a constant multiple of its argument for all $n$ large enough, the constant depending only on the state $\rho_{XB}$, the rate $R$, and $\delta$. In particular, taking the limit $n \to \infty$ then yields
(17) 
The proof of this theorem is given in Section 6.2, along with Proposition 6.1, a more detailed version of the result with the constants written explicitly. Note that, together, the two theorems above imply
(18) 
Lastly, we consider the case where the rate depends on $n$ as $R_n := H(X|B)_\rho + a_n$, where $(a_n)_{n \in \mathbb{N}}$ is a moderate sequence, that is, a sequence of positive real numbers satisfying
(19) $\lim_{n \to \infty} a_n = 0 \quad \text{and} \quad \lim_{n \to \infty} n\, a_n^2 = \infty.$
In this case, we have the following asymptotic result.
theomodlarge Assume that the cq state $\rho_{XB}$ has strictly positive conditional information variance $V(X|B)_\rho$, where
(20) $V(X|B)_\rho := V(\rho_{XB} \,\|\, \mathbb{1}_X \otimes \rho_B),$
with $V(\cdot \| \cdot)$ the quantum relative entropy variance defined in Eq. (27). Then for any moderate sequence $(a_n)_{n \in \mathbb{N}}$ satisfying Eq. (19),
(21) $\lim_{n \to \infty} \frac{1}{n\, a_n^2} \log \varepsilon^*(n, R_n) = -\frac{1}{2\, V(X|B)_\rho},$
for $R_n := H(X|B)_\rho + a_n$.
4. Preliminaries and Notation
Throughout this paper, we consider a finite-dimensional Hilbert space $\mathcal{H}$. The set of density operators (i.e. positive semi-definite operators with unit trace) on $\mathcal{H}$ is denoted by $\mathcal{S}(\mathcal{H})$. Quantum systems, denoted by capital letters (e.g. $A$, $B$), are modeled by finite-dimensional Hilbert spaces (e.g. $\mathcal{H}_A$, $\mathcal{H}_B$); the system formed by $n$ copies of a system $A$ is denoted by $A^n$, and is modeled by the $n$-fold tensor product of the Hilbert spaces, $\mathcal{H}_A^{\otimes n}$. For positive semi-definite operators $\rho$ and $\sigma$, we write $\rho \ll \sigma$ if the support of $\rho$ is contained in the support of $\sigma$. The identity operator on $\mathcal{H}$ is denoted by $\mathbb{1}_{\mathcal{H}}$. The subscript will be omitted if no confusion is possible. We use $\mathrm{Tr}[\,\cdot\,]$ to denote the standard trace function. For a bipartite state $\rho_{AB}$, $\rho_A := \mathrm{Tr}_B[\rho_{AB}]$ denotes the partial trace with respect to system $B$. The indicator function is defined as follows: $\mathbb{1}\{A\} = 1$ if the event $A$ is true, and $\mathbb{1}\{A\} = 0$ otherwise.
For a positive semi-definite operator $A$ whose spectral decomposition is $A = \sum_i \lambda_i P_i$, where the $\lambda_i$ and $P_i$ are the distinct eigenvalues and corresponding eigenprojections of $A$, its power is defined as $A^s := \sum_{i : \lambda_i > 0} \lambda_i^s P_i$ for $s \in \mathbb{R}$. In particular, $A^0$ denotes the projection onto $\mathrm{supp}(A)$, where we use $\mathrm{supp}(A)$ to denote the support of the operator $A$. Additionally, we define the pinching map with respect to $A$ by $\mathcal{P}_A(X) := \sum_i P_i X P_i$. The $\log$ and $\exp$ are performed in base $2$ throughout this paper.
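The pinching map can be sketched numerically as follows (illustrative code, not from the paper): group the eigenvectors of $A$ into eigenprojections $P_i$ for its distinct eigenvalues, and sum $P_i X P_i$. The pinched operator always commutes with $A$.

```python
import numpy as np

def pinching(A, X, tol=1e-9):
    """Pinching of X with respect to A: sum_i P_i X P_i, where the P_i
    project onto the eigenspaces of the distinct eigenvalues of A."""
    evals, evecs = np.linalg.eigh(A)
    out = np.zeros_like(np.asarray(X, dtype=complex))
    i = 0
    while i < len(evals):
        j = i
        # group (numerically) equal eigenvalues into one eigenprojection
        while j < len(evals) and abs(evals[j] - evals[i]) < tol:
            j += 1
        V = evecs[:, i:j]
        P = V @ V.conj().T
        out += P @ X @ P
        i = j
    return out
```

For $A$ diagonal with distinct entries, pinching simply deletes the off-diagonal entries of $X$.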
4.1. Entropic Quantities
For any pair of density operators $\rho$ and $\sigma$, we define the quantum relative entropy, Petz’s quantum Rényi divergence [29], the sandwiched Rényi divergence [20, 21], and the log-Euclidean Rényi divergence [30, 23], respectively, as follows:
(23) $D(\rho \| \sigma) := \mathrm{Tr}\left[\rho \left(\log \rho - \log \sigma\right)\right];$
(24) $D_\alpha(\rho \| \sigma) := \frac{1}{\alpha - 1} \log \mathrm{Tr}\left[\rho^\alpha \sigma^{1-\alpha}\right];$
(25) $D_\alpha^*(\rho \| \sigma) := \frac{1}{\alpha - 1} \log \mathrm{Tr}\left[\left(\sigma^{\frac{1-\alpha}{2\alpha}} \rho\, \sigma^{\frac{1-\alpha}{2\alpha}}\right)^{\alpha}\right];$
(26) $D_\alpha^\flat(\rho \| \sigma) := \frac{1}{\alpha - 1} \log \mathrm{Tr}\left[\exp\left(\alpha \log \rho + (1 - \alpha) \log \sigma\right)\right].$
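As a numerical sanity check on definitions (24)-(26), the following sketch (illustrative code, not from the paper; eigendecompositions replace matrix functions, and logs are base 2 as above) verifies that all three Rényi divergences reduce to the classical Rényi divergence when $\rho$ and $\sigma$ commute.

```python
import numpy as np

def _fun(A, f):
    """Apply f to the eigenvalues of a Hermitian matrix A."""
    w, v = np.linalg.eigh(A)
    return (v * f(w)) @ v.conj().T

def _power(A, s):
    # powers are taken on the support: 0^s := 0
    return _fun(A, lambda w: np.where(w > 1e-12, np.maximum(w, 1e-12) ** s, 0.0))

def _log2m(A):
    return _fun(A, lambda w: np.log2(np.maximum(w, 1e-300)))

def renyi_petz(rho, sigma, a):
    """(24): (1/(a-1)) log2 Tr[rho^a sigma^(1-a)]."""
    return float(np.log2(np.trace(_power(rho, a) @ _power(sigma, 1 - a)).real) / (a - 1))

def renyi_sandwiched(rho, sigma, a):
    """(25): (1/(a-1)) log2 Tr[(sigma^((1-a)/2a) rho sigma^((1-a)/2a))^a]."""
    s = _power(sigma, (1 - a) / (2 * a))
    return float(np.log2(np.trace(_power(s @ rho @ s, a)).real) / (a - 1))

def renyi_log_euclidean(rho, sigma, a):
    """(26): (1/(a-1)) log2 Tr[2^(a log2 rho + (1-a) log2 sigma)]."""
    M = a * _log2m(rho) + (1 - a) * _log2m(sigma)
    return float(np.log2(np.trace(_fun(M, lambda w: 2.0 ** w)).real) / (a - 1))
```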
We define the quantum relative entropy variance [28, 31] by
(27) $V(\rho \| \sigma) := \mathrm{Tr}\left[\rho \left(\log \rho - \log \sigma - D(\rho \| \sigma)\right)^2\right].$
The above quantity is nonnegative. Further, it follows that
(28) 
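The relative entropy variance in Eq. (27) can be sketched as follows (illustrative code, not from the paper; base-2 logarithms throughout, matching the convention above). For commuting states it reduces to the classical variance of the log-likelihood ratio under $\rho$.

```python
import numpy as np

def _log2m(A):
    """Matrix log base 2 of a Hermitian matrix, via eigendecomposition."""
    w, v = np.linalg.eigh(A)
    return (v * np.log2(np.maximum(w, 1e-300))) @ v.conj().T

def relative_entropy_and_variance(rho, sigma):
    """Return (D, V) in bits: D(rho||sigma) = Tr[rho (log2 rho - log2 sigma)]
    and V(rho||sigma) = Tr[rho (log2 rho - log2 sigma)^2] - D^2."""
    L = _log2m(rho) - _log2m(sigma)
    D = np.trace(rho @ L).real
    V = np.trace(rho @ L @ L).real - D ** 2
    return float(D), float(V)
```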
For $\alpha \in (0, 1) \cup (1, \infty)$, the quantum conditional Rényi entropies corresponding to the divergences defined above are given by
(29) $H_\alpha^\dagger(X|B)_\rho := -D_\alpha^\dagger\left(\rho_{XB} \,\|\, \mathbb{1}_X \otimes \rho_B\right), \quad \dagger \in \{\;, *, \flat\}.$
When $\alpha \to 1$ in Eq. (29), these quantities coincide with the usual quantum conditional entropy:
(30) $H(X|B)_\rho := H(\rho_{XB}) - H(\rho_B),$
where $H(\cdot)$ denotes the von Neumann entropy. Similarly, for the cq state $\rho_{XB}$, we define the conditional information variance:
(31) $V(X|B)_\rho := V\left(\rho_{XB} \,\|\, \mathbb{1}_X \otimes \rho_B\right).$
It is not hard to verify from Eq. (28) that
(32) 
Lemma 4.1 ([32], [23, Lemma III.3, Lemma III.11, Theorem III.14, Corollary III.25], [33, Corollary 2.2]).
Let $\rho, \sigma \in \mathcal{S}(\mathcal{H})$ with $\rho \ll \sigma$, and let $\alpha \in (0, 1) \cup (1, \infty)$. Then,
(33)  
(34) 
Moreover
(35)  
(36) 
Proposition 4.2 (Properties of Rényi Conditional Entropy).
Given any classical-quantum state $\rho_{XB}$, the following hold:
(a) The map $\alpha \mapsto H_\alpha^\dagger(X|B)_\rho$ is continuous and monotonically decreasing on $(0, \infty)$.
(b) The map is concave on .
The proof is provided in Appendix A.
Given two states $\rho$ and $\sigma$, one can define an associated binary hypothesis testing problem of determining which of the two states was given, via a binary POVM. Such a POVM is described by an operator $0 \leq T \leq \mathbb{1}$ (associated, say, with the outcome ‘$\rho$’), called the test. Two types of errors are possible; the probability of measuring $\rho$ and reporting the outcome ‘$\sigma$’ is given by $\mathrm{Tr}[(\mathbb{1} - T)\rho]$ and is called the type-I error, while the probability of measuring $\sigma$ and reporting the outcome ‘$\rho$’ is given by $\mathrm{Tr}[T\sigma]$ and is called the type-II error. The hypothesis testing relative entropy (e.g. as defined in [34]) is defined by
(37) $D_H^\varepsilon(\rho \| \sigma) := -\log \min_{0 \leq T \leq \mathbb{1}} \left\{ \mathrm{Tr}[T \sigma] \,:\, \mathrm{Tr}[(\mathbb{1} - T)\rho] \leq \varepsilon \right\},$
and characterizes the minimum type-II error incurred by a test whose type-I error is at most $\varepsilon$. The hypothesis testing relative entropy satisfies the data-processing inequality
(38) $D_H^\varepsilon(\rho \| \sigma) \geq D_H^\varepsilon\left(\Phi(\rho) \,\|\, \Phi(\sigma)\right)$
for any completely positive trace-preserving map $\Phi$ [34]. This quantity has an interpretation as a relative entropy as it satisfies the following asymptotic equipartition property:
(39) $\lim_{n \to \infty} \frac{1}{n} D_H^\varepsilon\left(\rho^{\otimes n} \,\|\, \sigma^{\otimes n}\right) = D(\rho \| \sigma), \quad \forall\, \varepsilon \in (0, 1).$
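For commuting states the optimization in (37) is solved by the Neyman-Pearson likelihood-ratio test, which the following sketch implements (illustrative code, not from the paper): outcomes are accepted in decreasing order of $p/q$, with a randomized test at the boundary.

```python
import numpy as np

def d_hyp_classical(p, q, eps):
    """D_H^eps(p||q) in bits for commuting (diagonal) states: minimize the
    type-II error Tr[T q] subject to type-I error Tr[(1-T) p] <= eps."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    ratio = p / np.maximum(q, 1e-300)
    order = np.argsort(-ratio)            # most p-like outcomes first
    p, q = p[order], q[order]
    passed, beta = 0.0, 0.0               # Tr[T p] and Tr[T q] so far
    for pi, qi in zip(p, q):
        need = (1.0 - eps) - passed
        if need <= 1e-15:
            break
        t = min(1.0, need / pi) if pi > 0 else 0.0
        passed += t * pi                  # fractional t = randomized test
        beta += t * qi
    return float(-np.log2(beta))
```

For $p = q$ this returns $-\log_2(1 - \varepsilon)$, consistent with $D_H^\varepsilon(\rho \| \rho) = -\log(1 - \varepsilon)$.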
We can also consider a related quantity, namely the minimum type-I error such that the type-II error does not exceed a given threshold $\mu$. That is,
(40) $\hat{\alpha}_\mu(\rho \| \sigma) := \min_{0 \leq T \leq \mathbb{1}} \left\{ \mathrm{Tr}[(\mathbb{1} - T)\rho] \,:\, \mathrm{Tr}[T \sigma] \leq \mu \right\}.$
By (38), for any completely positive trace-preserving map $\Phi$ we have
(41) $\hat{\alpha}_\mu(\rho \| \sigma) \leq \hat{\alpha}_\mu\left(\Phi(\rho) \,\|\, \Phi(\sigma)\right).$
We also consider the so-called max-relative entropy, given by
(42) $D_{\max}(\rho \| \sigma) := \log \min \left\{ \lambda \,:\, \rho \leq \lambda\, \sigma \right\}.$
In establishing the exact strong converse exponent, we will employ a smoothed variant of this quantity,
(43) $D_{\max}^\varepsilon(\rho \| \sigma) := \min_{\tilde{\rho} \in B^\varepsilon(\rho)} D_{\max}(\tilde{\rho} \| \sigma),$
where
(44) $B^\varepsilon(\rho) := \left\{ \tilde{\rho} \in \mathcal{S}(\mathcal{H}) \,:\, d(\tilde{\rho}, \rho) \leq \varepsilon \right\}$
is the ball, in the distance of optimal purifications $d(\cdot, \cdot)$, defined by
(45) 
where the minimum defining $d$ is taken over purifications of its two arguments. By equation (2) of [37], this distance satisfies
(46) 
It was shown in [37] that $D_{\max}^\varepsilon$ satisfies an asymptotic equipartition property. In fact, Theorem 14 of [37] gives finite blocklength upper and lower bounds on $\frac{1}{n} D_{\max}^\varepsilon\left(\rho^{\otimes n} \,\|\, \sigma^{\otimes n}\right)$ which converge to $D(\rho \| \sigma)$. We will only need the upper bound, namely
(47) 
where .
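For a full-rank $\sigma$, the minimization in (42) has a closed form, which the following sketch uses (illustrative code, not from the paper): the smallest feasible $\lambda$ is the largest eigenvalue of $\sigma^{-1/2} \rho\, \sigma^{-1/2}$.

```python
import numpy as np

def d_max(rho, sigma):
    """D_max(rho||sigma) = log2 min{lambda : rho <= lambda * sigma},
    assuming sigma is full rank."""
    w, v = np.linalg.eigh(sigma)
    inv_sqrt = (v * w ** -0.5) @ v.conj().T      # sigma^(-1/2)
    lam = np.linalg.eigvalsh(inv_sqrt @ rho @ inv_sqrt)[-1]  # largest eigenvalue
    return float(np.log2(lam))
```

For commuting states this reduces to $\log_2 \max_x p(x)/q(x)$, the classical max-divergence.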
The smoothed max-relative entropy satisfies the following simple but useful relation (which is a one-shot analog of Lemma V.7 of [23]).
Lemma 4.3.
For , if then
(48) 