# Non-Asymptotic Classical Data Compression with Quantum Side Information

## Abstract.

In this paper, we analyze classical data compression with quantum side information (also known as the classical-quantum Slepian-Wolf protocol) in the so-called large and moderate deviation regimes. In the non-asymptotic setting, the protocol involves compressing classical sequences of finite length and decoding them with the assistance of quantum side information. In the large deviation regime, the compression rate is fixed, and we obtain bounds on the error exponent function, which characterizes the minimal probability of error as a function of the rate. Devetak and Winter showed that the asymptotic data compression limit for this protocol is given by a conditional entropy. For any protocol with a rate below this quantity, the probability of error converges to one asymptotically and its speed of convergence is given by the strong converse exponent function. We obtain finite blocklength bounds on this function, and determine exactly its asymptotic value, thus improving on previous results by Tomamichel. In the moderate deviation regime for the compression rate, the latter is no longer considered to be fixed. It is allowed to depend on the blocklength $n$, but assumed to decay slowly to the asymptotic data compression limit. Starting from a rate above this limit, we determine the speed of convergence of the error probability to zero and show that it is given in terms of the conditional information variance. Our results complement earlier results obtained by Tomamichel and Hayashi, in which they analyzed the so-called small deviation regime of this protocol.


## 1. Introduction

Source coding (or data compression) is the task of compressing information emitted by a source in a manner such that it can later be decompressed to yield the original information with high probability. The information source is said to be memoryless if there is no correlation between the successive messages emitted by it. In this case, successive uses of the source are modeled by a sequence of independent and identically distributed (i.i.d.) random variables $X_1, X_2, \ldots$, each taking a value $x$ in a finite alphabet $\mathcal{X}$ with probability $p(x)$. Such a source is equivalently modeled by a single random variable $X$ with probability mass function $p(x)$, with $x \in \mathcal{X}$, and is called a discrete memoryless source (DMS). Let $H(X)$ denote the Shannon entropy of $X$. Shannon’s Source Coding Theorem [1] tells us that if the messages emitted by $n$ copies of the source are compressed into at least $nH(X)$ bits, then they can be recovered with arbitrary accuracy upon decompression, in the asymptotic limit ($n \to \infty$).
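As a simple numerical companion to Shannon's theorem, the compression limit $H(X)$ can be computed directly from the pmf. The biased coin below is an illustrative toy example, not one drawn from this paper.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(X) in bits of a pmf given as a list of probabilities."""
    return -sum(px * math.log2(px) for px in p if px > 0)

# A biased coin with p = (0.3, 0.7) has H(X) ~ 0.881 bits, so n flips
# can be compressed into roughly 0.881 * n bits.
print(shannon_entropy([0.3, 0.7]))
```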

One variant of the above task is that of data compression with classical side information (at the decoder), which is also called Slepian-Wolf coding, first studied by Slepian and Wolf [2]. In this scenario, one considers a memoryless source emitting two messages, $x$ and $y$, which can be considered to be the values taken by a pair of correlated random variables $(X, Y)$. The task is once again to optimally compress sequences $x^n := (x_1, \ldots, x_n)$ emitted on $n$ copies of the source so that they can be recovered with vanishing probability of error in the asymptotic limit. However, at the recovery (or decompression) step, the decoder also has access to the corresponding sequence $y^n := (y_1, \ldots, y_n)$. Since $X$ and $Y$ are correlated, the knowledge of $y^n$ gives information about the sequence $x^n$, and thus assists in the decoding. Slepian and Wolf showed that as long as the sequences $x^n$ are compressed into $nR$ bits with $R > H(X|Y)$, where $H(X|Y)$ is the conditional entropy of $X$ given $Y$, this task can be accomplished with vanishing probability of error in the asymptotic limit [2]. Here, $R$ is called the rate of the protocol and $n$ is called the coding blocklength. In fact, Slepian and Wolf also considered the case in which $y^n$ is compressed and sent to the decoder at a rate $R'$; the decoder attempts to faithfully decode both $x^n$ and $y^n$. This gives rise to an achievable rate region of pairs $(R, R')$ for which this task is possible. In this work, we do not bound the rate $R'$; that is, we consider the decoder to receive an uncompressed version of $y^n$.

In this setting, Slepian and Wolf showed that the data compression limit, that is, the minimal rate of asymptotically lossless compression, is given by $H(X|Y)$. Moreover, Oohama and Han [3] established that the data compression limit for Slepian-Wolf coding satisfies the so-called strong converse property. That is, for any attempted compression to $nR$ bits with $R < H(X|Y)$, the probability of error converges to $1$ in the asymptotic limit. This protocol has been extended to countably infinite alphabets and a class of information sources with memory by Cover [4], and to various other settings [5, 6].
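The classical Slepian-Wolf limit $H(X|Y) = H(X,Y) - H(Y)$ is straightforward to evaluate from a joint pmf. A minimal sketch (the joint distribution below is a toy example):

```python
import math

def entropy(pmf):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def conditional_entropy(p_xy):
    """H(X|Y) = H(X,Y) - H(Y), in bits, for a joint pmf p_xy[x][y]."""
    h_xy = entropy([p for row in p_xy for p in row])
    p_y = [sum(row[y] for row in p_xy) for y in range(len(p_xy[0]))]
    return h_xy - entropy(p_y)

# Perfectly correlated bits: the side information y determines x,
# so the Slepian-Wolf limit H(X|Y) is zero.
print(conditional_entropy([[0.5, 0.0], [0.0, 0.5]]))
```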

The above characterization of the data compression limit in terms of the conditional entropy is only valid in the asymptotic limit. It tells us that there exist data compression protocols for which infinite sequences of outputs of a DMS can be compressed and later recovered with vanishing probability of error, but it gives no control on the probability of error incurred for any finite sequence. However, in practical implementations of the protocol one is obliged to consider finite sequences. Hence, it is important to determine the behavior of the optimal error probability in the non-asymptotic setting (i.e. finite $n$). To do so, we consider the so-called reliability function or error exponent function (see [7, 8] and references therein), which gives the exponential rate of decay of the minimal probability of error achievable by a Slepian-Wolf protocol, at a fixed rate of compression. On the other hand, one can evaluate the minimum compression rate as a function of the coding blocklength, under the constraint that the error probability is below a certain threshold [9, 10, 11].

A quantum generalization of the Slepian-Wolf protocol, which was first introduced by Devetak and Winter [12], is the task of classical data compression with quantum side information. They referred to this task as the Classical-Quantum Slepian-Wolf (CQSW) problem. In this protocol, the correlated pair of random variables $(X, Y)$ is replaced by a classical-quantum (c-q) state $\rho_{XB}$. Here $B$ denotes a quantum system which is in the possession of the decoder (say, Bob) and constitutes the quantum side information (QSI), while $X$ is a classical system in the possession of the encoder (say, Alice) and corresponds to a random variable with probability mass function $p(x)$, with $x \in \mathcal{X}$, as in the classical setting. Such a c-q state is described by an ensemble $\{p(x), \rho^x_B\}_{x \in \mathcal{X}}$: with probability $p(x)$ the random variable takes the value $x$ and Bob’s system is in the state $\rho^x_B$. In the so-called asymptotic, memoryless setting of CQSW, one considers Alice and Bob to share a large number, $n$, of identical copies of the c-q state $\rho_{XB}$. Consequently, Alice knows the sequence $x^n$, whereas the quantum state $\rho^{x^n}_{B^n}$ (i.e. the QSI) is accessible only to Bob. However, Bob has no knowledge of the sequence $x^n$. The aim is for Alice to convey the sequence $x^n$ to Bob using as few bits as possible. Bob can make use of the QSI in order to help him decode the compressed message sent by Alice. Devetak and Winter proved that the data compression limit of CQSW, evaluated in the asymptotic limit ($n \to \infty$), is given by the conditional entropy $H(X|B)_\rho$ of the c-q state $\rho_{XB}$.
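The Devetak-Winter quantity $H(X|B)_\rho = H(\rho_{XB}) - H(\rho_B)$ can be evaluated numerically for any finite ensemble; since $\rho_{XB}$ is block diagonal, its spectrum is the union of the spectra of the blocks $p(x)\rho^x_B$. A minimal sketch (the two-state ensembles below are toy examples):

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -Tr[rho log2 rho], in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log2(evals)).sum())

def cq_conditional_entropy(probs, states_B):
    """H(X|B)_rho = H(rho_XB) - H(rho_B) for rho_XB = sum_x p(x)|x><x| (x) rho_B^x.
    rho_XB is block diagonal, so its eigenvalues are those of the blocks p(x) rho_B^x."""
    evals_XB = np.concatenate(
        [np.linalg.eigvalsh(p * np.asarray(s)) for p, s in zip(probs, states_B)])
    evals_XB = evals_XB[evals_XB > 1e-12]
    h_XB = float(-(evals_XB * np.log2(evals_XB)).sum())
    rho_B = sum(p * np.asarray(s) for p, s in zip(probs, states_B))
    return h_XB - von_neumann_entropy(rho_B)
```

With orthogonal side-information states the QSI identifies $x$ perfectly and $H(X|B)_\rho = 0$; with identical states the QSI is useless and $H(X|B)_\rho = H(X)$.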

In this paper we primarily study the CQSW protocol in the non-asymptotic setting in which one no longer takes the limit $n \to \infty$. This corresponds to the more realistic scenario in which only a finite number of copies of the c-q state are available. First, we focus on the so-called large deviation regime, in which the compression rate $R$ is fixed, and we analyze the optimal probability of error as a function of the blocklength $n$. Specifically, in the range $R > H(X|B)_\rho$, we obtain upper and lower bounds on the error exponent function. The lower bound shows that for any $R > H(X|B)_\rho$ the CQSW task can be accomplished with a probability of error which decays to zero exponentially in $n$. The upper bound puts a limit on how quickly the probability of error can decay. We term this upper bound the “sphere-packing bound” for CQSW, since it is analogous to the so-called sphere-packing bound obtained in c-q channel coding [16, 17, 18].

For any protocol with a rate $R < H(X|B)_\rho$, the probability of error converges to one asymptotically and its speed of convergence is given by the strong converse exponent function. We obtain finite blocklength bounds on this function, and determine exactly its asymptotic value, thus improving on previous results by Tomamichel [19]. This value is given in terms of the sandwiched conditional Rényi entropy [20, 21, 22].

The bounds we obtain are expressed in terms of certain entropic exponent functions involving conditional Rényi entropies. To derive these results, we prove and employ properties of these functions. In obtaining the strong converse bounds, we employ variational representations for certain auxiliary exponent functions by making use of those for the so-called log-Euclidean Rényi relative entropies developed in [23]. Our variational representations are analogous to those obtained by Csiszár and Körner in the classical setting [24, 25, 26, 8].

We also study the trade-offs between the rate of compression, the minimal probability of error, and the blocklength $n$. Specifically, we characterize the behaviors of the error probability and the compression rate in the moderate deviation regime. In contrast to the previously discussed results, for which the rate was considered to be fixed, here we allow the rate to change with $n$, approaching the Slepian-Wolf limit $H(X|B)_\rho$ slowly (slower than $1/\sqrt{n}$), from above. In this case, we show that the probability of error vanishes asymptotically. In addition, we obtain an asymptotic formula describing the minimum compression rate, which converges to $H(X|B)_\rho$, when the probability of error decays sub-exponentially in $n$. We summarize the error behaviors of the different regimes in Table 1.

### 1.1. Prior Works

Renes and Renner [27] analyzed the protocol in the so-called one-shot setting (which corresponds to the case $n = 1$) for a given threshold ($\varepsilon$, say) on the probability of error. They proved that in this case the classical random variable $X$ can be compressed to a number of bits given by a different entropic quantity, the so-called smoothed conditional max-entropy, the smoothing parameter being dependent on $\varepsilon$. They also established that this entropic quantity gives the minimal number of bits, up to small additive quantities involving $\varepsilon$. More precisely, the authors established upper and lower bounds on the minimal number of bits in terms of the smoothed conditional max-entropy. The asymptotic result of Devetak and Winter could be recovered from their results by replacing the c-q state $\rho_{XB}$ by its $n$-fold tensor power in these one-shot bounds, dividing by $n$, and taking the limit $n \to \infty$. In [28], the authors improved these bounds and established a second order expansion of the so-called minimum code size $m^*(n,\varepsilon)$ given an error $\varepsilon$:

$$m^{*}(n,\varepsilon) = nH(X|B)_\rho - \sqrt{n\,V(X|B)_\rho}\;\Phi^{-1}(\varepsilon) + O(\log n),$$

where $V(X|B)_\rho$ is the quantum conditional information variance, and $\Phi$ is the cumulative distribution function of a standard normal distribution.
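The second-order expansion above is easy to evaluate numerically (dropping the $O(\log n)$ term). The sketch below uses the standard library's `statistics.NormalDist` for $\Phi^{-1}$; the values of $H(X|B)_\rho$ and $V(X|B)_\rho$ are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def code_size_second_order(n, eps, h_cond, v_cond):
    """Second-order approximation (O(log n) term dropped):
    m*(n, eps) ~ n H(X|B) - sqrt(n V(X|B)) * Phi^{-1}(eps)."""
    return n * h_cond - sqrt(n * v_cond) * NormalDist().inv_cdf(eps)

# Hypothetical values H(X|B) = 0.5 bits, V(X|B) = 0.25: for small eps the
# inverse CDF is negative, so the required code size exceeds n * H(X|B).
print(code_size_second_order(1000, 0.01, 0.5, 0.25))
```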

This paper is organized as follows. We introduce the CQSW protocol in Sec. 2, and state our main results in Sec. 3. The notation and definitions for the entropic quantities and exponent functions are described in Sec. 4. Sec. 5 presents the error exponent analysis for CQSW for rates $R > H(X|B)_\rho$ (large deviation regime), and we study the optimal success exponent for $R < H(X|B)_\rho$ (strong converse regime) in Sec. 6. In Sec. 7 we discuss the moderate deviation regime. We conclude this paper in Sec. 8 with a discussion.

## 2. Classical Data Compression with Quantum Side Information (Slepian-Wolf Coding)

Suppose Alice and Bob share multiple (say $n$) identical copies of a classical-quantum (c-q) state

$$\rho_{XB} = \sum_{x\in\mathcal{X}} p(x)\,|x\rangle\langle x|\otimes\rho^{x}_{B}, \tag{1}$$

where $\mathcal{X}$ is a finite alphabet and $\rho^x_B$ is a quantum state, of a system $B$ with Hilbert space $\mathcal{H}_B$, in Bob’s possession. The letters $x$ can be considered to be the values taken by a random variable $X$ with probability mass function $p(x)$. One can associate with $X$ a quantum system (which we also refer to as $X$) whose Hilbert space $\mathcal{H}_X$ has an orthonormal basis $\{|x\rangle\}_{x\in\mathcal{X}}$ labeled by the elements of $\mathcal{X}$.

The aim of classical-quantum Slepian-Wolf (CQSW) coding is for Alice to convey sequences $x^n := (x_1,\ldots,x_n) \in \mathcal{X}^n$ to Bob using as few bits as possible; Bob can employ the corresponding quantum state $\rho^{x^n}_{B^n} := \rho^{x_1}_B\otimes\cdots\otimes\rho^{x_n}_B$, which is in his possession, and plays the role of quantum side information (QSI), to help decode Alice’s compressed message.

Alice’s encoding (compression) map is given by $\mathcal{E}:\mathcal{X}^n\to\mathcal{W}$, where the alphabet $\mathcal{W}$ is such that $|\mathcal{W}|\le|\mathcal{X}|^n$. If Alice’s message was $x^n$, the compressed message that Bob receives is $w=\mathcal{E}(x^n)$. He applies a decoding map on the pair $(w,\rho^{x^n}_{B^n})$ in order to infer Alice’s original message. Thus, Bob’s decoding is given by a map $\mathcal{D}:\mathcal{W}\times\mathcal{S}(\mathcal{H}_{B^n})\to\mathcal{X}^n$, where $\mathcal{S}(\mathcal{H}_{B^n})$ denotes the set of states on $\mathcal{H}_{B^n}$.

If we fix the first argument as $w\in\mathcal{W}$, we have that the decoding is a map from $\mathcal{S}(\mathcal{H}_{B^n})$ to $\mathcal{X}^n$ which is given by a positive operator-valued measure (POVM). Thus, we can represent the decoding by a collection of POVMs $\{\Pi^{(w)}\}_{w\in\mathcal{W}}$, where $\Pi^{(w)}=\{\Pi^{(w)}_{x^n}\}_{x^n\in\mathcal{X}^n}$ with $\Pi^{(w)}_{x^n}\ge 0$ and $\sum_{x^n\in\mathcal{X}^n}\Pi^{(w)}_{x^n}=\mathds{1}_{B^n}$, for each $w\in\mathcal{W}$. That is, if Alice sends the message $x^n$, Bob receives $w=\mathcal{E}(x^n)$, and measures the state $\rho^{x^n}_{B^n}$ with the POVM $\Pi^{(w)}$. We depict the protocol in Figure 1.

Given $n\in\mathbb{N}$ and $R>0$, an encoding-decoding pair $\mathcal{C}=(\mathcal{E},\mathcal{D})$ of the form described above is said to form an $(n,R)$-code if $|\mathcal{W}|\le 2^{nR}$ (or, more precisely, $|\mathcal{W}|\le\lceil 2^{nR}\rceil$). Here, $R$ is called the rate of the code $\mathcal{C}$. For such a code, the probability of error is given by

$$P_e(\mathcal{C})\equiv P_e(\rho_{XB},\mathcal{C}) = 1 - \sum_{x^n\in\mathcal{X}^n} p(x^n)\,\mathrm{Tr}\Big[\Pi^{(\mathcal{E}(x^n))}_{x^n}\,\rho^{x^n}_{B^n}\Big], \tag{2}$$

where $p(x^n):=p(x_1)\cdots p(x_n)$ for $x^n=(x_1,\ldots,x_n)\in\mathcal{X}^n$. We can also consider a random encoding which maps $x^n$ to $w\in\mathcal{W}$ with some probability $P(w|x^n)$. In this case, the probability of error is given by

$$P_e(\mathcal{C}) = 1 - \sum_{x^n\in\mathcal{X}^n,\,w\in\mathcal{W}} P(w|x^n)\,p(x^n)\,\mathrm{Tr}\Big[\Pi^{(w)}_{x^n}\,\rho^{x^n}_{B^n}\Big]. \tag{3}$$

Alternatively, we can see the random encoding as applying a deterministic encoding $\mathcal{E}_j$ with some probability $Q_j$. Then for a code $\mathcal{C}=(\mathcal{E},\mathcal{D})$,

$$P_e((\mathcal{E},\mathcal{D})) = 1 - \sum_{j} Q_j \sum_{x^n\in\mathcal{X}^n} p(x^n)\,\mathrm{Tr}\Big[\Pi^{(\mathcal{E}_j(x^n))}_{x^n}\,\rho^{x^n}_{B^n}\Big] = \sum_{j} Q_j\,P_e((\mathcal{E}_j,\mathcal{D})). \tag{4}$$

Thus, the error probability for a random encoding is an average of error probabilities of deterministic encodings. In particular, $P_e((\mathcal{E},\mathcal{D}))\ge\min_j P_e((\mathcal{E}_j,\mathcal{D}))$, so the optimal error probability is achieved for a deterministic code.

The optimal (minimal) rate of data compression evaluated in the asymptotic limit ($n\to\infty$), under the condition that the probability of error vanishes in this limit, is called the data compression limit. Devetak and Winter [12] proved that it is given by the conditional entropy of $\rho_{XB}$:

$$H(X|B)_\rho = H(\rho_{XB}) - H(\rho_B), \tag{5}$$

where $H(\sigma):=-\mathrm{Tr}[\sigma\log\sigma]$ denotes the von Neumann entropy of a state $\sigma$.

In this paper, we analyze the Slepian-Wolf protocol primarily in the non-asymptotic scenario (finite $n$). The two key quantities that we focus on are the following. The optimal error probability for a rate $R$ and blocklength $n$ is defined as

$$P^{*}_{e}(n,R)\equiv P^{*}_{e}(\rho_{XB},n,R) := \inf\big\{P_e(\mathcal{C}) : \mathcal{C}\ \text{is an}\ (n,R)\text{-code for}\ \rho_{XB}\big\}. \tag{6}$$

Similarly, for any $\varepsilon\in(0,1)$, we define the optimal rate of compression at an error threshold $\varepsilon$ and blocklength $n$ by

$$R^{*}(n,\varepsilon) := \inf\big\{R : \exists\ \text{an}\ (n,R)\text{-code}\ \mathcal{C}\ \text{with}\ P_e(\mathcal{C})\le\varepsilon\big\}. \tag{7}$$

In particular, we obtain bounds on the finite blocklength error exponent

$$e(n,R) := -\frac{1}{n}\log P^{*}_{e}(n,R), \tag{8}$$

and the finite blocklength strong converse exponent

$$\mathrm{sc}(n,R) := -\frac{1}{n}\log\big(1-P^{*}_{e}(n,R)\big). \tag{9}$$

In terms of $P^{*}_{e}(n,R)$, Devetak and Winter’s result can be reformulated as

$$\forall\,R>H(X|B)_\rho:\ \limsup_{n\to\infty}P^{*}_{e}(n,R)=0, \qquad \forall\,R<H(X|B)_\rho:\ \liminf_{n\to\infty}P^{*}_{e}(n,R)>0. \tag{10}$$

Hence, $H(X|B)_\rho$ is called the Slepian-Wolf limit. We illustrate this result in Figure 2 below.

## 3. Main Results

The main contributions of this work consist of a refinement of (10). We derive bounds on the speed of convergence of $P^{*}_{e}(n,R)$ to zero for any $R>H(X|B)_\rho$. Further, for $R<H(X|B)_\rho$ we obtain bounds on the strong converse exponent $\mathrm{sc}(n,R)$, and determine its exact value in the asymptotic limit. In addition, we analyze the asymptotic behavior of $P^{*}_{e}(n,R)$ and $R^{*}(n,\varepsilon)$ in the so-called moderate deviations regime. These results are given by the following theorems, in each of which $\rho_{XB}$ denotes a c-q state of the form given in Eq. (1).

Given a rate $R>H(X|B)_\rho$, there exists a sequence of codes such that the probability of error tends to zero as $n\to\infty$, as shown by (10). In fact, this convergence occurs exponentially quickly in $n$, and the exponent can be bounded from below and above, as we show in the following two theorems.

###### Theorem (Achievability Bound).

For any rate $R>H(X|B)_\rho$, and any blocklength $n\ge 1$, the finite blocklength error exponent defined in (8) satisfies

$$e(n,R) \ge E^{\downarrow}_{r}(R) - \frac{2}{n}, \tag{11}$$

where

$$E^{\downarrow}_{r}(R)\equiv E^{\downarrow}_{r}(\rho_{XB},R) := \sup_{\frac{1}{2}\le\alpha\le 1}\frac{1-\alpha}{\alpha}\Big(R - H^{\downarrow}_{2-\frac{1}{\alpha}}(X|B)_\rho\Big), \tag{12}$$

where $H^{\downarrow}_{\alpha}(X|B)_\rho := -D_{\alpha}(\rho_{XB}\|\mathds{1}_X\otimes\rho_B)$, and $D_{\alpha}$ is the Petz $\alpha$-Rényi divergence: $D_{\alpha}(\rho\|\sigma):=\frac{1}{\alpha-1}\log\mathrm{Tr}[\rho^{\alpha}\sigma^{1-\alpha}]$.

The proof of this theorem is given in Section 5.1.
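The lower bound of Eq. (11)-(12) can be evaluated numerically for small ensembles by computing the Petz conditional Rényi entropy $H^{\downarrow}_{2-1/\alpha}$ in closed form and grid-searching over $\alpha$. This is only a numerical sketch of the exponent function (in nats), with toy input states:

```python
import numpy as np

def mpow(rho, a):
    """Power of a positive semi-definite matrix on its support (0^a := 0)."""
    w, v = np.linalg.eigh(rho)
    wa = np.array([x ** a if x > 1e-12 else 0.0 for x in w])
    return (v * wa) @ v.conj().T

def petz_cond_renyi_down(alpha, probs, states_B):
    """H^down_alpha(X|B) = -D_alpha(rho_XB || 1_X (x) rho_B), in nats, for a c-q state."""
    rho_B = sum(p * np.asarray(s) for p, s in zip(probs, states_B))
    q = sum(p ** alpha * np.trace(mpow(np.asarray(s), alpha) @ mpow(rho_B, 1 - alpha)).real
            for p, s in zip(probs, states_B))
    return float(-np.log(q) / (alpha - 1))

def error_exponent_lower(R, probs, states_B, grid=200):
    """Grid evaluation of sup_{1/2 <= alpha < 1} ((1-alpha)/alpha)(R - H^down_{2-1/alpha})."""
    best = 0.0  # the supremum includes alpha -> 1, where the bracket vanishes
    for alpha in np.linspace(0.5, 1.0, grid, endpoint=False):
        val = (1 - alpha) / alpha * (R - petz_cond_renyi_down(2 - 1 / alpha, probs, states_B))
        best = max(best, val)
    return float(best)
```

For orthogonal side-information states ($H(X|B)_\rho=0$) the bound is positive for every $R>0$; for useless side information it vanishes for rates below $H(X)$.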

###### Theorem (Sphere-Packing Bound for Slepian-Wolf Coding).

Let $R>H(X|B)_\rho$. Then, there exist $N_0\in\mathbb{N}$ and a constant $K>0$, such that for all $n\ge N_0$, the finite blocklength error exponent defined in (8) satisfies

$$e(n,R) \le E_{\mathrm{sp}}(R) + \frac{1}{2}\left(1+\left|\frac{\partial E_{\mathrm{sp}}(r)}{\partial r}\Big|_{r=R}\right|\right)\frac{\log n}{n} + \frac{K}{n},$$

where

$$E_{\mathrm{sp}}(R)\equiv E_{\mathrm{sp}}(\rho_{XB},R) := \sup_{0\le\alpha\le 1}\frac{1-\alpha}{\alpha}\Big(R - H^{\uparrow}_{\alpha}(X|B)_\rho\Big), \tag{13}$$

where $H^{\uparrow}_{\alpha}(X|B)_\rho := \max_{\sigma_B\in\mathcal{S}(B)} -D_{\alpha}(\rho_{XB}\|\mathds{1}_X\otimes\sigma_B)$. The proof of this theorem is given in Section 5.2.

On the other hand, for $R<H(X|B)_\rho$, no sequence of codes can achieve a vanishing error asymptotically. For this range, we in fact show that the probability of error converges exponentially quickly to one, as shown by the bounds on $\mathrm{sc}(n,R)$ given in the following theorems.

###### Theorem (Lower Bound on the Strong Converse Exponent).

For all $R<H(X|B)_\rho$, the finite blocklength strong converse exponent defined in (9) satisfies

$$\mathrm{sc}(n,R) \ge E^{*}_{\mathrm{sc}}(R) > 0, \tag{14}$$

where

$$E^{*}_{\mathrm{sc}}(R)\equiv E^{*}_{\mathrm{sc}}(\rho_{XB},R) := \sup_{\alpha>1}\frac{1-\alpha}{\alpha}\Big(R - H^{*,\uparrow}_{\alpha}(X|B)_\rho\Big), \tag{15}$$

and $H^{*,\uparrow}_{\alpha}(X|B)_\rho := \max_{\sigma_B\in\mathcal{S}(B)} -D^{*}_{\alpha}(\rho_{XB}\|\mathds{1}_X\otimes\sigma_B)$, with $D^{*}_{\alpha}$ being the sandwiched Rényi divergence [20, 21]. The proof of this theorem is given in Section 6.1.

We also obtain an upper bound on $\mathrm{sc}(n,R)$, which, together with the preceding theorem, shows that $E^{*}_{\mathrm{sc}}(R)$ is the strong converse exponent in the asymptotic limit.

###### Theorem (Upper Bound on the Strong Converse Exponent).

For all $R<H(X|B)_\rho$, the finite blocklength strong converse exponent defined in (9) satisfies

$$\mathrm{sc}(n,R) \le E^{*}_{\mathrm{sc}}(R) + \frac{c}{m}\log(m+1) + O_m\Big(\frac{1}{\sqrt{n}}\Big) \tag{16}$$

for some constant $c>0$ and any $m\in\mathbb{N}$, where we denote by $O_m(1/\sqrt{n})$ any term which is bounded by $C_m/\sqrt{n}$ for all $n$ large enough, for some constant $C_m$ depending only on $m$, $\rho_{XB}$, and $R$. In particular, taking $m\to\infty$ then yields

$$\limsup_{n\to\infty}\mathrm{sc}(n,R) \le E^{*}_{\mathrm{sc}}(R). \tag{17}$$

The proof of this theorem is given in Section 6.2, along with Proposition 6.1, a more detailed version of the result with the constants written explicitly. Note that, together, the two preceding theorems imply

$$\lim_{n\to\infty}\mathrm{sc}(n,R) = E^{*}_{\mathrm{sc}}(R). \tag{18}$$

Lastly, we consider the case where the rate depends on $n$ as $R_n := H(X|B)_\rho + a_n$, where $(a_n)_{n\in\mathbb{N}}$ is a moderate sequence, that is, a sequence of real numbers satisfying

$$\text{(i)}\quad a_n\to 0\ \ \text{as}\ n\to\infty, \qquad \text{(ii)}\quad a_n\sqrt{n}\to\infty\ \ \text{as}\ n\to\infty. \tag{19}$$

In this case, we have the following asymptotic result.

###### Theorem (Moderate Deviations: Error Probability).

Assume that the c-q state $\rho_{XB}$ has strictly positive conditional information variance $V(X|B)_\rho>0$, where

$$V(X|B)_\rho := V(\rho_{XB}\|\mathds{1}_X\otimes\rho_B), \tag{20}$$

with $V(\cdot\|\cdot)$ being the quantum relative entropy variance, defined in Eq. (27) below. Then for any moderate sequence $(a_n)_{n\in\mathbb{N}}$ satisfying Eq. (19),

$$\lim_{n\to\infty}\frac{1}{n a_n^2}\log P^{*}_{e}(n,R_n) = -\frac{1}{2V(X|B)_\rho} \tag{21}$$

for $R_n = H(X|B)_\rho + a_n$.

###### Theorem (Moderate Deviations: Compression Rate).

Assume that the c-q state $\rho_{XB}$ has $V(X|B)_\rho>0$. Then for any moderate sequence $(a_n)_{n\in\mathbb{N}}$ satisfying Eq. (19), and error thresholds $\varepsilon_n := \mathrm{e}^{-na_n^2}$, we have the asymptotic expansion

$$R^{*}(n,\varepsilon_n) = H(X|B)_\rho + \sqrt{2V(X|B)_\rho}\;a_n + o(a_n). \tag{22}$$

The proof of this theorem is given in Section 7.2.

## 4. Preliminaries and Notation

Throughout this paper, we consider a finite-dimensional Hilbert space $\mathcal{H}$. The set of density operators (i.e. positive semi-definite operators with unit trace) on $\mathcal{H}$ is denoted by $\mathcal{S}(\mathcal{H})$. The quantum systems, denoted by capital letters (e.g. $A$, $B$), are modeled by finite-dimensional Hilbert spaces (e.g. $\mathcal{H}_A$, $\mathcal{H}_B$); $n$ copies of a system $A$ are denoted by $A^n$, and are modeled by the $n$-fold tensor product of the Hilbert spaces, $\mathcal{H}_A^{\otimes n}$. For two positive semi-definite operators $\rho$ and $\sigma$, we write $\rho\ll\sigma$ if the support of $\rho$ is contained in the support of $\sigma$. The identity operator on $\mathcal{H}_A$ is denoted by $\mathds{1}_A$. The subscript will be removed if no confusion is possible. We use $\mathrm{Tr}$ as the standard trace function. For a bipartite state $\rho_{AB}$, $\rho_B:=\mathrm{Tr}_A[\rho_{AB}]$ denotes the partial trace with respect to system $A$. The indicator function is defined as follows: $\mathds{1}\{E\}=1$ if the event $E$ is true; otherwise $\mathds{1}\{E\}=0$.

For a positive semi-definite operator $A$ whose spectral decomposition is $A=\sum_i a_i P_i$, where $a_i$ and $P_i$ are the eigenvalues and eigenprojections of $A$, its power is defined as $A^p:=\sum_{i:a_i\ne 0}a_i^p P_i$. In particular, $A^0$ denotes the projection onto $\mathrm{supp}(A)$, where we use $\mathrm{supp}(A)$ to denote the support of the operator $A$. Further, $(A)_+$ denotes the positive part of a self-adjoint operator $A$, i.e. the operator obtained from $A$ by replacing its negative eigenvalues by zero. Additionally, we define the pinching map with respect to $A$ by $\mathcal{P}_A(\cdot):=\sum_i P_i(\cdot)P_i$. The $\log$ and $\exp$ are performed with base $\mathrm{e}$ throughout this paper.
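The pinching map can be implemented directly from this definition by grouping the eigenvectors of the pinching operator into eigenprojections of (numerically) equal eigenvalues. A minimal sketch:

```python
import numpy as np

def pinch(A, sigma):
    """Pinching map P_sigma(A) = sum_i P_i A P_i, where P_i are the
    eigenprojections of sigma (eigenvectors grouped by equal eigenvalues)."""
    w, v = np.linalg.eigh(sigma)
    out = np.zeros_like(np.asarray(A, dtype=complex))
    i = 0
    while i < len(w):
        j = i
        while j < len(w) and abs(w[j] - w[i]) < 1e-9:
            j += 1
        P = v[:, i:j] @ v[:, i:j].conj().T  # eigenprojection of sigma
        out += P @ A @ P
        i = j
    return out
```

By construction the pinched operator commutes with `sigma`; for a fully degenerate `sigma` (a multiple of the identity) the map is the identity.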

### 4.1. Entropic Quantities

For any pair of density operators $\rho$ and $\sigma$, we define the quantum relative entropy, Petz’s quantum Rényi divergence [29], the sandwiched Rényi divergence [20, 21], and the log-Euclidean Rényi divergence [30, 23], respectively, as follows:

$$D(\rho\|\sigma) := \mathrm{Tr}\big[\rho(\log\rho-\log\sigma)\big]; \tag{23}$$
$$D_{\alpha}(\rho\|\sigma) := \frac{1}{\alpha-1}\log Q_{\alpha}(\rho\|\sigma), \qquad Q_{\alpha}(\rho\|\sigma) := \mathrm{Tr}\big[\rho^{\alpha}\sigma^{1-\alpha}\big]; \tag{24}$$
$$D^{*}_{\alpha}(\rho\|\sigma) := \frac{1}{\alpha-1}\log Q^{*}_{\alpha}(\rho\|\sigma), \qquad Q^{*}_{\alpha}(\rho\|\sigma) := \mathrm{Tr}\Big[\big(\rho^{\frac{1}{2}}\sigma^{\frac{1-\alpha}{\alpha}}\rho^{\frac{1}{2}}\big)^{\alpha}\Big]; \tag{25}$$
$$D^{\flat}_{\alpha}(\rho\|\sigma) := \frac{1}{\alpha-1}\log Q^{\flat}_{\alpha}(\rho\|\sigma), \qquad Q^{\flat}_{\alpha}(\rho\|\sigma) := \mathrm{Tr}\big[\mathrm{e}^{\alpha\log\rho+(1-\alpha)\log\sigma}\big]. \tag{26}$$
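For intuition, the Petz and sandwiched divergences can be evaluated numerically for small matrices. The sketch below works in nats and evaluates $Q^{*}_{\alpha}$ in the equivalent form $\mathrm{Tr}[(\sigma^{c}\rho\,\sigma^{c})^{\alpha}]$ with $c=\frac{1-\alpha}{2\alpha}$, which has the same spectrum as the form in Eq. (25); the input states in the usage note are toy examples.

```python
import numpy as np

def mpow(rho, a):
    """Power of a positive semi-definite matrix on its support (0^a := 0)."""
    w, v = np.linalg.eigh(rho)
    wa = np.array([x ** a if x > 1e-12 else 0.0 for x in w])
    return (v * wa) @ v.conj().T

def petz_renyi(alpha, rho, sigma):
    """Petz divergence D_alpha = log Tr[rho^alpha sigma^(1-alpha)] / (alpha-1), in nats."""
    q = np.trace(mpow(rho, alpha) @ mpow(sigma, 1 - alpha)).real
    return float(np.log(q) / (alpha - 1))

def sandwiched_renyi(alpha, rho, sigma):
    """Sandwiched divergence via Tr[(sigma^c rho sigma^c)^alpha], c = (1-alpha)/(2 alpha)."""
    s = mpow(sigma, (1 - alpha) / (2 * alpha))
    q = np.trace(mpow(s @ rho @ s, alpha)).real
    return float(np.log(q) / (alpha - 1))
```

For commuting states both quantities reduce to the classical Rényi divergence, and in general $D^{*}_{\alpha}\le D_{\alpha}$, consistent with Eq. (58) below.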

We define the quantum relative entropy variance [28, 31] by

$$V(\rho\|\sigma) := \mathrm{Tr}\big[\rho(\log\rho-\log\sigma)^2\big] - D(\rho\|\sigma)^2. \tag{27}$$

The above quantity is non-negative. Further, it follows that

$$V(\rho\|\sigma)>0 \quad\text{implies}\quad D(\rho\|\sigma)>0. \tag{28}$$

For $\alpha\in(0,1)\cup(1,\infty)$, and $\mathrm{t}$ denoting the Petz version (no superscript), the sandwiched version ($*$), or the log-Euclidean version ($\flat$), the quantum conditional Rényi entropies are given by

$$H^{\mathrm{t},\uparrow}_{\alpha}(A|B)_\rho := \max_{\sigma_B\in\mathcal{S}(B)} -D^{\mathrm{t}}_{\alpha}(\rho_{AB}\|\mathds{1}_A\otimes\sigma_B), \qquad H^{\mathrm{t},\downarrow}_{\alpha}(A|B)_\rho := -D^{\mathrm{t}}_{\alpha}(\rho_{AB}\|\mathds{1}_A\otimes\rho_B). \tag{29}$$

When $\alpha\to 1$, for every choice of $\mathrm{t}$ and of $\uparrow$ or $\downarrow$ in Eq. (29), both quantities coincide with the usual quantum conditional entropy:

$$H^{\mathrm{t},\uparrow}_{1}(A|B)_\rho = H^{\mathrm{t},\downarrow}_{1}(A|B)_\rho = H(A|B)_\rho := H(AB)_\rho - H(B)_\rho, \tag{30}$$

where $H(\cdot)$ denotes the von Neumann entropy. Similarly, for a bipartite state $\rho_{AB}$, we define the conditional information variance:

$$V(A|B)_\rho := V(\rho_{AB}\|\mathds{1}_A\otimes\rho_B). \tag{31}$$

It is not hard to verify from Eq. (28) that

$$V(A|B)_\rho>0 \quad\text{implies}\quad H(A|B)_\rho>0. \tag{32}$$
###### Lemma 4.1 ([32], [23, Lemma III.3, Lemma III.11, Theorem III.14, Corollary III.25], [33, Corollary 2.2]).

Let $\rho,\sigma\in\mathcal{S}(\mathcal{H})$. Then,

$$\alpha\mapsto\log Q_{\alpha}(\rho\|\sigma)\ \text{and}\ \alpha\mapsto\log Q^{\flat}_{\alpha}(\rho\|\sigma)\ \text{are convex on}\ (0,1); \tag{33}$$
$$\alpha\mapsto D_{\alpha}(\rho\|\sigma)\ \text{is continuous and monotone increasing on}\ [0,1]. \tag{34}$$

Moreover,

$$\forall\,\alpha\in(0,1):\quad (\rho,\sigma)\mapsto Q^{\flat}_{\alpha}(\rho\|\sigma)\ \text{is jointly concave on}\ \mathcal{S}(\mathcal{H})\times\mathcal{S}(\mathcal{H}); \tag{35}$$
$$\forall\,\alpha\in[0,1]:\quad \sigma\mapsto D_{\alpha}(\rho\|\sigma)\ \text{is convex and lower semi-continuous on}\ \mathcal{S}(\mathcal{H}). \tag{36}$$
###### Proposition 4.2 (Properties of α-Rényi Conditional Entropy).

Given any classical-quantum state $\rho_{XB}$, the following holds:

1. The map $\alpha\mapsto H^{\uparrow}_{\alpha}(X|B)_\rho$ is continuous and monotonically decreasing on $(0,\infty)$.

2. The map $s\mapsto -s\,H^{\uparrow}_{\frac{1}{1+s}}(X|B)_\rho$ is concave on $(-1,\infty)$.

The proof is provided in Appendix A.

Given two states $\rho$ and $\sigma$, one can define an associated binary hypothesis testing problem of determining which of the two states was given, via a binary POVM. Such a POVM is described by an operator $T$ (associated, say, with the outcome $\rho$) such that $0\le T\le\mathds{1}$, called the test. Two types of errors are possible; the probability of measuring $\rho$ but reporting the outcome $\sigma$ is given by $\mathrm{Tr}[(\mathds{1}-T)\rho]$ and called the type-I error, while the probability of measuring $\sigma$ but reporting the outcome $\rho$ is given by $\mathrm{Tr}[T\sigma]$ and is called the type-II error. The hypothesis testing relative entropy (e.g. as defined in [34]) is defined by

$$D^{\varepsilon}_{\mathrm{H}}(\rho\|\sigma) := -\log\min\big\{\mathrm{Tr}[T\sigma] : 0\le T\le\mathds{1},\ \mathrm{Tr}[(\mathds{1}-T)\rho]\le\varepsilon\big\} \tag{37}$$

and characterizes the minimum type-II error incurred via a test which has type-I error at most $\varepsilon$. The hypothesis testing relative entropy satisfies the data-processing inequality

$$D^{\varepsilon}_{\mathrm{H}}(\rho\|\sigma) \ge D^{\varepsilon}_{\mathrm{H}}(\Phi(\rho)\|\Phi(\sigma)) \tag{38}$$

for any completely positive trace-preserving map $\Phi$ [34]. This quantity has an interpretation as a relative entropy, as it satisfies the following asymptotic equipartition property:

$$\lim_{n\to\infty}\frac{1}{n}D^{\varepsilon}_{\mathrm{H}}\big(\rho^{\otimes n}\big\|\sigma^{\otimes n}\big) = D(\rho\|\sigma), \tag{39}$$

which was proven in two steps, by [35] and [36].

We can consider a related quantity, $\hat{\alpha}_{\mu}(\rho\|\sigma)$, which denotes the minimum type-I error such that the type-II error does not exceed $\mu$. That is,

$$\hat{\alpha}_{\mu}(\rho\|\sigma) = \inf_{\substack{T:\,0\le T\le\mathds{1}\\ \mathrm{Tr}[T\sigma]\le\mu}}\mathrm{Tr}\big[(\mathds{1}-T)\rho\big] \equiv \exp\big(-D^{\mu}_{\mathrm{H}}(\sigma\|\rho)\big). \tag{40}$$
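When $\rho$ and $\sigma$ commute, the optimization in Eq. (40) reduces to a classical Neyman-Pearson problem over their eigenvalue distributions, solvable greedily by likelihood ratio. A minimal sketch of that classical reduction (not of the general quantum SDP):

```python
def min_type1_error(p, q, mu):
    """Classical Neyman-Pearson: minimize the type-I error (p-mass of the
    rejection region) subject to type-II error (q-mass of the acceptance
    region) <= mu, over randomized tests. For commuting rho and sigma,
    hat{alpha}_mu reduces to this problem on their spectra."""
    order = sorted(range(len(p)),
                   key=lambda i: p[i] / q[i] if q[i] > 0 else float("inf"),
                   reverse=True)
    budget, type1 = mu, 1.0
    for i in order:
        if q[i] <= budget:       # accept outcome i fully
            budget -= q[i]
            type1 -= p[i]
        else:                    # accept with probability budget / q[i]
            type1 -= p[i] * budget / q[i]
            break
    return max(0.0, type1)
```

For $p=q$ the best achievable type-I error is exactly $1-\mu$, as expected.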

By (38), for any completely positive trace-preserving map $\Phi$ we have

$$\hat{\alpha}_{\mu}(\rho\|\sigma) = \exp\big(-D^{\mu}_{\mathrm{H}}(\sigma\|\rho)\big) \le \exp\big(-D^{\mu}_{\mathrm{H}}(\Phi(\sigma)\|\Phi(\rho))\big) = \hat{\alpha}_{\mu}(\Phi(\rho)\|\Phi(\sigma)). \tag{41}$$

We also consider the so-called max-relative entropy, given by

$$D_{\max}(\rho\|\sigma) := \inf\big\{\gamma\in\mathbb{R} : \rho\le \mathrm{e}^{\gamma}\sigma\big\}. \tag{42}$$
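On the support of $\sigma$, the smallest such $\gamma$ is the logarithm of the largest eigenvalue of $\sigma^{-1/2}\rho\,\sigma^{-1/2}$, which gives a direct numerical recipe. A sketch in the natural-log convention (substitute `np.log2` for a base-2 convention):

```python
import numpy as np

def d_max(rho, sigma):
    """Max-relative entropy: smallest gamma with rho <= e^gamma sigma,
    computed as log of the largest eigenvalue of sigma^{-1/2} rho sigma^{-1/2}.
    Assumes supp(rho) is contained in supp(sigma)."""
    w, v = np.linalg.eigh(sigma)
    isw = np.array([x ** -0.5 if x > 1e-12 else 0.0 for x in w])
    inv_sqrt = (v * isw) @ v.conj().T  # sigma^{-1/2} on the support of sigma
    lam = np.linalg.eigvalsh(inv_sqrt @ rho @ inv_sqrt).max()
    return float(np.log(lam))
```

For example, $D_{\max}(|0\rangle\langle 0|\,\|\,\mathds{1}/2) = \log 2$, and $D_{\max}(\sigma\|\sigma)=0$.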

In establishing the exact strong converse exponent, we will employ a smoothed variant of this quantity,

$$D^{\delta}_{\max}(\rho\|\sigma) = \min_{\bar{\rho}\in B^{\delta}(\rho)} D_{\max}(\bar{\rho}\|\sigma), \tag{43}$$

where

$$B^{\delta}(\rho) = \big\{\bar{\rho}\in\mathcal{S}(\mathcal{H}) : d_{\mathrm{op}}(\bar{\rho},\rho)\le\delta\big\} \tag{44}$$

is the $\delta$-ball in the distance of optimal purifications, $d_{\mathrm{op}}$, defined by

$$d_{\mathrm{op}}(\rho,\sigma) = \min_{\psi_\rho,\psi_\sigma}\frac{1}{2}\big\||\psi_\rho\rangle\langle\psi_\rho| - |\psi_\sigma\rangle\langle\psi_\sigma|\big\|_1, \tag{45}$$

where the minimum is over purifications $|\psi_\rho\rangle$ of $\rho$ and $|\psi_\sigma\rangle$ of $\sigma$. By equation (2) of [37], the distance satisfies

$$\frac{1}{2}d_{\mathrm{op}}(\rho,\sigma)^2 \le \frac{1}{2}\|\rho-\sigma\|_1 \le d_{\mathrm{op}}(\rho,\sigma). \tag{46}$$
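By Uhlmann's theorem, the optimal purifications in Eq. (45) have overlap equal to the fidelity $F(\rho,\sigma)$, so $d_{\mathrm{op}}(\rho,\sigma)=\sqrt{1-F(\rho,\sigma)^2}$ (the purified distance). This makes both sides of Eq. (46) checkable numerically; a minimal sketch:

```python
import numpy as np

def msqrt(rho):
    """Square root of a positive semi-definite matrix."""
    w, v = np.linalg.eigh(rho)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def fidelity(rho, sigma):
    """Uhlmann fidelity F = Tr sqrt( sqrt(rho) sigma sqrt(rho) )."""
    return float(np.trace(msqrt(msqrt(rho) @ sigma @ msqrt(rho))).real)

def d_op(rho, sigma):
    """Distance of optimal purifications, d_op = sqrt(1 - F^2)."""
    f = fidelity(rho, sigma)
    return float(np.sqrt(max(0.0, 1.0 - f * f)))

def trace_dist(rho, sigma):
    """(1/2) || rho - sigma ||_1."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho - sigma)).sum())
```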

It was shown in [37] that $D^{\delta}_{\max}$ satisfies an asymptotic equipartition property. In fact, Theorem 14 of [37] gives finite blocklength upper and lower bounds on $\frac{1}{n}D^{\delta}_{\max}(\rho^{\otimes n}\|\sigma^{\otimes n})$ which converge to $D(\rho\|\sigma)$. We will only need the upper bound, namely

$$\frac{1}{n}D^{\delta}_{\max}\big(\rho^{\otimes n}\big\|\sigma^{\otimes n}\big) \le D(\rho\|\sigma) + \frac{4\sqrt{2}\,(\log\eta)}{\sqrt{n}}\,\log\frac{1}{1-\sqrt{1-\delta^{2}}}, \tag{47}$$

where $\eta$ is a constant depending only on $\rho$ and $\sigma$.

The smoothed max-relative entropy satisfies the following simple but useful relation (which is a one-shot analog of Lemma V.7 of [23]).

###### Lemma 4.3.

For $\delta\ge 0$, if $D^{\delta}_{\max}(\sigma\|\rho)\le a$ then

$$\mathrm{Tr}\big[(\sigma-\mathrm{e}^{a}\rho)_{+}\big] \le 2\delta. \tag{48}$$
###### Proof of Lemma 4.3.

By the definition (42), for any $a\ge D_{\max}(\sigma\|\rho)$, we have $\sigma\le\mathrm{e}^{a}\rho$, and therefore $(\sigma-\mathrm{e}^{a}\rho)_{+}=0$, which proves the result for $\delta=0$. For $\delta>0$ and $D^{\delta}_{\max}(\sigma\|\rho)\le a$, there exists a density matrix $\tilde{\sigma}$ with $d_{\mathrm{op}}(\tilde{\sigma},\sigma)\le\delta$ and $\tilde{\sigma}\le\mathrm{e}^{a}\rho$. Setting $P_{+}$ to be the projection onto the non-negative eigenspace of $\sigma-\mathrm{e}^{a}\rho$, we have

$$\mathrm{Tr}\big[P_{+}(\sigma-\mathrm{e}^{a}\rho)\big] = \mathrm{Tr}\big[P_{+}(\sigma-\tilde{\sigma})\big] + \mathrm{Tr}\big[P_{+}(\tilde{\sigma}-\mathrm{e}^{a}\rho)\big]. \tag{49}$$

Since $\tilde{\sigma}\le\mathrm{e}^{a}\rho$, we have $\mathrm{Tr}[P_{+}(\tilde{\sigma}-\mathrm{e}^{a}\rho)]\le 0$. Thus,

$$\mathrm{Tr}\big[P_{+}(\sigma-\mathrm{e}^{a}\rho)\big] \le \mathrm{Tr}\big[P_{+}(\sigma-\tilde{\sigma})\big] \le \|\sigma-\tilde{\sigma}\|_1 \le 2d_{\mathrm{op}}(\sigma,\tilde{\sigma}) \le 2\delta, \tag{50}$$

using (46) in the second to last inequality. ∎

### 4.2. Error Exponent Function

For , or , we define

$$E^{\mathrm{t}}_{r}(R)\equiv E^{\mathrm{t}}_{r}(\rho_{XB},R) := \max_{0\le s\le 1}\big\{E^{\mathrm{t}}_{0}(s)+sR\big\}; \tag{51}$$
$$E^{\mathrm{t}}_{\mathrm{sp}}(R)\equiv E^{\mathrm{t}}_{\mathrm{sp}}(\rho_{XB},R) := \sup_{s\ge 0}\big\{E^{\mathrm{t}}_{0}(s)+sR\big\}; \tag{52}$$
$$E^{\mathrm{t}}_{\mathrm{sc}}(R)\equiv E^{\mathrm{t}}_{\mathrm{sc}}(\rho_{XB},R) := \sup_{-1<s<0}\big\{E^{\mathrm{t}}_{0}(s)+sR\big\}; \tag{53}$$
$$E^{\mathrm{t}}_{0}(s) := -s\,H^{\mathrm{t},\uparrow}_{\frac{1}{1+s}}(X|B)_{\rho}, \tag{54}$$

omitting the dependence on $\rho_{XB}$ except where necessary. For $\mathrm{t}$ being the Petz version, i.e. using Petz’s Rényi conditional entropy, one has

$$E_{0}(s) = -\log\mathrm{Tr}\Big[\big(\mathrm{Tr}_X\,\rho_{XB}^{\frac{1}{1+s}}\big)^{1+s}\Big] \tag{55}$$

by quantum Sibson’s identity [38]. We also define another version of the exponent function via $H^{\downarrow}_{\alpha}$:

$$E^{\downarrow}_{r}(R) := \max_{0\le s\le 1}\big\{E^{\downarrow}_{0}(s)+sR\big\}, \tag{56}$$
$$E^{\downarrow}_{0}(s) := -s\,H^{\downarrow}_{1-s}(X|B)_{\rho}. \tag{57}$$

Note that [23, Proposition III.18]

$$D^{*}_{\alpha}(\cdot\|\cdot) \le D_{\alpha}(\cdot\|\cdot) \le D^{\flat}_{\alpha}(\cdot\|\cdot), \qquad \alpha\in[0,1). \tag{58}$$