# Improved Finite Blocklength Converses for Slepian-Wolf Coding via Linear Programming

Sharu Theresa Jose and Ankur A. Kulkarni are with the Systems and Control Engineering group at the Indian Institute of Technology Bombay, Mumbai, 400076, India. Email: sharutheresa@iitb.ac.in, kulkarni.ankur@iitb.ac.in. This work was presented in part at the IEEE Information Theory Workshop, Kaohsiung, Taiwan, 2017 [1].
###### Abstract

A new finite blocklength converse for the Slepian-Wolf coding problem is presented which significantly improves on the best known converse for this problem, due to Miyake and Kanaya [2]. To obtain this converse, an extension of the linear programming (LP) based framework for finite blocklength point-to-point coding problems from [3] is employed. However, a direct application of this framework demands a complicated analysis for the Slepian-Wolf problem. An analytically simpler approach is presented wherein LP-based finite blocklength converses for this problem are synthesized from point-to-point lossless source coding problems with perfect side-information at the decoder. New finite blocklength metaconverses for these point-to-point problems are derived by employing the LP-based framework, and the new converse for Slepian-Wolf coding is obtained by an appropriate combination of these converses.

## I Introduction

The intractability of evaluating the nonasymptotic, or finite blocklength, fundamental limit of communication has put the onus on discovering finite blocklength achievability and converse bounds that tightly sandwich this fundamental limit. Accordingly, recent years have witnessed a surge of tight finite blocklength achievability and converse results ([4], [5], [6], [3]), particularly for coding problems in the point-to-point setting.

Even though many sharp and asymptotically tight finite blocklength converses have been obtained in the point-to-point setting using tools such as hypothesis testing [4] and the information spectrum [7], deriving tight finite blocklength converses for multiterminal coding problems remains particularly challenging. Part of this challenge can be attributed to the difficulty of extending point-to-point techniques to the network setting. In this paper, we consider the classical multiterminal source coding problem – the Slepian-Wolf coding problem – and show that the extension of the linear programming (LP) based framework we introduced for the point-to-point setting in [3] in fact results in new and improved finite blocklength converses for this problem. Moreover, it yields a framework, via a hierarchy of relaxations, in which classical converses can be recovered and converses for the networked problem can be synthesized from a combination of point-to-point converses.

Consider the finite blocklength Slepian-Wolf distributed lossless source coding problem (Figure 1), posed as the following optimization problem:

$$
\textbf{SW} \qquad \begin{aligned} \min_{f_1,f_2,g}\ & \mathbb{E}\big[\mathbb{I}\{(S_1,S_2)\neq(\hat{S}_1,\hat{S}_2)\}\big] \\ \text{s.t.}\ & X_1=f_1(S_1),\quad X_2=f_2(S_2),\quad (\hat{S}_1,\hat{S}_2)=g(Y_1,Y_2),\end{aligned}
$$

where $S_1,S_2,X_1,X_2,Y_1,Y_2,\hat{S}_1,\hat{S}_2$ are discrete random variables taking values in fixed, finite spaces $\mathcal{S}_1,\mathcal{S}_2,\mathcal{X}_1,\mathcal{X}_2,\mathcal{Y}_1,\mathcal{Y}_2,\hat{\mathcal{S}}_1,\hat{\mathcal{S}}_2$, respectively. Notice that these spaces could be Cartesian products of smaller spaces, and hence could be sets of finite length strings. Here, $S_1$ and $S_2$ represent the two correlated sources distributed according to a known joint probability distribution $P_{S_1,S_2}$. The source signals are separately encoded by functions $f_1:\mathcal{S}_1\rightarrow\mathcal{X}_1$ and $f_2:\mathcal{S}_2\rightarrow\mathcal{X}_2$ to produce signals $X_1$ and $X_2$, respectively. The encoded signals are sent through a deterministic channel with conditional distribution $P_{Y_1,Y_2|X_1,X_2}(y_1,y_2|x_1,x_2)=\mathbb{I}\{(y_1,y_2)=(x_1,x_2)\}$ to get the signal $(Y_1,Y_2)$, where $\mathbb{I}\{\cdot\}$ represents the indicator function which equals unity when '$\cdot$' is true and is zero otherwise. $(Y_1,Y_2)$ is then jointly decoded by $g:\mathcal{Y}_1\times\mathcal{Y}_2\rightarrow\hat{\mathcal{S}}_1\times\hat{\mathcal{S}}_2$ to obtain the output signal $(\hat{S}_1,\hat{S}_2)$. For the finite blocklength Slepian-Wolf coding problem, we note that the spaces satisfy $\hat{\mathcal{S}}_1=\mathcal{S}_1$, $\hat{\mathcal{S}}_2=\mathcal{S}_2$, $\mathcal{X}_1=\mathcal{Y}_1=\{1,\ldots,M_1\}$ and $\mathcal{X}_2=\mathcal{Y}_2=\{1,\ldots,M_2\}$. An error in transmission occurs when $(S_1,S_2)\neq(\hat{S}_1,\hat{S}_2)$. Hence, the objective of the finite blocklength Slepian-Wolf coding problem SW is to minimize the probability of error over all codes, i.e., over all encoder-decoder functions $(f_1,f_2,g)$.
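Since all alphabets here are finite, OPT(SW) can in principle be computed by exhaustive search over codes. As a concrete illustration (the instance, alphabet sizes and variable names below are our own toy choices, not from the paper), the following Python sketch enumerates all encoder pairs $(f_1,f_2)$ and uses the fact that, for fixed encoders, an optimal decoder $g$ performs MAP decoding within each bin $(y_1,y_2)$:

```python
import itertools
import numpy as np

# Toy instance: binary correlated sources, M1 = 2 codewords for S1, M2 = 1 for S2.
P = np.array([[0.40, 0.10],
              [0.05, 0.45]])        # P_{S1,S2}(s1, s2)
S1, S2, M1, M2 = 2, 2, 2, 1

best = 1.0
for f1 in itertools.product(range(M1), repeat=S1):      # encoders f1: S1 -> {0..M1-1}
    for f2 in itertools.product(range(M2), repeat=S2):  # encoders f2: S2 -> {0..M2-1}
        err = 0.0
        for y1 in range(M1):
            for y2 in range(M2):
                # masses of all source pairs mapped to bin (y1, y2)
                mass = [P[s1, s2]
                        for s1 in range(S1) for s2 in range(S2)
                        if f1[s1] == y1 and f2[s2] == y2]
                if mass:
                    # MAP decoding: the bin's error is its mass minus its largest entry
                    err += sum(mass) - max(mass)
        best = min(best, err)

print(best)  # optimal probability of error OPT(SW) for this toy instance
```

For this instance the best encoder keeps the two values of $S_1$ in separate bins, giving an error probability of 0.15; the converses developed in this paper lower bound exactly such exhaustively computed values at blocklengths where enumeration is impossible.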

Our interest in this paper is in obtaining finite blocklength converses (or lower bounds) on the optimal value of SW and our approach is via the linear programming (LP) based framework introduced in [3]. In [3], we showed that this framework recovers and improves on most of the well-known finite blocklength converses for point-to-point coding problems. In particular, the LP framework is shown to imply the metaconverse of Polyanskiy-Poor-Verdú [4] for finite blocklength channel coding. For lossy source coding and lossy joint source-channel coding with the probability of excess distortion as the loss criterion, the LP framework results in two levels of improvements on the asymptotically tight tilted-information based converses of Kostina and Verdú in [5] and [6], respectively.

Fundamental to this framework is the observation that the finite blocklength coding problem can be posed equivalently as a nonconvex optimization problem over joint probability distributions. A natural optimizer's approach [8] to obtaining lower bounds is then via a convex relaxation of this nonconvex optimization problem. In particular, resorting to the "lift-and-project" technique due to Lovász, Schrijver, Sherali, Adams and others [9], we obtain an LP relaxation of the problem. Linear programming duality then gives that the objective value of any feasible point of the dual of this LP relaxation yields a lower bound on the optimal loss in the finite blocklength problem. As a result of this observation, the problem of obtaining converses reduces to constructing feasible points for the dual linear program.

The converses in [3] stated above for various point-to-point settings emerge as special cases of this LP-based framework, implied by the construction of specific dual feasible points. This tightness of the LP relaxation shows that there is an alternative, asymptotically tight way of thinking about optimal finite blocklength coding – as the optimal packing of a pair of source and channel flows satisfying a certain error density bottleneck. The flows here are the variables of the dual program and the bottleneck, its constraint.

In this paper, we further this theme towards the Slepian-Wolf coding problem. We observe that our LP relaxation has an operational interpretation based on optimal transport [10], wherein one designs not only codes, but also couplings between them, to minimize the resulting 'error'. Using the LP relaxation, we first establish new, clean metaconverses in the point-to-point setting for lossy source-coding problems with side-information at the decoder; these converses are stronger than our earlier converses in [3], they imply the hypothesis testing and tilted information based converses of Kostina and Verdú [5, Theorems 7 and 8] and the converse of Palzer and Timo [11, Theorem 1], and are, to the best of our knowledge, the strongest known. Subsequently, we analyse the dual LP of the finite blocklength Slepian-Wolf coding problem. When extended to the networked Slepian-Wolf coding problem, the LP-based framework results in a large number of dual variables and constraints, which makes it quite challenging to analyse and interpret. Consequently, we devise an analytically simpler approach that constructs feasible points of the dual program from feasible points of simpler point-to-point problems. This yields tight finite blocklength converses that improve on the hitherto best known converse for this problem, due to Miyake and Kanaya [2].

The dual variables of the LP relaxation of SW also have a structure of 'source flows' and 'channel flows'. Though, as yet, we do not have physical or operational interpretations for these 'flows', they serve as useful analytical devices for synthesizing converse expressions for SW. We find that source and channel flows for problem SW follow a hierarchy in which flows at the highest level satisfy the error density bottleneck, whereas flows at the next levels have to meet a bottleneck, dictated by the flows at the level above, along various paths in the network. We show that the well-known information spectrum based converse of Miyake and Kanaya [2] results from a particular way of constructing these flows. Improvements on this converse follow by synthesizing these flows in a more sophisticated manner. Specifically, by synthesizing flows for the networked problem using flows from the following point-to-point problems: (a) lossless source coding of the jointly encoded correlated sources $(S_1,S_2)$, (b) lossless source coding of $S_1$ with perfect side-information $S_2$ available at the decoder, and (c) lossless source coding of $S_2$ with perfect side-information $S_1$ available at the decoder, we show that a new finite blocklength metaconverse results, which improves on the converse of Miyake and Kanaya.

The paper is organized as follows. In Section II, we consider the point-to-point lossy source coding problem with side-information. By the LP framework and an appropriate choice of source and channel flows, we derive new tight finite blocklength converses for these problems. In Section III, we discuss the extension of the LP relaxation to problem SW and establish the duality based framework. In Section IV, we illustrate how to synthesize new finite blocklength converses for SW from point-to-point sub-problems and present a new finite blocklength converse which improves on the converse of Miyake and Kanaya. Lastly, in Section V, we discuss the structure of the constraints of the dual program corresponding to SW and possible avenues for further strengthening of the bound.

### I-A Notation

Throughout this paper, we consider only discrete random variables. We make use of the following notation. Upper case letters (e.g., $X$) represent random variables taking values in finite spaces represented by the corresponding calligraphic letters (e.g., $\mathcal{X}$); lower case letters represent the specific values these random variables take. $\mathbb{I}\{\cdot\}$ represents the indicator function, which is equal to one when '$\cdot$' is true and is zero otherwise. $\mathcal{P}(\mathcal{X})$ denotes the set of all probability distributions on $\mathcal{X}$ and $P_X\in\mathcal{P}(\mathcal{X})$ represents a specific distribution. If $Q$ is a joint probability distribution, we let $Q_X$ denote the marginal distribution of $X$. If P represents an optimization problem, then OPT(P) represents its optimal value. LHS stands for Left Hand Side and RHS stands for Right Hand Side. The notation $X\perp Y$ denotes that $X$ is independent of $Y$.

## II Finite Blocklength Point-to-Point Source Coding

In this section, we consider the point-to-point lossy source coding problem and the lossless source coding problem with side information at the decoder. We employ the LP relaxation framework to obtain finite blocklength converses for these problems.

### II-A Point-to-Point Lossy Source Coding

We begin with point-to-point lossy source coding. The finite blocklength lossy source coding problem (Figure 2) with probability of excess distortion as the loss criterion can be posed as the following optimization problem,

$$
\textbf{SC} \qquad \begin{aligned} \min_{f,g}\ & \mathbb{E}\big[\mathbb{I}\{d(S,\hat{S})>d\}\big] \\ \text{s.t.}\ & X=f(S),\quad \hat{S}=g(Y).\end{aligned}
$$

Here, $S,X,Y$ and $\hat{S}$ are discrete random variables taking values in fixed, finite spaces $\mathcal{S},\mathcal{X},\mathcal{Y},\hat{\mathcal{S}}$, respectively, with $\mathcal{X}=\{1,\ldots,M\}$ and $\mathcal{Y}=\mathcal{X}$. $S$ represents the source message distributed according to a known distribution $P_S$. The source message is encoded according to $f:\mathcal{S}\rightarrow\mathcal{X}$ to get the signal $X$, which is transmitted across a deterministic channel with conditional probability distribution $P_{Y|X}(y|x)=\mathbb{I}\{y=x\}$. $Y$ represents the channel output, which is decoded according to $g:\mathcal{Y}\rightarrow\hat{\mathcal{S}}$ to get the message $\hat{S}$ at the destination. $d:\mathcal{S}\times\hat{\mathcal{S}}\rightarrow[0,\infty)$ represents the distortion measure and $d\geq 0$ represents the distortion level. The optimization problem SC thus seeks to find a code $(f,g)$ which minimizes $P[d(S,\hat{S})>d]$, the probability of excess distortion, under the measure induced by $(f,g)$.

SC can be posed equivalently as the following optimization problem over joint probability distributions,

$$
\textbf{SC} \qquad \begin{aligned} \min_{Q_{X|S},\,Q_{\hat{S}|Y}}\ & \sum_{s,\hat{s}} \mathbb{I}\{d(s,\hat{s})>d\}\sum_{x,y} Q(s,x,y,\hat{s}) \\ \text{s.t.}\ & Q(s,x,y,\hat{s}) \equiv P_S(s)\,Q_{X|S}(x|s)\,P_{Y|X}(y|x)\,Q_{\hat{S}|Y}(\hat{s}|y), \\ & \textstyle\sum_{x} Q_{X|S}(x|s)=1 \quad \forall\, s\in\mathcal{S}, \\ & \textstyle\sum_{\hat{s}} Q_{\hat{S}|Y}(\hat{s}|y)=1 \quad \forall\, y\in\mathcal{Y}, \\ & Q_{X|S}(x|s)\geq 0 \quad \forall\, s\in\mathcal{S},\,x\in\mathcal{X}, \\ & Q_{\hat{S}|Y}(\hat{s}|y)\geq 0 \quad \forall\, \hat{s}\in\hat{\mathcal{S}},\,y\in\mathcal{Y},\end{aligned}
$$

where $Q_{X|S}\in[0,1]^{\mathcal{X}\times\mathcal{S}}$ and $Q_{\hat{S}|Y}\in[0,1]^{\hat{\mathcal{S}}\times\mathcal{Y}}$. Here, $Q_{X|S}$ represents a randomized encoder and $Q_{\hat{S}|Y}$ represents a randomized decoder. We refer the reader to [3] for details on this formulation.

To obtain lower bounds on the optimal value of SC, we adopt the LP relaxation detailed in [3]. Towards this, we introduce a new variable $W(x,\hat{s}|s,y)$ and obtain valid constraints involving $W$ from the constraints of the problem. Specifically, multiply both sides of the constraint $\sum_{x}Q_{X|S}(x|s)=1$ by $Q_{\hat{S}|Y}(\hat{s}|y)$ for all $\hat{s},y$, and multiply both sides of $\sum_{\hat{s}}Q_{\hat{S}|Y}(\hat{s}|y)=1$ by $Q_{X|S}(x|s)$ for all $x,s$. Replacing the bilinear product terms $Q_{X|S}(x|s)Q_{\hat{S}|Y}(\hat{s}|y)$ in the resulting set of constraints and in the objective of SC with $W(x,\hat{s}|s,y)$ gives new linear constraints in the variables $(Q_{X|S},Q_{\hat{S}|Y},W)$, which together with the normalization and nonnegativity constraints give the following LP relaxation.

$$
\textbf{LP} \qquad \begin{aligned} \min_{Q_{X|S},\,Q_{\hat{S}|Y},\,W}\ & \sum_{s,x,y,\hat{s}} \mathbb{I}\{d(s,\hat{s})>d\}\, P_S(s)\, P_{Y|X}(y|x)\, W(x,\hat{s}|s,y) \\ \text{s.t.}\ & \textstyle\sum_{x} Q_{X|S}(x|s)=1 \;:\; \gamma_a(s) \quad \forall\, s, \\ & \textstyle\sum_{\hat{s}} Q_{\hat{S}|Y}(\hat{s}|y)=1 \;:\; \gamma_b(y) \quad \forall\, y, \\ & \textstyle\sum_{x} W(x,\hat{s}|s,y)-Q_{\hat{S}|Y}(\hat{s}|y)=0 \;:\; \lambda_s(s,\hat{s},y) \quad \forall\, s,\hat{s},y, \\ & \textstyle\sum_{\hat{s}} W(x,\hat{s}|s,y)-Q_{X|S}(x|s)=0 \;:\; \lambda_c(x,s,y) \quad \forall\, x,s,y, \\ & Q_{X|S},\,Q_{\hat{S}|Y},\,W \geq 0.\end{aligned}
$$

Above, $\gamma_a$, $\gamma_b$, $\lambda_s$ and $\lambda_c$ are Lagrange multipliers corresponding to the respective constraints.
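For small alphabets, the relaxation LP can be solved directly with an off-the-shelf solver. The sketch below is a minimal numerical illustration (the toy instance, indexing scheme and names are ours, not code from [3]): lossless coding of a binary source with a single codeword ($M=1$) over the identity channel, solved with scipy. For this instance the best code simply guesses the more likely symbol, so $\mathrm{OPT}(\textrm{SC})=1-\max_s P_S(s)=0.4$, and the LP turns out to attain this value here.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

P_S = np.array([0.6, 0.4])
nS, nX = 2, 1
nY, nSh = nX, nS                      # Y = X, S_hat = S (lossless)
dist = lambda s, sh: float(s != sh)   # Hamming distortion, level 0 => lossless
level = 0.0

# Variable vector: [ Q_{X|S} | Q_{Shat|Y} | W(x, sh | s, y) ]
def i_qxs(x, s): return x * nS + s
OFF1 = nX * nS
def i_qsy(sh, y): return OFF1 + sh * nY + y
OFF2 = OFF1 + nSh * nY
def i_w(x, sh, s, y): return OFF2 + ((x * nSh + sh) * nS + s) * nY + y
NV = OFF2 + nX * nSh * nS * nY

c = np.zeros(NV)                      # objective: sum I{d > level} P_S P_{Y|X} W
for x, sh, s, y in itertools.product(range(nX), range(nSh), range(nS), range(nY)):
    if dist(s, sh) > level:
        c[i_w(x, sh, s, y)] = P_S[s] * float(y == x)   # P_{Y|X} = identity

A, b = [], []
for s in range(nS):                   # sum_x Q_{X|S}(x|s) = 1
    r = np.zeros(NV)
    for x in range(nX): r[i_qxs(x, s)] = 1.0
    A.append(r); b.append(1.0)
for y in range(nY):                   # sum_sh Q_{Shat|Y}(sh|y) = 1
    r = np.zeros(NV)
    for sh in range(nSh): r[i_qsy(sh, y)] = 1.0
    A.append(r); b.append(1.0)
for s, sh, y in itertools.product(range(nS), range(nSh), range(nY)):
    r = np.zeros(NV)                  # sum_x W = Q_{Shat|Y}
    for x in range(nX): r[i_w(x, sh, s, y)] = 1.0
    r[i_qsy(sh, y)] -= 1.0
    A.append(r); b.append(0.0)
for x, s, y in itertools.product(range(nX), range(nS), range(nY)):
    r = np.zeros(NV)                  # sum_sh W = Q_{X|S}
    for sh in range(nSh): r[i_w(x, sh, s, y)] = 1.0
    r[i_qxs(x, s)] -= 1.0
    A.append(r); b.append(0.0)

res = linprog(c, A_eq=np.vstack(A), b_eq=np.array(b), bounds=(0, None))
print(res.fun)                        # LP lower bound on OPT(SC)
```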

#### II-A1 An operational interpretation via optimal transport

The above LP relaxation can be explained operationally by relating it to the optimal transport problem [10]. Note that for each $s$ and $y$, $W(\cdot,\cdot|s,y)$ is a coupling on $\mathcal{X}\times\hat{\mathcal{S}}$ between the marginals $Q_{X|S}(\cdot|s)$ and $Q_{\hat{S}|Y}(\cdot|y)$; let the set of such $W$ be denoted by $\Xi(Q_{X|S},Q_{\hat{S}|Y})$. The LP relaxation of SC is a nested minimization – the inner minimization is over all couplings $W\in\Xi(Q_{X|S},Q_{\hat{S}|Y})$ and the outer minimization is over all randomized codes $(Q_{X|S},Q_{\hat{S}|Y})$:

$$
\min_{Q_{X|S},\,Q_{\hat{S}|Y}}\; \min_{W\in\Xi(Q_{X|S},Q_{\hat{S}|Y})}\; \sum_{s,x,y,\hat{s}} \mathbb{I}\{d(s,\hat{s})>d\}\, P_S(s)\, P_{Y|X}(y|x)\, W(x,\hat{s}|s,y).
$$

The original problem SC has the outer minimization over codes, but in place of the inner minimization over $W$ it employs the product $Q_{X|S}(x|s)Q_{\hat{S}|Y}(\hat{s}|y)$ to obtain the joint distribution. Thus the LP relaxation is arrived at by considering the term $Q_{X|S}(x|s)Q_{\hat{S}|Y}(\hat{s}|y)$ in SC as an element of $\Xi(Q_{X|S},Q_{\hat{S}|Y})$ and minimizing the resulting cost over all elements of $\Xi(Q_{X|S},Q_{\hat{S}|Y})$. Operationally speaking, the LP relaxation seeks to design codes and couplings between them that minimize the error under the joint distribution induced by the coupling.

We caution the readers that for multiterminal problems, one must apply this interpretation with additional caveats. We discuss this in Section III-A1.

#### II-A2 Duality and bounds

Employing the Lagrange multipliers corresponding to the constraints of LP, we obtain the following dual of LP,

$$
\textbf{DP} \qquad \begin{aligned} \max_{\gamma_a,\gamma_b,\lambda_s,\lambda_c}\ & \sum_{s}\gamma_a(s)+\sum_{y}\gamma_b(y) \\ \text{s.t.}\ & \gamma_a(s)-\textstyle\sum_{y}\lambda_c(x,s,y) \leq 0 \quad \forall\, x,s, &\text{(P1)} \\ & \gamma_b(y)-\textstyle\sum_{s}\lambda_s(s,\hat{s},y) \leq 0 \quad \forall\, \hat{s},y, &\text{(P2)} \\ & \lambda_s(s,\hat{s},y)+\lambda_c(x,s,y) \leq \Sigma(s,x,y,\hat{s}) \quad \forall\, (s,x,y,\hat{s}), &\text{(P3)}\end{aligned}
$$

where $\Sigma(s,x,y,\hat{s}) \equiv \mathbb{I}\{d(s,\hat{s})>d\}\,P_S(s)\,P_{Y|X}(y|x)$ for all $(s,x,y,\hat{s})$, since this is the coefficient of $W(x,\hat{s}|s,y)$ in the objective of LP.

In problem DP, it is optimal to choose $\gamma_a$ and $\gamma_b$ such that (P1) and (P2) hold with equality, i.e., $\gamma_a(s)=\min_x\sum_y\lambda_c(x,s,y)$ and $\gamma_b(y)=\min_{\hat{s}}\sum_s\lambda_s(s,\hat{s},y)$. Then the optimal value of DP with $\Sigma(s,x,y,\hat{s})$ as the RHS of (P3) evaluates to

$$
\begin{aligned} \mathrm{OPT}(\textbf{DP},\Sigma(\cdot)) = \max_{\lambda_s,\lambda_c}\ & \Big\{\sum_{s}\min_{x}\sum_{y}\lambda_c(x,s,y)+\sum_{y}\min_{\hat{s}}\sum_{s}\lambda_s(s,\hat{s},y)\Big\} \\ \text{s.t.}\ & \lambda_c(x,s,y)+\lambda_s(s,\hat{s},y)\leq\Sigma(s,x,y,\hat{s}) \quad \forall\,(s,x,y,\hat{s}).\end{aligned} \tag{1}
$$

It follows that if we construct functions $\lambda_s$ and $\lambda_c$ satisfying the constraint in (1), then linear programming duality implies the following lower bound on $\mathrm{OPT}(\textrm{SC})$:

$$
\mathrm{OPT}(\textbf{SC})\geq\mathrm{OPT}(\textbf{LP})=\mathrm{OPT}(\textbf{DP}) \geq \sum_{s}\min_{x}\sum_{y}\lambda_c(x,s,y)+\sum_{y}\min_{\hat{s}}\sum_{s}\lambda_s(s,\hat{s},y). \tag{2}
$$

Notice that $\lambda_c$ and $\lambda_s$ are functions on subspaces of $\mathcal{S}\times\mathcal{X}\times\mathcal{Y}\times\hat{\mathcal{S}}$. $\lambda_c$ is a function of the source signal $s$, the channel input $x$ and the channel output $y$; we call this function a channel flow. On the other hand, $\lambda_s$ is a function of the source signal $s$, the decoder input $y$ and the decoder output $\hat{s}$. We refer to it as a source flow. Hence, for the point-to-point finite blocklength source coding problem, our LP-based framework reduces to constructing a source flow and a channel flow such that they satisfy the bottleneck imposed by the constraint (P3). The RHS of (P3) is the "error density" $\mathbb{I}\{d(s,\hat{s})>d\}P_S(s)P_{Y|X}(y|x)$, and hence the challenge is to optimally pack a source flow and a channel flow so as to not exceed the error density.

It was shown in [3] that by an appropriate construction of these source and channel flows, a new finite blocklength converse for lossy source coding results which improves on the tilted information based converse of Kostina and Verdú [5]. Modifying and generalizing this construction of flows, we now present a new metaconverse for lossy source coding, which implies our improvement on the Kostina-Verdú converse and the hypothesis testing based converse in [5, Theorem 8]. To the best of our knowledge, the metaconverse below is the strongest known.

###### Theorem II.1 (Metaconverse for Lossy Source Coding)

Consider problem SC. For any code,

$$
\mathbb{E}\big[\mathbb{I}\{d(S,\hat{S})>d\}\big]\geq\mathrm{OPT}(\textbf{SC})\geq\mathrm{OPT}(\textbf{LP})=\mathrm{OPT}(\textbf{DP}) \geq \sup_{0\leq\phi(s)\leq P_S(s)}\Big\{\sum_{s}\phi(s)-M\max_{\hat{s}}\sum_{s}\phi(s)\,\mathbb{I}\{d(s,\hat{s})\leq d\}\Big\}, \tag{3}
$$

where the supremum is over all functions $\phi:\mathcal{S}\rightarrow\mathbb{R}$ such that $0\leq\phi(s)\leq P_S(s)$ for all $s\in\mathcal{S}$.
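The supremum in (3) is itself a finite-dimensional linear program in $(\phi,t)$, where $t$ is an epigraph variable for the inner maximum over $\hat s$. The sketch below evaluates the bound for an illustrative instance of our own choosing (a quaternary source, distortion $d(s,\hat s)=|s-\hat s|$ with level $1$, and $M=1$); for this instance the bound evaluates to $0.1$, which matches $\mathrm{OPT}(\textrm{SC})$ (a single codeword decoded to $\hat s=1$ covers $\{0,1,2\}$).

```python
import numpy as np
from scipy.optimize import linprog

P_S = np.array([0.4, 0.3, 0.2, 0.1])
M = 1
cover = lambda s, sh: abs(s - sh) <= 1          # I{d(s, sh) <= d} with d = 1
n = len(P_S)

# Variables (phi(0), ..., phi(n-1), t): maximize sum(phi) - M*t
# subject to  sum_s phi(s) I{d(s, sh) <= d} <= t  for every sh,  0 <= phi <= P_S.
c = np.concatenate([-np.ones(n), [float(M)]])   # linprog minimizes, so negate
A_ub = np.zeros((n, n + 1))
for sh in range(n):
    for s in range(n):
        A_ub[sh, s] = 1.0 if cover(s, sh) else 0.0
    A_ub[sh, n] = -1.0                          # ... - t <= 0
b_ub = np.zeros(n)
bounds = [(0.0, float(P_S[s])) for s in range(n)] + [(0.0, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print(-res.fun)                                 # value of the metaconverse bound (3)
```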

Proof : Consider the following values of source and channel flows,

$$
\begin{aligned} \lambda_c(x,s,y) &\equiv \mathbb{I}\{y=x\}\,\phi(s), \\ \lambda_s(s,\hat{s},y) &\equiv -\phi(s)\,\mathbb{I}\{d(s,\hat{s})\leq d\}.\end{aligned} \tag{4}
$$

We now check that the above choice of flows satisfies constraint (P3). For this, consider the following two cases.
Case 1: $d(s,\hat{s})>d$. In this case, $\lambda_s(s,\hat{s},y)=0$ and $\lambda_c(x,s,y)=\mathbb{I}\{y=x\}\phi(s)\leq P_S(s)P_{Y|X}(y|x)$, which is the RHS of (P3).
Case 2: $d(s,\hat{s})\leq d$. In this case, the RHS of (P3) is zero and the LHS becomes $(\mathbb{I}\{y=x\}-1)\phi(s)\leq 0$, thereby satisfying (P3). Hence, the considered choice of flows satisfies constraint (P3). Consequently, the required lower bound follows from (2) by taking the supremum over $\phi$ such that $0\leq\phi(s)\leq P_S(s)$.
In particular, choosing $\phi(s)=\min\{P_S(s),z(s)\}$ in (3), where $z:\mathcal{S}\rightarrow[0,\infty)$, and taking the supremum over such $z$, we get the following bound:

$$
\mathbb{E}\big[\mathbb{I}\{d(S,\hat{S})>d\}\big]\geq\mathrm{OPT}(\textbf{SC})\geq\mathrm{OPT}(\textbf{LP}) \geq \sup_{z\geq 0}\Big\{\sum_{s}\min\{P_S(s),z(s)\}-M\max_{\hat{s}}\sum_{s}\min\{P_S(s),z(s)\}\,\mathbb{I}\{d(s,\hat{s})\leq d\}\Big\}. \tag{5}
$$

Remark II.1. (Choice of Flows) An easy way of motivating the choice of flows is as follows. Observe that for all $(s,x,y,\hat{s})$ and any $0\leq\phi(s)\leq P_S(s)$,

$$
\Sigma(s,x,y,\hat{s}) \geq \phi(s)\,\mathbb{I}\{y=x\}\,\mathbb{I}\{d(s,\hat{s})>d\} = \phi(s)\mathbb{I}\{y=x\}-\phi(s)\mathbb{I}\{y=x\}\mathbb{I}\{d(s,\hat{s})\leq d\}.
$$

An obvious choice of the flows would thus be $\lambda_c(x,s,y)\equiv\phi(s)\mathbb{I}\{y=x\}$ and $\lambda_s(s,\hat{s},y)\equiv-\phi(s)\mathbb{I}\{d(s,\hat{s})\leq d\}$ (the latter dropping the factor $\mathbb{I}\{y=x\}$, since $\lambda_s$ does not depend on $x$), which results in our metaconverse in (3).

The following results are corollaries to the metaconverse. $j_S(s,d)$ below is the $d$-tilted information; we refer the reader to [5] for details.

###### Corollary II.2

(Metaconverse Recovers Improvement on Kostina-Verdú Converse from [3, Corollary 5.9])
The following converse follows from (5):

$$
\begin{aligned} \mathbb{E}\big[\mathbb{I}\{d(S,\hat{S})>d\}\big]&\geq\mathrm{OPT}(\textbf{SC})\geq\mathrm{OPT}(\textbf{LP})=\mathrm{OPT}(\textbf{DP}) \\ &\geq \sup_{\gamma}\Big\{P[j_S(S,d)\geq\gamma+\log M]+\frac{1}{M}\sum_{s}P_S(s)\exp(j_S(s,d)-\gamma)\,\mathbb{I}\{j_S(s,d)<\gamma+\log M\}-\exp(-\gamma)\Big\},\end{aligned}
$$

which is the improvement on the Kostina-Verdú tilted information based converse in [3, Corollary 5.9].

Proof : To see this, take $z(s)=\frac{1}{M}P_S(s)\exp(j_S(s,d)-\gamma)$ in (5) for any scalar $\gamma$, and upper bound $\min\{P_S(s),z(s)\}$ by $z(s)$ in the second term of (5); since $\sum_s P_S(s)\exp(j_S(s,d))\mathbb{I}\{d(s,\hat{s})\leq d\}\leq 1$ for all $\hat{s}$ [5], the second term is at most $\exp(-\gamma)$. Subsequently, take the supremum over $\gamma$ to get the required bound.
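As a quick numerical sanity check (ours, not from the paper), the bound above can be compared with the Kostina-Verdú bound $\sup_\gamma\{P[j_S(S,d)\geq\gamma+\log M]-\exp(-\gamma)\}$ in the lossless case, where the $d$-tilted information reduces to $j_S(s,0)=h(s)=-\log P_S(s)$ and the additional middle term is visibly nonnegative:

```python
import numpy as np

P_S = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625])
M = 2
h = -np.log(P_S)                      # j_S(s, 0) = h(s) in the lossless case

def kv_bound(gamma):
    return np.sum(P_S[h >= gamma + np.log(M)]) - np.exp(-gamma)

def improved_bound(gamma):
    inside = h < gamma + np.log(M)
    # note P_S(s) * exp(h(s) - gamma) = exp(-gamma) for every s in the lossless case
    extra = np.sum(P_S[inside] * np.exp(h[inside] - gamma)) / M
    return np.sum(P_S[~inside]) + extra - np.exp(-gamma)

# the improvement never falls below the original bound ...
for gamma in np.linspace(0.0, 3.0, 61):
    assert improved_bound(gamma) >= kv_bound(gamma) - 1e-12
# ... and is strictly larger whenever the middle term is active
print(improved_bound(0.5) - kv_bound(0.5))
```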

Consider a binary hypothesis testing problem between distributions $P_S$ and $Q_S$ on $\mathcal{S}$. Let $\alpha_\beta(P_S,Q_S)$ represent the minimum type-I error over all tests whose type-II error is at most $\beta$. The following corollary shows that the metaconverse in Theorem II.1 in fact recovers the hypothesis testing based converse of Kostina and Verdú [5, Theorem 8].

###### Corollary II.3

(Metaconverse Recovers Kostina-Verdú Hypothesis testing based Converse from [5, Theorem 8])
The following converse follows from (5),

$$
\begin{aligned} \mathbb{E}\big[\mathbb{I}\{d(S,\hat{S})>d\}\big]&\geq\mathrm{OPT}(\textbf{SC})\geq\mathrm{OPT}(\textbf{LP})=\mathrm{OPT}(\textbf{DP}) \\ &\stackrel{(a)}{\geq} \sup_{Q_S\in\mathcal{P}(\mathcal{S})}\,\sup_{\beta\geq 0}\Big\{\sum_{s}\min\{P_S(s),\beta Q_S(s)\}-\beta M^*\Big\} \\ &\stackrel{(b)}{=} \sup_{Q_S\in\mathcal{P}(\mathcal{S})}\alpha_{M^*}(P_S,Q_S),\end{aligned} \tag{6}
$$

which is equivalent to the hypothesis testing based converse of Kostina and Verdú in [5, Theorem 8]. Here, $M^*\equiv M\max_{\hat{s}}Q_S(\{s\in\mathcal{S}:d(s,\hat{s})\leq d\})$, where $Q_S(\{s:d(s,\hat{s})\leq d\})$ is the measure of the distortion ball around $\hat{s}$ induced by $Q_S$.

Proof : To recover the converse in (a) from (5), take $z(s)=\beta Q_S(s)$ where $\beta\geq 0$, and upper bound $\min\{P_S(s),z(s)\}$ by $\beta Q_S(s)$ in the second term of (5). Subsequently, take the supremum over $Q_S$ and $\beta$ to get the required bound. The proof of the relation in (b) is included in Corollary A.1 in Appendix A.

###### Corollary II.4

(Metaconverse Recovers Palzer-Timo Converse [11, Theorem 1])
The following converse follows from (5),

$$
\begin{aligned} \mathbb{E}\big[\mathbb{I}\{d(S,\hat{S})>d\}\big]&\geq\mathrm{OPT}(\textbf{SC})\geq\mathrm{OPT}(\textbf{LP})=\mathrm{OPT}(\textbf{DP}) \\ &\geq \sup_{\beta\in\mathbb{R}}\Big\{P[j_S(S,d)\geq\beta]-M\max_{\hat{s}}P[j_S(S,d)\geq\beta,\,d(S,\hat{s})\leq d]\Big\},\end{aligned}
$$

which is the converse of Palzer and Timo in [11, Theorem 1].

Proof : To obtain this converse from (5), take $z(s)=P_S(s)\mathbb{I}\{j_S(s,d)\geq\beta\}$, where $\beta\in\mathbb{R}$, and take the supremum over $\beta$. Notice that $\min\{P_S(s),z(s)\}=z(s)$ in this case, since $z(s)\leq P_S(s)$.

The finite blocklength lossless data compression problem results from SC by setting $\hat{\mathcal{S}}=\mathcal{S}$, $d(s,\hat{s})=\mathbb{I}\{s\neq\hat{s}\}$ and $d=0$. The following corollary particularizes the metaconverse in (3) to lossless data compression. In this case, the metaconverse takes a particularly simple form.

###### Corollary II.5 (Metaconverse for Lossless Source Coding)

For lossless data compression, consider the setting of Theorem II.1 with $\hat{\mathcal{S}}=\mathcal{S}$, $d(s,\hat{s})=\mathbb{I}\{s\neq\hat{s}\}$ and $d=0$. Consequently, for any code, the following converse follows from Theorem II.1:

$$
\mathbb{E}\big[\mathbb{I}\{S\neq\hat{S}\}\big]\geq\mathrm{OPT}(\textbf{SC})\geq\mathrm{OPT}(\textbf{LP})=\mathrm{OPT}(\textbf{DP}) \geq \sup_{0\leq\phi\leq P_S}\big\{\|\phi\|_1-M\|\phi\|_\infty\big\}. \tag{7}
$$

In the above converse we have viewed $\phi$ as a vector in $\mathbb{R}^{|\mathcal{S}|}$ that is nonnegative and dominated by $P_S$. The maximization in (7) is a tradeoff between increasing the $\ell_1$ norm of $\phi$ on the one hand, and decreasing its $\ell_\infty$ norm on the other. One plausible strategy for this tradeoff is to take $\phi(s)=P_S(s)$ for those $s$ for which $P_S(s)$ is not too large, and zero otherwise. Specifically, one may take $\phi(s)=P_S(s)\mathbb{I}\{h(s)\geq\log M+\gamma\}$ for some $\gamma\geq 0$. Then the RHS of (7) is lower bounded by

$$
\sup_{\gamma\geq 0}\big\{P[h(S)\geq\log M+\gamma]-\exp(-\gamma)\big\},
$$

where $h(s)\equiv\log\frac{1}{P_S(s)}$. The above converse is [5, Theorem 7] specialized to the lossless case.
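Both sides of this specialization are easy to evaluate numerically. In the sketch below (toy source of our own choosing), the supremum in (7) is computed exactly over the threshold family $\phi=\min\{P_S,t\}$: for a fixed cap $t$ this choice maximizes $\|\phi\|_1$, and the resulting objective $\sum_s\min\{P_S(s),t\}-Mt$ is piecewise linear and concave in $t$, so checking $t=0$ and the breakpoints $t=P_S(s)$ suffices. For comparison, lossless coding with $M$ messages has $\mathrm{OPT}(\textrm{SC})=1-(\text{total mass of the }M\text{ most likely symbols})$.

```python
import numpy as np

P_S = np.array([0.4, 0.3, 0.2, 0.1])
M = 2

def g(t):  # objective of (7) for the threshold choice phi = min(P_S, t)
    return np.minimum(P_S, t).sum() - M * t

exact7 = max(g(t) for t in np.concatenate([[0.0], P_S]))

# weaker closed form: sup_{gamma >= 0} P[h(S) >= log M + gamma] - exp(-gamma)
h = -np.log(P_S)
gammas = np.concatenate([[0.0], np.maximum(h - np.log(M), 0.0)])
weaker = max(np.sum(P_S[h >= np.log(M) + gm - 1e-12]) - np.exp(-gm) for gm in gammas)

opt = 1.0 - np.sort(P_S)[-M:].sum()   # exact OPT(SC) for lossless coding

print(weaker, exact7, opt)            # weaker <= exact7 <= opt
```

For this source, the weaker bound gives 0.2 while (7) gives 0.3, which matches OPT(SC) exactly; the gap illustrates what the extra generality of $\phi$ in (7) buys.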

Having outlined the LP-based framework for point-to-point source coding, we now consider three problems that will serve as sub-problems in analysing problem SW.

#### II-A3 Lossless Coding of Jointly Encoded Correlated Sources (S1, S2)

In this sub-problem of the Slepian-Wolf coding problem, the correlated sources $(S_1,S_2)$ are jointly encoded by $f$ to get $X$. $X$ is sent through the channel to get $Y$, which is then decoded according to $g$ to obtain $(\hat{S}_1,\hat{S}_2)$. The objective, as for problem SW, is to losslessly recover $(S_1,S_2)$ at the destination.

It is easy to see that the above joint encoding problem is equivalent to the point-to-point lossless source coding problem SC with $\mathcal{S}=\mathcal{S}_1\times\mathcal{S}_2$, $\hat{\mathcal{S}}=\mathcal{S}$, $\mathcal{X}=\mathcal{X}_1\times\mathcal{X}_2$ (so that $|\mathcal{X}|=M_1M_2$), $\mathcal{Y}=\mathcal{X}$, $d(s,\hat{s})=\mathbb{I}\{s\neq\hat{s}\}$ and with $d=0$. Consequently, to obtain finite blocklength converses for the joint encoding problem of correlated sources, we resort to the following generalized version of DP for the lossless source coding problem:

$$
\textbf{DP}_{\rm JE} \qquad \begin{aligned} \max_{\hat{\gamma}_a,\hat{\gamma}_b,\hat{\lambda}_s,\hat{\lambda}_c}\ & \sum_{s_1,s_2}\hat{\gamma}_a(s_1,s_2)+\sum_{y_1,y_2}\hat{\gamma}_b(y_1,y_2) \\ \text{s.t.}\ & \hat{\gamma}_a(s_1,s_2)-\textstyle\sum_{y_1,y_2}\hat{\lambda}_c(x_1,x_2,s_1,s_2,y_1,y_2)\leq 0 \quad \forall\, x_1,x_2,s_1,s_2, &\text{(A1)} \\ & \hat{\gamma}_b(y_1,y_2)-\textstyle\sum_{s_1,s_2}\hat{\lambda}_s(s_1,s_2,\hat{s}_1,\hat{s}_2,y_1,y_2)\leq 0 \quad \forall\, \hat{s}_1,\hat{s}_2,y_1,y_2, &\text{(A2)} \\ & \hat{\lambda}_s(s_1,s_2,\hat{s}_1,\hat{s}_2,y_1,y_2)+\hat{\lambda}_c(x_1,x_2,s_1,s_2,y_1,y_2)\leq\hat{\Sigma}(\cdot) \quad \forall\,(s_1,s_2,x_1,x_2,y_1,y_2,\hat{s}_1,\hat{s}_2), &\text{(A3)}\end{aligned}
$$

where $\hat{\Sigma}(s_1,s_2,x_1,x_2,y_1,y_2,\hat{s}_1,\hat{s}_2)\equiv\mathbb{I}\{(s_1,s_2)\neq(\hat{s}_1,\hat{s}_2)\}\,P_{S_1,S_2}(s_1,s_2)\,\mathbb{I}\{(y_1,y_2)=(x_1,x_2)\}$ for all $(s_1,s_2,x_1,x_2,y_1,y_2,\hat{s}_1,\hat{s}_2)$. Though this is a straightforward generalization of DP, we will need it later and hence we have included it here.

As in the case of DP, taking $\hat{\gamma}_a$ and $\hat{\gamma}_b$ such that (A1) and (A2) hold with equality, $\mathrm{OPT}(\textbf{DP}_{\rm JE})$ can be written in terms of the channel flow $\hat{\lambda}_c$ and the source flow $\hat{\lambda}_s$. The metaconverse for the lossless source coding problem in Corollary II.5 then readily implies the following corollary.

###### Corollary II.6 (Metaconverse for Jointly Encoded Sources)

Consider problem SC with $\mathcal{S}=\mathcal{S}_1\times\mathcal{S}_2$, $\hat{\mathcal{S}}=\mathcal{S}$, $\mathcal{X}=\mathcal{X}_1\times\mathcal{X}_2$, $\mathcal{Y}=\mathcal{X}$, $d(s,\hat{s})=\mathbb{I}\{s\neq\hat{s}\}$ and with $d=0$. Consequently, for any code, we have from Corollary II.5,

$$
\begin{aligned} \mathbb{E}\big[\mathbb{I}\{(S_1,S_2)\neq(\hat{S}_1,\hat{S}_2)\}\big]&\geq\mathrm{OPT}(\textbf{DP}_{\rm JE}) \\ &\geq \sup_{0\leq\hat{\phi}(s_1,s_2)\leq P_{S_1,S_2}(s_1,s_2)}\Big\{\sum_{s_1,s_2}\hat{\phi}(s_1,s_2)-M_1M_2\max_{\hat{s}_1,\hat{s}_2}\hat{\phi}(\hat{s}_1,\hat{s}_2)\Big\},\end{aligned} \tag{8}
$$

where the supremum is over $\hat{\phi}$ such that $0\leq\hat{\phi}(s_1,s_2)\leq P_{S_1,S_2}(s_1,s_2)$ for all $(s_1,s_2)$.

Proof : To obtain the required converse, we consider the following choice for the flows in DP$_{\rm JE}$, which generalizes the one adopted in (4):

$$
\begin{aligned} \hat{\lambda}_c(x_1,x_2,s_1,s_2,y_1,y_2) &\equiv \mathbb{I}\{(y_1,y_2)=(x_1,x_2)\}\,\hat{\phi}(s_1,s_2), \\ \hat{\lambda}_s(s_1,s_2,\hat{s}_1,\hat{s}_2,y_1,y_2) &\equiv -\hat{\phi}(s_1,s_2)\,\mathbb{I}\{(s_1,s_2)=(\hat{s}_1,\hat{s}_2)\}.\end{aligned} \tag{9}
$$

The feasibility of these flows with respect to (A3) can be verified as in the proof of Theorem II.1, and we omit the details here.

### II-B Lossless Source Coding of S1 with S2 as the Side-Information

We now consider the following sub-problem of Slepian-Wolf coding: $S_1$ is to be recovered losslessly at the destination with $S_2$ available as side-information at the decoder (Figure 3). Towards this, $S_1$ is encoded according to $f_1:\mathcal{S}_1\rightarrow\mathcal{X}_1$ to get $X_1$, which is transmitted through the channel to get $Y_1$. $S_2$ is the side information available at the decoder, which decodes according to $g:\mathcal{S}_2\times\mathcal{Y}_1\rightarrow\hat{\mathcal{S}}_1$ to get $\hat{S}_1$. The finite blocklength source coding of $S_1$ with $S_2$ as the side information can then be posed as the following optimization problem:

$$
\textbf{SID}_{1|2} \qquad \begin{aligned} \min_{f_1,g}\ & \mathbb{E}\big[\mathbb{I}\{S_1\neq\hat{S}_1\}\big] \\ \text{s.t.}\ & X_1=f_1(S_1),\quad \hat{S}_1=g(S_2,Y_1).\end{aligned}
$$

Thus, $\textbf{SID}_{1|2}$ seeks a code $(f_1,g)$ which minimizes $P[S_1\neq\hat{S}_1]$, the average probability of error.

To obtain finite blocklength converses, we employ the LP relaxation approach in [3] to obtain the following LP relaxation of problem $\textbf{SID}_{1|2}$:

$$
\textbf{LP}_{{\rm SI}_{1|2}} \qquad \begin{aligned} \min_{Q_{X_1|S_1},\,Q_{\hat{S}_1|Y_1,S_2},\,W}\ & \sum_{s_1,s_2,x_1,y_1,\hat{s}_1} \Psi(s_1,s_2,x_1,y_1,\hat{s}_1)\,W(x_1,\hat{s}_1|s_1,s_2,y_1) \\ \text{s.t.}\ & \textstyle\sum_{x_1}Q_{X_1|S_1}(x_1|s_1)=1 \;:\; \bar{\gamma}_a(s_1) \quad \forall\, s_1, \\ & \textstyle\sum_{\hat{s}_1}Q_{\hat{S}_1|Y_1,S_2}(\hat{s}_1|y_1,s_2)=1 \;:\; \bar{\gamma}_b(y_1,s_2) \quad \forall\, y_1,s_2, \\ & \textstyle\sum_{x_1}W(x_1,\hat{s}_1|s_1,s_2,y_1)-Q_{\hat{S}_1|Y_1,S_2}(\hat{s}_1|y_1,s_2)=0 \;:\; \bar{\lambda}^{(1|2)}_s(s,\hat{s}_1,y_1), \\ & \textstyle\sum_{\hat{s}_1}W(x_1,\hat{s}_1|s_1,s_2,y_1)-Q_{X_1|S_1}(x_1|s_1)=0 \;:\; \bar{\lambda}^{(1|2)}_c(s,x_1,y_1), \\ & Q_{X_1|S_1},\,Q_{\hat{S}_1|Y_1,S_2},\,W \geq 0,\end{aligned}
$$

where