Distributed Compression for the Uplink of a Backhaul-Constrained Coordinated Cellular Network

Aitor del Coso and Sébastien Simoens

Aitor del Coso is with the Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), Av. Canal Olímpic S/N, 08860, Castelldefels, Spain. E-mail: aitor.delcoso@cttc.es. Sébastien Simoens is with Motorola Labs PARIS, Parc Les Algorithmes, 91193, Saint-Aubin, France. E-mail: simoens@motorola.com. This work was partially supported by the internship program of Motorola Labs, Paris. It was also partially funded by the European Commission under projects COOPCOM (IST-033533) and NEWCOM++ (IST-216715).

We consider a backhaul-constrained coordinated cellular network, that is, a single-frequency network with multi-antenna base stations (BSs) that cooperate in order to decode the users’ data and that are linked by means of a common lossless backhaul of limited capacity. To implement receive cooperation, we propose distributed compression: BSs, upon receiving their signals, compress them using a multi-source lossy compression code. Then, they send the compressed vectors to a central BS, which performs users’ decoding. We propose Distributed Wyner-Ziv coding for this purpose and design it optimally in this work. The first part of the paper is devoted to a network with a single multi-antenna user that transmits a predefined Gaussian space-time codeword. For such a scenario, the compression codebooks at the BSs are optimized, considering the user’s achievable rate as the performance metric. In particular, for a single cooperative BS the optimum codebook distribution is derived in closed form, while for multiple cooperative BSs an iterative algorithm is devised. The second part of the contribution focuses on the multi-user scenario. For it, the achievable rate region is obtained by means of the optimum compression codebooks for sum-rate and weighted sum-rate, respectively.


I Introduction

Inter-cell interference is one of the most limiting factors of current cellular networks. It can be partially, but not totally, mitigated by resorting to frequency-division multiplexing, sectorized antennas and fractional frequency reuse [1]. However, a more spectrally efficient solution has recently been proposed: coordinated cellular networks [2]. They consist of single-frequency networks with base stations (BSs) cooperating in order to transmit to and receive from the mobile terminals. Beamforming mechanisms are thus deployed in the downlink, as well as coherent detection in the uplink, to drastically increase the system capacity [3, 4]. Hereafter, we focus only on the uplink channel.

Preliminary studies on the uplink performance of coordinated networks consider all BSs connected via a lossless backhaul with unlimited capacity [5][6]. Accordingly, the capacity region of the network equals that of a MIMO multi-access channel, with a supra-receiver containing all the antennas of all cooperative BSs [7]. Such an assumption seems optimistic in the short to mid term, as operators are currently worried about the costs of upgrading their backhaul to support, e.g., HSPA traffic load. To deal with a realistic backhaul constraint, two approaches have been proposed: i) distributed decoding [8, 9], consisting of a demodulation scheme carried out in a distributed manner among BSs, based on local decisions and belief propagation. Decoding delay appears to be its main problem. ii) Quantization [10], where BSs quantize their observations and forward them to the decoding unit. Its main limitation is its inability to exploit signal correlation between antennas/BSs; it thus introduces redundancy into the backhaul.

This paper considers a new approach for the network: distributed compression. The cooperative BSs, upon receiving their signals, compress them in a distributed manner using a multi-source lossy compression code [11]. Then, via the lossless backhaul, they transmit the compressed signals to the central unit (also a BS), which decompresses them using its own received signal as side information, and finally uses them to estimate the users’ messages. Distributed compression has already been proposed for coordinated networks in [12, 13, 14]. However, in those works, the authors consider single-antenna BSs with ergodic fading. We extend the analysis here to the multiple-antenna case with time-invariant fading.

The compression of signals with side information at the decoder was introduced by Wyner and Ziv in [15, 16]. They show that side information at the encoder is useless (i.e., the rate-distortion tradeoff remains unchanged) to compress a single Gaussian source when it is available at the decoder [16, Section 3]. Unfortunately, when considering multiple (correlated) signals, independently compressed at different BSs, and to be recovered at a central unit with side information, such a statement cannot be made. Indeed, this is an open problem, for which it is not even clear when source-channel separation applies [17]. To the best of the authors’ knowledge, the scheme that performs best (in a rate-distortion sense) for this problem is Distributed Wyner-Ziv (D-WZ) compression [18]. Such a compression is the direct extension of Berger-Tung coding to the decoding side information case [19, 20]. In turn, Berger-Tung compression can be thought of as the lossy counterpart of the Slepian-Wolf lossless coding [21]. D-WZ coding is thus the compression scheme we propose to use, and it is detailed in the sequel.

Summary of Contributions. This paper considers a single-frequency network with multi-antenna BSs. The first base station, denoted $\mathrm{BS}_0$, is the central unit and centralizes the users’ decoding. The rest, $\mathrm{BS}_1,\dots,\mathrm{BS}_N$, are cooperative BSs, which compress their received signals in a distributed manner using a D-WZ code, and independently transmit them to $\mathrm{BS}_0$ via the common backhaul of aggregate capacity $\mathcal{R}$. In the network, time-invariant, frequency-flat channels are assumed, as well as transmit and receive channel state information (CSI) at the users and BSs, respectively.

The first part of the paper is devoted to a network with a single user, equipped with multiple antennas. It aims at deriving the optimum compression codebooks at the BSs, for which the user’s transmission rate is maximized. Our contributions are the following:

  • First, Sec. II revisits Wyner-Ziv coding [16, Section 3] and Distributed Wyner-Ziv coding [19], and adapts them to our compression scenario.

  • For the single user transmitting a given Gaussian codeword, Sec. III proves that the optimum compression codebooks at the BSs are Gaussian distributed. Accordingly, the compression step is modelled by means of Gaussian “compression” noise, added by the BSs to their observations before retransmitting them to the central unit.

  • Considering a single cooperative BS in the network (i.e., $N=1$), Sec. IV derives in closed form the optimum “compression” noise for which the user’s rate is maximized. We also show that the conditional Karhunen-Loève transform plus independent Wyner-Ziv coding of scalar streams is optimal.

  • The compression design is extended in Sec. V to an arbitrary number of BSs. The optimum “compression” noises (i.e., the optimum codebook distributions) are obtained by means of an iterative algorithm, constructed using dual decomposition theory and a non-linear block coordinate approach [22, 23]. Due to the non-convexity of the noise optimization, only local convergence is proven.

The second part of the paper extends the analysis to a network where multiple users transmit simultaneously. For it, the achievable rate region is described resorting to the weighted sum-rate optimization:

  • First, the sum-rate of the network is derived in Sec. VI, adapting the previous single-user results. Later, the weighted sum-rate, and its associated optimum “compression” noises, are obtained by means of an iterative algorithm, constructed using dual decomposition and Gradient Projection [23].

Notation. denotes expectation. , and stand for the transpose of , conjugate transpose of and complex conjugate of , respectively. . denotes mutual information, entropy. The derivative of a scalar function with respect to a complex matrix is defined as in [24], i.e., . In such a way, e.g., . Moreover, we compactly write , and . A sequence of vectors is compactly denoted by . Furthermore, to define block-diagonal matrices, we state , with square matrices. stands for convex hull. Finally, the covariance of random vector conditioned on random vector is denoted by and computed
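For jointly Gaussian vectors, the conditional covariance used throughout the paper reduces to a Schur complement of the joint covariance. The snippet below is a minimal sketch of that computation; the function name `conditional_covariance` and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def conditional_covariance(R_xx, R_xy, R_yy):
    """Covariance of X given Y for jointly Gaussian X and Y.

    Implements the Schur complement R_xx - R_xy R_yy^{-1} R_xy^H.
    """
    return R_xx - R_xy @ np.linalg.solve(R_yy, R_xy.conj().T)

# Toy scalar example (assumed values): Y = X + W with var(X) = 1, var(W) = 0.5.
R_xx = np.array([[1.0]])   # covariance of X
R_xy = np.array([[1.0]])   # cross-covariance between X and Y
R_yy = np.array([[1.5]])   # covariance of Y
R_x_given_y = conditional_covariance(R_xx, R_xy, R_yy)
```

Using `np.linalg.solve` instead of an explicit inverse is a numerical design choice; both give the same matrix in exact arithmetic.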

II Compression of Vector Sources

The aim of compression within coordinated networks is to let the decoder extract as much mutual information as possible from the reconstructed signals. Known rate-distortion results apply to this goal as follows.

II-A Single-Source Compression with Decoder Side Information

Consider Fig. 1 with $N=1$. Let $\mathbf{Y}_1$ be a zero-mean, temporally memoryless, Gaussian vector to be compressed at $\mathrm{BS}_1$. Assume that it is the observation of the signal transmitted by user $s$, i.e., $\mathbf{X}_s$. $\mathrm{BS}_1$ compresses the signal and sends it to $\mathrm{BS}_0$, which makes use of its side information $\mathbf{Y}_0$ to decompress it. Finally, once the signal has been reconstructed into vector $\hat{\mathbf{Y}}_1$, the decoder uses it to estimate the message transmitted by the user. Wyner’s results [16] apply to this problem as follows.

Definition 1 (Single-source Compression Code)

A compression code with side information at the decoder is defined by two mappings, and and three spaces and , where

Proposition 1 (Wyner-Ziv Coding [16])

Let the random vector with conditional probability satisfy the Markov chain , and let and be jointly Gaussian. Then, considering a sequence of compression codes with side information at the decoder:


as if:

  • the compression rate satisfies

  • the compression codebook consists of random sequences drawn i.i.d. from , where ,

  • the encoding outputs the bin-index of codewords that are jointly typical with the source sequence . In turn, outputs the codeword that, belonging to the bin selected by the encoder, is jointly typical with .


The proposition is proven in [16, Lemma 5] using joint typicality arguments.

II-B Multiple-Source Compression with Decoder Side Information

Consider Fig. 1. Let $\mathbf{Y}_1,\dots,\mathbf{Y}_N$ be zero-mean, temporally memoryless, Gaussian vectors to be compressed independently at $\mathrm{BS}_1,\dots,\mathrm{BS}_N$, respectively. Assume that they are the observations at the BSs of the signal transmitted by user $s$, i.e., $\mathbf{X}_s$. The compressed vectors are sent to $\mathrm{BS}_0$, which decompresses them using its side information $\mathbf{Y}_0$ and uses them to estimate the user’s message. Notice that the architecture in Fig. 1 imposes source-channel separation at the compression step, which has not been shown to be optimal. However, it includes the coding scheme with the best known performance: Distributed Wyner-Ziv coding [18]. It applies to the setup as follows.

Definition 2 (Multiple-source Compression Code)

A compression code with side information at the decoder is defined by mappings, , , and , and spaces , and , where

Proposition 2 (Distributed Wyner-Ziv Coding [18])

Let the random vectors , , have conditional probability and satisfy the Markov chain . Let and , be jointly Gaussian. Then, considering a sequence of compression codes with side information at the decoder:


as if:

  • the compression rates satisfy

  • each compression codebook , consists of random sequences drawn i.i.d. from , where .

  • for every , the encoding outputs the bin-index of codewords that are jointly typical with the source sequence . In turn, outputs the codewords , that, belonging to the bins selected by the encoders, are all jointly typical with .


The proposition is proven for discrete sources and discrete side information in [18, Theorem 2]. The extension to the Gaussian case is conjectured therein. The conjecture can be proven by noting that D-WZ coding is equivalent to Berger-Tung coding with side information at the decoder [19]. In turn, Berger-Tung coding can be implemented through time-sharing of successive Wyner-Ziv compressions [20], for which introducing side information at the decoder reduces the compression rate as in (4). Due to space limitations, we limit the proof to this sketch.

Now, we can present the coordinated cellular network with D-WZ coding.

III System Model

Let a single multi-antenna source $s$ transmit data to $N+1$ multi-antenna base stations $\mathrm{BS}_0,\dots,\mathrm{BS}_N$. The BSs, as in typical 3G networks, are connected (through radio network controllers) to a common lossless backhaul of aggregate capacity $\mathcal{R}$, and $\mathrm{BS}_0$ is selected to be the decoding unit. This user-to-BSs assignment is assumed to be given by upper layers; the derivation of the optimum set of BSs to decode the user is out of the scope of our study, and we refer the reader to, e.g., [6] for assignment algorithms and selection criteria.

The source transmits a message mapped onto a zero-mean, Gaussian codeword, drawn i.i.d. from random vector $\mathbf{X}_s$ and not subject to optimization. The transmitted signal, affected by time-invariant, memoryless fading, is received at the BSs under additive noise:

$$\mathbf{Y}_i = \mathbf{H}_{s,i}\,\mathbf{X}_s + \mathbf{Z}_i, \qquad i = 0,\dots,N,$$

where $\mathbf{H}_{s,i}$ is the MIMO channel matrix between user $s$ and $\mathrm{BS}_i$, and $\mathbf{Z}_i$ is AWGN. Channel coefficients are known at both the BSs and the user, while $\mathrm{BS}_0$ has centralized knowledge of all the channels within the network.

III-A Problem Statement

Base stations $\mathrm{BS}_1,\dots,\mathrm{BS}_N$, upon receiving their signals, compress them in a distributed manner using a D-WZ compression code. Later, they transmit the compressed vectors to $\mathrm{BS}_0$, which recovers them and uses them to decode. In this setting, the user’s message can be reliably decoded iff [12, Theorem 1]:

The second equality follows from (3) in Prop. 2. However, equality only holds for compression rates satisfying the set of constraints (4). As mentioned, the backhaul imposes only an aggregate rate constraint $\mathcal{R}$, i.e., only the sum of the compression rates is limited. Therefore, the set of constraints (4) can all be restated as:


Furthermore, from the Markov chain in Prop. 2, the following inequality holds


Therefore, forcing the constraint on the complete set of cooperative BSs to hold makes all constraints in (7) hold as well. Accordingly, the maximum transmission rate of user $s$ is obtained from the optimization:

Theorem 1

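Let $\mathbf{X}_s \sim \mathcal{CN}(\mathbf{0},\mathbf{Q})$. Optimization (III-A) is solved for Gaussian conditional distributions $p(\hat{\mathbf{Y}}_i|\mathbf{Y}_i)$, $i=1,\dots,N$. Thus, the compressed vectors can be modelled as $\hat{\mathbf{Y}}_i = \mathbf{Y}_i + \mathbf{Z}_i^c$, where $\mathbf{Z}_i^c \sim \mathcal{CN}(\mathbf{0},\boldsymbol{\Phi}_i)$ is independent, Gaussian, “compression” noise at $\mathrm{BS}_i$. That is,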


where the conditional covariance follows (54).


See Appendix B for the proof.
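With Gaussian compression noise, both the user's rate and the backhaul usage become log-determinant expressions. The following is a minimal sketch under stated assumptions: white receiver noise of variance `sigma2` at every BS, transmit covariance `Q`, and compression noise covariances `Phi_list`. These names, and the helper functions `achievable_rate` and `backhaul_rate`, are illustrative and not the paper's notation.

```python
import numpy as np

def log2det(M):
    """log2 of det(M) for a matrix with positive determinant."""
    return np.linalg.slogdet(M)[1] / np.log(2.0)

def achievable_rate(Q, H0, H_list, Phi_list, sigma2):
    """I(X; Y0, Yhat_1..N) in bits when Yhat_n = Y_n + compression noise."""
    Nt = Q.shape[0]
    G = H0.conj().T @ H0 / sigma2
    for H, Phi in zip(H_list, Phi_list):
        # Effective noise at a cooperative BS: thermal plus compression noise.
        noise = sigma2 * np.eye(H.shape[0]) + Phi
        G = G + H.conj().T @ np.linalg.solve(noise, H)
    return log2det(np.eye(Nt) + Q @ G)

def backhaul_rate(Q, H0, H_list, Phi_list, sigma2):
    """I(Y_1..N; Yhat_1..N | Y0) in bits (aggregate compression rate)."""
    Nt = Q.shape[0]
    H = np.vstack(H_list)
    n = H.shape[0]
    # Covariance of X given Y0, then of the stacked observations given Y0.
    R_x = np.linalg.solve(np.eye(Nt) + Q @ H0.conj().T @ H0 / sigma2, Q)
    R_y = H @ R_x @ H.conj().T + sigma2 * np.eye(n)
    # Block-diagonal compression noise covariance.
    Phi = np.zeros((n, n), dtype=complex)
    k = 0
    for P in Phi_list:
        m = P.shape[0]
        Phi[k:k + m, k:k + m] = P
        k += m
    return log2det(np.eye(n) + np.linalg.solve(Phi, R_y))

# Scalar toy case (assumed values): unit channels, unit noise, unit compression noise.
one = np.array([[1.0]])
rate = achievable_rate(one, one, [one], [one], 1.0)
used = backhaul_rate(one, one, [one], [one], 1.0)
```

In this toy case both quantities equal log2(2.5) bits; in general they differ, and the optimization trades the first against the second.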

Remark 1

The maximization above is not a concave problem in standard form: although the feasible set is convex, the objective function is not concave in the compression noise covariances.

III-B Useful Upper Bounds

Prior to solving (10), we present two upper bounds on it.

Upper Bound 1

The achievable rate in (10) is upper bounded by

Upper Bound 2

The achievable rate in (10) satisfies


See Appendix C for the proof.

Remark 2

Notice that, independently of the number of BSs, the achievable rate is bounded above by the capacity with $\mathrm{BS}_0$ alone plus the backhaul rate.
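The two bounds can be checked numerically for a single cooperative BS. The sketch below assumes that Upper Bound 1 is the capacity with unconstrained backhaul (full cooperation) and that Upper Bound 2 is the capacity with the central BS alone plus the backhaul rate, as Remark 2 states. Channel draws, dimensions and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def channel(rows, cols):
    # Hypothetical i.i.d. complex Gaussian channel draw.
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

def log2det(M):
    return np.linalg.slogdet(M)[1] / np.log(2.0)

Nt, sigma2 = 2, 1.0
Q = np.eye(Nt)                      # assumed transmit covariance
H0, H1 = channel(2, Nt), channel(2, Nt)
G0 = H0.conj().T @ H0 / sigma2

def rate_and_backhaul(Phi):
    """User rate and backhaul usage for compression noise covariance Phi."""
    G1 = H1.conj().T @ np.linalg.solve(sigma2 * np.eye(2) + Phi, H1)
    rate = log2det(np.eye(Nt) + Q @ (G0 + G1))
    R_x = np.linalg.solve(np.eye(Nt) + Q @ G0, Q)       # cov(X | Y0)
    R_y = H1 @ R_x @ H1.conj().T + sigma2 * np.eye(2)   # cov(Y1 | Y0)
    backhaul = log2det(np.eye(2) + np.linalg.solve(Phi, R_y))
    return rate, backhaul

full_cooperation = log2det(np.eye(Nt) + Q @ (G0 + H1.conj().T @ H1 / sigma2))
central_only = log2det(np.eye(Nt) + Q @ G0)

checks = []
for phi in (0.1, 1.0, 10.0):
    rate, backhaul = rate_and_backhaul(phi * np.eye(2))
    checks.append((rate <= full_cooperation + 1e-9,
                   rate <= central_only + backhaul + 1e-9))
```

Every entry of `checks` holds for any positive definite compression noise, since the second bound follows from the data processing inequality.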

IV The Two-Base Stations Case

We first solve (10) for $N=1$. As mentioned, the objective function, which has to be maximized, is convex on $\boldsymbol{\Phi}_1$. In order to make it concave, we change the variables $\boldsymbol{\Phi}_1 = \mathbf{A}_1^{-1}$, so that

The objective is now concave. However, the constraint no longer defines a convex feasible set. Therefore, Karush-Kuhn-Tucker (KKT) conditions become necessary (notice that all feasible points are regular) but not sufficient for optimality. To solve the problem, we need to resort to the general sufficiency condition [23, Proposition 3.3.4]: first, we derive a matrix $\mathbf{A}_1^*$ for which the KKT conditions hold. Later, we demonstrate that the selected matrix also satisfies the general sufficiency condition, thus becoming the optimal solution. The optimum compression noise is finally recovered as $\boldsymbol{\Phi}_1^* = (\mathbf{A}_1^*)^{-1}$. This result is presented in Theorem 2:

Theorem 2

Let and the conditional covariance (see Appendix A-A):


with eigen-decomposition . The optimum ”compression” noise at is , with


and is such that .


See Appendix D for the proof.
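A solution of this type can be sketched numerically. The code below assumes white receiver noise of variance `sigma2` and works in the eigenbasis of the conditional covariance of the cooperative BS's observation. The water-filling-like expression for `eta` is the stationary point of the per-eigenmode Lagrangian under these assumptions; it is a reconstruction for illustration and may differ in notation from Theorem 2. The function name and the toy values are hypothetical.

```python
import numpy as np

def optimal_single_bs_noise(R_cond, sigma2, backhaul_bits, tol=1e-10):
    """Return (A, eta, lam), where A is the inverse compression noise covariance.

    R_cond is the covariance of Y1 given Y0; eta holds the per-eigenmode
    inverse noise variances; lam is the Lagrange multiplier in (0, 1].
    """
    s, U = np.linalg.eigh(R_cond)        # R_cond = U diag(s) U^H

    def eta_of(lam):
        return np.maximum((1 / sigma2 - 1 / s) / lam - 1 / sigma2, 0.0)

    def used_bits(lam):
        return np.sum(np.log2(1 + eta_of(lam) * s))

    lo, hi = 0.0, 1.0                    # at lam = 1 nothing is forwarded
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if used_bits(lam) > backhaul_bits:
            lo = lam                     # too many bits: raise the multiplier
        else:
            hi = lam
    eta = eta_of(hi)                     # feasible end of the bracket
    A = (U * eta) @ U.conj().T           # A = U diag(eta) U^H
    return A, eta, hi

# Toy example (assumed values): two eigenmodes, 3 bits of backhaul.
sigma2, backhaul_bits = 1.0, 3.0
R_cond = np.diag([3.0, 1.5])
A, eta, lam = optimal_single_bs_noise(R_cond, sigma2, backhaul_bits)
s = np.linalg.eigvalsh(R_cond)
bits = np.sum(np.log2(1 + eta * s))
gain = np.sum(np.log2((1 + eta * s) / (1 + eta * sigma2)))
```

The function returns the inverse of the noise covariance rather than the covariance itself because some entries of `eta` may be zero, which corresponds to eigenmodes that are not forwarded at all. Here `gain` is the rate increase over decoding with the central BS alone.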

IV-A Practical Implementation

The optimum compression in Theorem 2 can be carried out using a practical Transform Coding (TC) approach. With TC, $\mathrm{BS}_1$ first transforms its received vector using an invertible linear function and then separately compresses the resulting scalar streams [25]. We show that the conditional Karhunen-Loève transform (CKLT) is an optimal linear transformation [26]. First, recall that multiplying a vector by an invertible matrix does not change the mutual information [27], i.e., and . From Theorem 2, the optimum compressed vector satisfies , with and . Therefore, the following compressed vectors are also optimal


where vector is referred to as the CKLT of vector . Notice now that is diagonal. Therefore, the elements of the compressed vector are conditionally uncorrelated given . Likewise, so are the elements of vector . Because they are uncorrelated, each element of vector can be compressed, without loss of optimality, independently of the other elements, at a compression rate , [16]. From Theorem 2 we validate that . This demonstrates that CKLT plus independent coding of streams is optimal, not only for minimizing distortion as shown in [26], but also for maximizing the achievable rate of coordinated networks.
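The decorrelation argument can be illustrated numerically. The sketch below builds a hypothetical conditional covariance `R_cond`, rotates it with its eigenvectors, and checks that per-stream compression rates add up to the joint compression rate when the compression noise shares the same eigenvectors. The per-stream values in `eta` are arbitrary illustrative numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conditional covariance of the received vector given the side
# information (Hermitian positive definite by construction).
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
R_cond = B @ B.conj().T + np.eye(3)

s, U = np.linalg.eigh(R_cond)            # R_cond = U diag(s) U^H

# CKLT: rotating by U^H makes the conditional covariance diagonal.
R_cklt = U.conj().T @ R_cond @ U
off_diag = R_cklt - np.diag(np.diag(R_cklt))

# Compression noise aligned with U: inverse covariance A = U diag(eta) U^H.
eta = np.array([0.2, 0.5, 1.0])          # assumed inverse noise variances
A = (U * eta) @ U.conj().T

# Joint compression rate versus sum of independent per-stream rates (bits).
joint_bits = np.linalg.slogdet(np.eye(3) + A @ R_cond)[1] / np.log(2.0)
stream_bits = np.log2(1 + eta * s)
```

The equality between `joint_bits` and `stream_bits.sum()` is what allows each scalar stream to be coded independently without rate loss.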

V The Multiple-Base Stations Case

Consider now $\mathrm{BS}_0$ assisted by $N>1$ cooperative BSs. The achievable rate follows (10) where, as previously, the objective function is not concave over $\boldsymbol{\Phi}_n$, $n=1,\dots,N$. To make it concave, we change the variables: $\boldsymbol{\Phi}_n = \mathbf{A}_n^{-1}$, $n=1,\dots,N$, so that:


Again, the constraint does not define a convex feasible set. Our strategy to solve the optimization is the following: first, we show that the duality gap for the problem is zero. Later, we propose an iterative algorithm that solves the dual problem, thus solving the primal too. An interesting property of the dual problem is that the coupling constraint in (17) is decoupled [23, Chapter 5].

V-A The dual problem

Let the Lagrangian of (17) be defined on , and as:


The dual function for follows [22, Section 5.1]:


The solution of the dual problem is then obtained from

Lemma 1

The duality gap for optimization (17) is zero, i.e., the primal problem (17) and the dual problem (20) have the same solution.


The duality gap for problems of the form of (17), and satisfying the time-sharing property, is zero [28, Theorem 1]. The time-sharing property is defined as follows: let be the solution of (17) for backhaul rates , respectively. Consider for some . Then, the property is satisfied if and only if , . That is, if the solution of (17) is concave with respect to the backhaul rate. It is well known that time-sharing of compressions can neither decrease the resulting distortion [27, Lemma 13.4.1] nor improve the mutual information obtained from the reconstructed vectors. Hence, the property holds for (17), and the duality gap is zero.

We then solve the dual problem in order to obtain the solution of the primal. First, consider maximization (19). As expected, the maximization cannot be solved in closed form. However, as the feasible set (i.e., ) is the Cartesian product of convex sets, a block coordinate ascent algorithm (also known as the non-linear Gauss-Seidel algorithm [29, Section II-C]) can be used to search for the maximum [23, Section 2.7]. The algorithm iteratively optimizes the function with respect to one block of variables while keeping the others fixed. It has previously been used to, e.g., solve the sum-rate problem of MIMO multiple access channels with individual and sum-power constraints [30][31]. We define it for our problem as:


where is the iteration index. As shown in Theorem 3, the maximization (21) is uniquely attained.

Theorem 3

Let the optimization and the conditional covariance matrix (See Appendix A-A)


with eigen-decomposition . The optimization is uniquely attained at , where


See Appendix E-A for the proof.

The Lagrangian is continuously differentiable, and the maximization (21) is uniquely attained. Hence, the sequence is proven to converge to a local maximum [23, Proposition 2.7.1]. To demonstrate convergence to the global maximum, it is necessary to show that the mapping is a block contraction (see [32, Section 3.1.2] for the definition) for some [32, Proposition 3.10]. Unfortunately, we were not able to demonstrate the contraction property on the Lagrangian, although simulation results consistently suggest global convergence of our algorithm.

Once the dual function has been obtained through the Gauss-Seidel algorithm (assume hereafter that the algorithm has converged to the global maximum), it remains to minimize it over the Lagrange multiplier. First, recall that the dual function is convex, as it is defined as the pointwise maximum of a family of affine functions [22]. Hence, to minimize it, we may use a subgradient approach such as that proposed by Yu in [31]. The subgradient search consists of following a search direction such that


Such a search is proven to converge to the global minimum for diminishing step-size rules [29, Section II-B]. Considering the definition of , the following satisfies (24):


Therefore, it is used to search for the optimum as:


Consider now as the initial value of the Lagrange multiplier. For such a multiplier, the optimum solution of (19) is and the subgradient (25) is (See Appendix E-B). Hence, following (26), the optimum value of is strictly lower than one. Algorithm 1 takes all this into account in order to solve the dual problem, hence solving the primal too. As mentioned, we can only claim convergence of the algorithm to a local maximum.

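Algorithm 1 Multiple-BSs dual problem
  Initialize $\lambda_{\min} = 0$ and $\lambda_{\max} = 1$
  repeat
     $\lambda = (\lambda_{\min} + \lambda_{\max})/2$
     Obtain the maximizer of the Lagrangian for this $\lambda$ from Algorithm 2
     Evaluate the subgradient $h(\lambda)$ as in (25).
     if $h(\lambda) \le 0$, then $\lambda_{\min} = \lambda$, else $\lambda_{\max} = \lambda$
  until $\lambda_{\max} - \lambda_{\min} \le \epsilon$

Algorithm 2 Non-linear Gauss-Seidel to obtain the maximizer of the Lagrangian for a given $\lambda$
  Initialize $\mathbf{A}_n^0 = \mathbf{0}$, $n = 1,\dots,N$, and $t = 0$
  repeat
     for $n$ = 1 to $N$ do
        Compute the conditional covariance from (22).
        Take its eigen-decomposition and compute the update as in (23).
        Update $\mathbf{A}_n^{t+1}$.
     end for
     $t = t + 1$
  until The sequence converges
  Return the converged matrices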
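The two algorithms can be prototyped in a few lines. The sketch below is an illustration under stated assumptions, not the authors' implementation: receiver noise is white with variance `sigma2`, the inner block update uses the per-eigenmode stationary point of the Lagrangian (assumed to play the role of (23)), the subgradient is taken as the backhaul rate minus the bits actually used (assumed to play the role of (25)), and the outer loop is a bisection over the multiplier in [0, 1]. All function and variable names are hypothetical.

```python
import numpy as np

def log2det(M):
    return np.linalg.slogdet(M)[1] / np.log(2.0)

def info_matrix(H0, H, A, sigma2, skip=None):
    """Sum of H^H N^{-1} H over Y0 and the compressed observations (except skip)."""
    G = H0.conj().T @ H0 / sigma2
    for p, (Hp, Ap) in enumerate(zip(H, A)):
        if p != skip:
            # (sigma2 I + A^{-1})^{-1} = A (sigma2 A + I)^{-1}, valid for singular A.
            G = G + Hp.conj().T @ Ap @ np.linalg.solve(sigma2 * Ap + np.eye(len(Ap)), Hp)
    return G

def gauss_seidel(Q, H0, H, sigma2, lam, sweeps=200, tol=1e-9):
    """Block coordinate ascent on the Lagrangian for a fixed multiplier lam."""
    Nt = Q.shape[0]
    A = [np.zeros((Hn.shape[0],) * 2, dtype=complex) for Hn in H]
    for _ in range(sweeps):
        change = 0.0
        for n, Hn in enumerate(H):
            G = info_matrix(H0, H, A, sigma2, skip=n)
            R_x = np.linalg.solve(np.eye(Nt) + Q @ G, Q)   # cov(X | Y0, other Yhat)
            R_n = Hn @ R_x @ Hn.conj().T + sigma2 * np.eye(Hn.shape[0])
            s, U = np.linalg.eigh((R_n + R_n.conj().T) / 2)
            eta = np.maximum((1 / sigma2 - 1 / s) / lam - 1 / sigma2, 0.0)
            A_new = (U * eta) @ U.conj().T
            change = max(change, np.abs(A_new - A[n]).max())
            A[n] = A_new
        if change < tol:
            break
    return A

def backhaul_bits(Q, H0, H, A, sigma2):
    """Aggregate compression rate in bits, written with A_n (inverse noise covariances)."""
    Nt, Hs = Q.shape[0], np.vstack(H)
    R_x = np.linalg.solve(np.eye(Nt) + Q @ H0.conj().T @ H0 / sigma2, Q)
    R_y = Hs @ R_x @ Hs.conj().T + sigma2 * np.eye(Hs.shape[0])
    A_blk = np.zeros(R_y.shape, dtype=complex)
    k = 0
    for An in A:
        m = len(An)
        A_blk[k:k + m, k:k + m] = An
        k += m
    return log2det(np.eye(len(R_y)) + A_blk @ R_y)

def solve_dual(Q, H0, H, sigma2, R_bits, eps=1e-6):
    """Bisection on the multiplier; returns (A, multiplier, user rate in bits)."""
    lam_min, lam_max = 0.0, 1.0
    while lam_max - lam_min > eps:
        lam = 0.5 * (lam_min + lam_max)
        A = gauss_seidel(Q, H0, H, sigma2, lam)
        h = R_bits - backhaul_bits(Q, H0, H, A, sigma2)    # assumed subgradient
        if h <= 0:
            lam_min = lam
        else:
            lam_max = lam
    A = gauss_seidel(Q, H0, H, sigma2, lam_max)            # feasible end of the bracket
    rate = log2det(np.eye(Q.shape[0]) + Q @ info_matrix(H0, H, A, sigma2))
    return A, lam_max, rate

rng = np.random.default_rng(0)

def channel(rows, cols):
    # Hypothetical i.i.d. complex Gaussian channel draw.
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

Q, sigma2, R_bits = np.eye(2), 1.0, 4.0
H0, H = channel(2, 2), [channel(2, 2), channel(2, 2)]
A, lam, rate = solve_dual(Q, H0, H, sigma2, R_bits)
used = backhaul_bits(Q, H0, H, A, sigma2)
local = log2det(np.eye(2) + Q @ H0.conj().T @ H0 / sigma2)
```

The sketch works with the inverse noise covariances throughout so that a BS, or an eigenmode, that forwards nothing is represented by a zero matrix entry instead of an infinite noise variance. Like the algorithm in the text, it carries no guarantee of reaching the global maximum.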

V-B Practical Implementation

In the network, Distributed Wyner-Ziv compression can be practically implemented using a simple Successive Wyner-Ziv (S-WZ) approach [20][33, Theorem 3]. To describe it, let us recall that the optimum compression noises are obtained from Algorithm 1, and let be a given permutation on . For such a permutation, the S-WZ coding is defined as follows:

  • Parallel Compression: compresses its received vector using a single-source Wyner-Ziv code with decoder side information (following Proposition 1), at a compression rate


    The conditional covariance is calculated in (53). In parallel, , compresses its signal using a single-source Wyner-Ziv code with decoder side information , at a rate


    In this case, the conditional covariance can be calculated from (56).

  • Successive Decompression: first recovers the codeword using side information ; later, it successively recovers codewords , , using as side information.

It is easy to check the optimality of the S-WZ coding:

The second equality comes from the Markov chain in Proposition 2, and the third from the chain rule for mutual information; the fourth follows from the fact that the optimum compression noises satisfy the constraint (10) with equality. Unfortunately, transform coding is not (generally) optimum for S-WZ with $N>1$, since the eigenvectors of , and those of do not necessarily match.
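The chain-rule step can be verified numerically: for any decompression order, the successive Wyner-Ziv rates add up to the joint D-WZ compression rate. The sketch below assumes white receiver noise of variance `sigma2` and arbitrary (not optimized) compression noise covariances `Phi`. All names and numbers are illustrative.

```python
import numpy as np

def log2det(M):
    return np.linalg.slogdet(M)[1] / np.log(2.0)

def cond_cov(Q, Hn, side, sigma2):
    """Covariance of Y_n = Hn X + Z_n given side observations [(H, noise_cov), ...]."""
    Nt = Q.shape[0]
    G = np.zeros((Nt, Nt), dtype=complex)
    for Hw, Nw in side:
        G = G + Hw.conj().T @ np.linalg.solve(Nw, Hw)
    R_x = np.linalg.solve(np.eye(Nt) + Q @ G, Q)          # cov(X | side)
    return Hn @ R_x @ Hn.conj().T + sigma2 * np.eye(Hn.shape[0])

def swz_rates(Q, H0, H, Phi, sigma2, order):
    """Per-BS Wyner-Ziv rates (bits) for a given decompression order."""
    side = [(H0, sigma2 * np.eye(H0.shape[0]))]           # Y0 is always available
    rates = []
    for n in order:
        R_n = cond_cov(Q, H[n], side, sigma2)
        rates.append(log2det(np.eye(len(R_n)) + np.linalg.solve(Phi[n], R_n)))
        # The recovered codeword becomes side information for the next BS.
        side.append((H[n], sigma2 * np.eye(H[n].shape[0]) + Phi[n]))
    return rates

rng = np.random.default_rng(2)

def channel(rows, cols):
    # Hypothetical i.i.d. complex Gaussian channel draw.
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

Q, sigma2 = np.eye(2), 1.0
H0, H = channel(2, 2), [channel(2, 2), channel(2, 2)]
Phi = [0.5 * np.eye(2), 2.0 * np.eye(2)]                  # assumed noise covariances

# Joint D-WZ compression rate of both cooperative BSs given Y0.
R_joint = cond_cov(Q, np.vstack(H), [(H0, sigma2 * np.eye(2))], sigma2)
Z = np.zeros((2, 2))
Phi_blk = np.block([[Phi[0], Z], [Z, Phi[1]]])
joint = log2det(np.eye(4) + np.linalg.solve(Phi_blk, R_joint))

sums = [sum(swz_rates(Q, H0, H, Phi, sigma2, order)) for order in ([0, 1], [1, 0])]
```

Both orders give the same sum rate, although the individual rates differ: the BS decompressed later benefits from more side information and needs fewer bits.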

VI The Multiple User Scenario

In previous sections, we considered a single user within the network. To complement the analysis, we study hereafter multiple (i.e., two) senders transmitting simultaneously. The users, and , transmit two independent messages , , mapped onto codewords , , respectively. Codewords are drawn i.i.d. from random vectors , and are not subject to optimization. Hence, now, the BSs receive:


Here, is the MIMO channel between user and , and . As previously, signals at are compressed in a distributed manner using a D-WZ code, and later sent to , which centralizes decoding. Using standard arguments, the set of transmission rates , at which messages , can be reliably decoded is [27][14]:


The union in (31) is explained by the fact that compression codebooks might be arbitrarily chosen at the BSs. Notice that the boundary points of the region can be achieved using superposition coding (SC) at the users, successive interference cancellation (SIC) at the central BS, and (optionally) time-sharing (TS). Furthermore, as for the single-user case, the optimum conditional distributions , at the boundary of the region can be proven to be Gaussian (we omit the proof due to space limitations). Therefore, the union in (31) can be restricted to compressed vectors of the form , where . That is:


Where , and , for . Covariance is calculated in Appendix A-B. To evaluate such a region, we resort to the weighted sum-rate (WSR) optimization [34, Sec. III-C]. That is, we express


with the maximum WSR, given weights and