Distributed Compression for the Uplink of a Backhaul-Constrained Coordinated Cellular Network
Abstract
We consider a backhaul-constrained coordinated cellular network: that is, a single-frequency network with multi-antenna base stations (BSs) that cooperate in order to decode the users' data, and that are linked by means of a common lossless backhaul of limited capacity. To implement receive cooperation, we propose distributed compression: the BSs, upon receiving their signals, compress them using a multi-source lossy compression code. Then, they send the compressed vectors to a central BS, which performs the users' decoding. Distributed Wyner-Ziv coding is proposed for this task, and is optimally designed in this work. The first part of the paper is devoted to a network with a unique multi-antenna user that transmits a predefined Gaussian space-time codeword. For such a scenario, the compression codebooks at the BSs are optimized, considering the user's achievable rate as the performance metric. In particular, for the two-base-station case the optimum codebook distribution is derived in closed form, while for more base stations an iterative algorithm is devised. The second part of the contribution focuses on the multi-user scenario, for which the achievable rate region is obtained by means of the compression codebooks that are optimum for the sum-rate and the weighted sum-rate, respectively.
EDICS: WIN-INFO, MSP-CAPC, MSP-MULT, WIN-CONT.
I Introduction
Inter-cell interference is one of the most limiting factors of current cellular networks. It can be partially, but not totally, mitigated by resorting to frequency-division multiplexing, sectorized antennas and fractional frequency reuse [1]. However, a more spectrally efficient solution has recently been proposed: coordinated cellular networks [2]. These consist of single-frequency networks with base stations (BSs) cooperating in order to transmit to and receive from the mobile terminals. Beamforming mechanisms are thus deployed in the downlink, as well as coherent detection in the uplink, to drastically augment the system capacity [3, 4]. Hereafter, we focus only on the uplink channel.
Preliminary studies on the uplink performance of coordinated networks consider all BSs connected via a lossless backhaul with unlimited capacity [5][6]. Accordingly, the capacity region of the network equals that of a MIMO multi-access channel, with a supra-receiver containing all the antennas of all cooperative BSs [7]. Such an assumption seems optimistic in the short-to-mid term, as operators are currently worried about the costs of upgrading their backhaul to support, e.g., HSPA traffic load. To deal with a realistic backhaul constraint, two approaches have been proposed: i) distributed decoding [8, 9], a demodulating scheme carried out distributedly among the BSs, based on local decisions and belief propagation; decoding delay appears to be its main problem. ii) Quantization [10], where BSs quantize their observations and forward them to the decoding unit; its main limitation is its inability to exploit the signal correlation between antennas/BSs, which introduces redundancy into the backhaul.
This paper considers a new approach for the network: distributed compression. The cooperative BSs, upon receiving their signals, compress them in a distributed manner using a multi-source lossy compression code [11]. Then, via the lossless backhaul, they transmit the compressed signals to the central unit (also a BS), which decompresses them using its own received signal as side information, and finally uses them to estimate the users' messages. Distributed compression has already been proposed for coordinated networks in [12, 13, 14]. However, in those works, the authors consider single-antenna BSs with ergodic fading. We extend the analysis here to the multiple-antenna case with time-invariant fading.
The compression of signals with side information at the decoder was introduced by Wyner and Ziv in [15, 16]. They show that side information at the encoder is useless (i.e., the rate-distortion tradeoff remains unchanged) for compressing a single Gaussian source when that side information is available at the decoder [16, Section 3]. Unfortunately, when considering multiple (correlated) signals, independently compressed at different BSs and to be recovered at a central unit with side information, such a statement cannot be claimed. Indeed, this is an open problem, for which it is not even clear when source-channel separation applies [17]. To the best of the authors' knowledge, the scheme that performs best (in a rate-distortion sense) for this problem is Distributed Wyner-Ziv (DWZ) compression [18]. Such a compression is the direct extension of Berger-Tung coding to the case of decoder side information [19, 20]. In turn, Berger-Tung compression can be thought of as the lossy counterpart of Slepian-Wolf lossless coding [21]. DWZ coding is thus the compression scheme proposed to be used, and is detailed in the sequel.
Summary of Contributions. This paper considers a single-frequency network with multi-antenna BSs. The first base station is the central unit and centralizes the users' decoding. The rest are cooperative BSs, which compress their received signals in a distributed manner using a DWZ code, and independently transmit them to the central unit via the common backhaul of limited aggregate capacity. In the network, time-invariant, frequency-flat channels are assumed, as well as transmit and receive channel state information (CSI) at the users and BSs, respectively.
The first part of the paper is devoted to a network with a single user equipped with multiple antennas. It aims at deriving the optimum compression codebooks at the BSs, i.e., those for which the user's transmission rate is maximized. Our contributions are the following:

For the single user transmitting a given Gaussian codeword, Sec. III proves that the optimum compression codebooks at the BSs are Gaussian distributed. Accordingly, the compression step is modelled by means of Gaussian "compression" noise, added by the BSs to their observations before retransmitting them to the central unit.

Considering a unique cooperative BS in the network, Sec. IV derives in closed form the optimum "compression" noise for which the user's rate is maximized. We also show that a conditional Karhunen-Loève transform plus independent Wyner-Ziv coding of the scalar streams is optimal.

The compression design is extended in Sec. V to an arbitrary number of BSs. The optimum "compression" noises (i.e., the optimum codebook distributions) are obtained by means of an iterative algorithm, constructed using dual decomposition theory and a non-linear block coordinate approach [22, 23]. Due to the non-convexity of the noise optimization, only local convergence is proven.
The second part of the paper extends the analysis to a network where multiple users transmit simultaneously. For it, the achievable rate region is described by resorting to weighted sum-rate optimization.
Notation. denotes expectation. , and stand for the transpose, conjugate transpose and complex conjugate, respectively. . denotes mutual information, entropy. The derivative of a scalar function with respect to a complex matrix is defined as in [24], i.e., . In such a way, e.g., . Moreover, we compactly write , and . A sequence of vectors is compactly denoted by . Furthermore, to define block-diagonal matrices, we state , with square matrices. stands for convex hull. Finally, the covariance of a random vector conditioned on another random vector is denoted and computed as usual.
II Compression of Vector Sources
The aim of compression within coordinated networks is to let the decoder extract as much mutual information as possible from the reconstructed signals. Known rate-distortion results apply to this goal as follows.
II-A Single-Source Compression with Decoder Side Information
Consider Fig. 1 with a single cooperative BS. Let a zero-mean, temporally memoryless, Gaussian vector be compressed at that BS; it is its observation of the signal transmitted by the user. The BS compresses the signal and sends it to the central unit, which makes use of its side information to decompress it. Finally, once the signal is reconstructed, the decoder uses it to estimate the message transmitted by the user. Wyner's results [16] apply to this problem as follows.
Definition 1 (Singlesource Compression Code)
A compression code with side information at the decoder is defined by two mappings, an encoder and a decoder, and three spaces, where
Proposition 1 (WynerZiv Coding [16])
Let the random vector with conditional probability satisfy the Markov chain , and let and be jointly Gaussian. Then, considering a sequence of compression codes with side information at the decoder:
(1) 
as if:

the compression rate satisfies
(2) 
the compression codebook consists of random sequences drawn i.i.d. from , where ,

the encoding outputs the bin-index of the codewords that are jointly typical with the source sequence. In turn, the decoder outputs the codeword that, belonging to the bin selected by the encoder, is jointly typical with the side information.
The proposition is proven in [16, Lemma 5] using joint typicality arguments.
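As a numerical sanity check of Proposition 1, the sketch below evaluates the Gaussian Wyner-Ziv rate under the usual additive test-channel model (reconstruction = source + Gaussian "compression" noise). All matrices, dimensions and noise levels here are invented for the example; they are not taken from the paper.

```python
import numpy as np

# Invented toy setup: y is the observation at the compressing BS, z the side
# information at the decoding BS, both driven by the same user signal x.
rng = np.random.default_rng(0)
t = 2                                    # user antennas (assumed)
Hy = rng.standard_normal((2, t))         # channel to the compressing BS
Hz = rng.standard_normal((2, t))         # channel to the decoding BS
# Covariances for x ~ N(0, I) and unit AWGN: y = Hy x + n_y, z = Hz x + n_z
Sy  = Hy @ Hy.T + np.eye(2)
Sz  = Hz @ Hz.T + np.eye(2)
Syz = Hy @ Hz.T
# Conditional covariance of y given z (jointly Gaussian)
Sy_z = Sy - Syz @ np.linalg.solve(Sz, Syz.T)
phi = 0.5                                # compression-noise variance (assumed)
# Wyner-Ziv rate with decoder side information: R = I(y; y + q | z)
R_wz = 0.5 * np.log2(np.linalg.det(Sy_z + phi * np.eye(2))
                     / np.linalg.det(phi * np.eye(2)))
# Rate of the same test channel without exploiting side information: I(y; y + q)
R_no_si = 0.5 * np.log2(np.linalg.det(Sy + phi * np.eye(2))
                        / np.linalg.det(phi * np.eye(2)))
```

Since conditioning can only shrink the covariance (in the positive semidefinite order), `R_wz` never exceeds `R_no_si`, illustrating the backhaul savings that decoder side information provides.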
II-B Multiple-Source Compression with Decoder Side Information
Consider Fig. 1. Let zero-mean, temporally memoryless, Gaussian vectors be compressed independently at the cooperative BSs. They are the observations at the BSs of the signal transmitted by the user. The compressed vectors are sent to the central unit, which decompresses them using its side information and uses them to estimate the user's message. Notice that the architecture in Fig. 1 imposes source-channel separation at the compression step, which has not been shown to be optimal. However, it includes the coding scheme with the best known performance: Distributed Wyner-Ziv coding [18]. It applies to the setup as follows.
Definition 2 (Multiplesource Compression Code)
A multiple-source compression code with side information at the decoder is defined by a collection of encoder mappings, a decoder mapping, and the corresponding spaces, where
Proposition 2 (Distributed WynerZiv Coding [18])
Let the random vectors , , have conditional probability and satisfy the Markov chain . Let and , be jointly Gaussian. Then, considering a sequence of compression codes with side information at the decoder:
(3) 
as if:

the compression rates satisfy
(4) 
each compression codebook consists of random sequences drawn i.i.d. from , where .

for every encoder, the encoding outputs the bin-index of the codewords that are jointly typical with its source sequence. In turn, the decoder outputs the codewords that, belonging to the bins selected by the encoders, are all jointly typical with the side information.
The proposition is proven for discrete sources and discrete side information in [18, Theorem 2]. Also, the extension to the Gaussian case is conjectured therein. The conjecture can be proven by noting that DWZ coding is equivalent to Berger-Tung coding with side information at the decoder [19]. In turn, Berger-Tung coding can be implemented through time-sharing of successive Wyner-Ziv compressions [20], for which introducing side information at the decoder reduces the compression rate as in (4). Due to space limitations, we limit the proof to this sketch.
Now, we can present the coordinated cellular network with DWZ coding.
III System Model
Let a single source, equipped with multiple antennas, transmit data to the base stations, each one also equipped with multiple antennas. The BSs, as in typical 3G networks, are connected (through radio network controllers) to a common lossless backhaul of limited aggregate capacity, and the first BS is selected to be the decoding unit. This user-to-BSs assignment is assumed to be given by upper layers and is out of the scope of the paper (we refer the reader to, e.g., [6] for assignment algorithms and selection criteria).
The source transmits a message mapped onto a zero-mean, Gaussian codeword, drawn i.i.d. from a random vector and not subject to optimization. The transmitted signal, affected by time-invariant, memoryless fading, is received at the BSs under additive noise:
(5) 
where is the MIMO channel matrix between the user and the corresponding BS, and is AWGN. The channel coefficients are known at both the BSs and the user, while the central unit has centralized knowledge of all the channels within the network.
III-A Problem Statement
The base stations, upon receiving their signals, compress them in a distributed manner using a DWZ compression code. Later, they transmit the compressed vectors to the central unit, which recovers them and uses them to decode. Considering so, the user's message can be reliably decoded iff [12, Theorem 1]:
The second equality follows from (3) in Prop. 2. However, the equality only holds for compression rates satisfying the set of constraints (4). As mentioned, in the backhaul there is only an aggregate rate constraint. Therefore, the set of constraints (4) can all be restated as:
(7) 
Furthermore, from the Markov chain in Prop. 2, the following inequality holds
(8) 
Therefore, forcing the aggregate constraint to hold makes all the constraints in (7) hold too. Accordingly, the maximum transmission rate of the user is obtained from the following optimization:
Theorem 1
See Appendix B for the proof.
Remark 1
The maximization above is not concave in standard form: although the feasible set is convex, the objective function is not concave.
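The quantities entering the optimization can be evaluated numerically. The sketch below builds a toy two-BS instance (all channels, dimensions and the compression-noise level are invented) and computes the user rate for Gaussian codebooks together with the backhaul load, using the additive compression-noise model of Sec. III.

```python
import numpy as np

# Invented two-BS toy: BS 1 decodes (no compression), BS 2 compresses its
# observation with Gaussian noise q2 ~ N(0, phi2 * I) before forwarding it.
rng = np.random.default_rng(1)
t = 2                                           # user antennas (assumed)
H1 = rng.standard_normal((2, t))                # channel to BS 1
H2 = rng.standard_normal((2, t))                # channel to BS 2
phi2 = 0.4                                      # compression noise (assumed)
# Effective noise covariances seen by the decoder
C1 = np.eye(2)                                  # BS 1: AWGN only
C2 = (1.0 + phi2) * np.eye(2)                   # BS 2: AWGN + compression
# User rate for x ~ N(0, I): 1/2 log2 det(I + H1' C1^-1 H1 + H2' C2^-1 H2)
user_rate = 0.5 * np.log2(np.linalg.det(
    np.eye(t) + H1.T @ np.linalg.solve(C1, H1)
              + H2.T @ np.linalg.solve(C2, H2)))
# Backhaul load I(y2; yhat2 | y1), via the conditional covariance of y2 | y1
S1  = H1 @ H1.T + np.eye(2)
S2  = H2 @ H2.T + np.eye(2)
S21 = H2 @ H1.T
S2_1 = S2 - S21 @ np.linalg.solve(S1, S21.T)    # Cov(y2 | y1)
backhaul = 0.5 * np.log2(np.linalg.det(S2_1 + phi2 * np.eye(2)) / phi2 ** 2)
# Benchmarks: decoding with BS 1 alone, and ideal (uncompressed) cooperation
rate_bs1  = 0.5 * np.log2(np.linalg.det(np.eye(t) + H1.T @ H1))
rate_full = 0.5 * np.log2(np.linalg.det(np.eye(t) + H1.T @ H1 + H2.T @ H2))
```

As expected, the rate with compressed cooperation sits between the single-BS rate and the ideal full-cooperation rate, and shrinking `phi2` trades a higher backhaul load for a rate closer to `rate_full`.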
III-B Useful Upper Bounds
Prior to solving (10), we present two upper bounds on it.
Upper Bound 1
The achievable rate in (10) is upper bounded by
(11) 
Upper Bound 2
The achievable rate in (10) satisfies
(12) 
See Appendix C for the proof.
Remark 2
Notice that, independently of the number of BSs, the achievable rate is bounded above by the capacity achievable with the central BS alone plus the backhaul rate.
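Upper Bound 2 can be checked numerically in the two-BS toy setup (channels and noise levels invented for the example): for every compression-noise level, the user rate never exceeds the rate from the central BS alone plus the backhaul load of the compressed observation.

```python
import numpy as np

# Invented two-BS instance; sweep the compression-noise variance and verify
# rate(phi) <= rate_with_BS1_alone + backhaul_load(phi) at every point.
rng = np.random.default_rng(2)
t = 2
H1 = rng.standard_normal((2, t))
H2 = rng.standard_normal((2, t))
S1, S2, S21 = H1 @ H1.T + np.eye(2), H2 @ H2.T + np.eye(2), H2 @ H1.T
S2_1 = S2 - S21 @ np.linalg.solve(S1, S21.T)     # Cov(y2 | y1)
rate_bs1 = 0.5 * np.log2(np.linalg.det(np.eye(t) + H1.T @ H1))
gaps = []
for phi in (0.1, 0.5, 1.0, 5.0):
    C2 = (1.0 + phi) * np.eye(2)                 # AWGN + compression noise
    rate = 0.5 * np.log2(np.linalg.det(
        np.eye(t) + H1.T @ H1 + H2.T @ np.linalg.solve(C2, H2)))
    load = 0.5 * np.log2(np.linalg.det(S2_1 + phi * np.eye(2)) / phi ** 2)
    gaps.append(rate_bs1 + load - rate)          # should never be negative
```

The non-negativity of every entry of `gaps` is the data-processing argument behind the bound: given the central BS's own signal, the compressed observation cannot carry more information about the user signal than it carries about the raw observation it was built from.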
IV The Two-Base-Station Case
We first solve (10) for the two-BS case. As mentioned, the objective function, which has to be maximized, is convex in the compression noise. In order to make it concave, we change variables, so that
The objective is now concave. However, the constraint no longer defines a convex feasible set. Therefore, the Karush-Kuhn-Tucker (KKT) conditions become necessary (notice that all feasible points are regular) but not sufficient for optimality. To solve the problem, we resort to the general sufficiency condition [23, Proposition 3.3.4]: first, we derive a matrix for which the KKT conditions hold. Then, we demonstrate that the selected matrix also satisfies the general sufficiency condition, thus being the optimal solution. The optimum compression noise is finally recovered by inverting the change of variables. This result is presented in Theorem 2:
Theorem 2
Define the conditional covariance (see Appendix A-A):
(14) 
with eigendecomposition . The optimum "compression" noise at the cooperative BS is , with
(15) 
and is such that .
See Appendix D for the proof.
IV-A Practical Implementation
The optimum compression in Theorem 2 can be carried out using a practical Transform Coding (TC) approach. With TC, the BS first transforms its received vector using an invertible linear function and then separately compresses the resulting scalar streams [25]. We show that the conditional Karhunen-Loève transform (CKLT) is an optimal linear transformation [26]. First, let us recall that multiplying a vector by an invertible matrix does not change the mutual information [27]. From Theorem 2, the optimum compressed vector satisfies , with and . Therefore, the following compressed vectors are also optimal
(16) 
where the transformed vector is referred to as the CKLT of the received vector. Notice now that its conditional covariance is diagonal. Therefore, the elements of the transformed vector are conditionally uncorrelated given the side information. Likewise, so are the elements of the compressed vector. Due to this decorrelation, each element of the vector can be compressed, without loss of optimality, independently of the compression of the other elements, at the corresponding scalar Wyner-Ziv rate [16]; this can be validated from Theorem 2. This demonstrates that CKLT plus independent coding of the streams is optimal, not only for minimizing distortion as shown in [26], but also for maximizing the achievable rate of coordinated networks.
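The separability argument above can be verified numerically. The sketch below (all matrices and noise levels invented for the example) eigendecomposes the conditional covariance, assigns a per-stream compression noise in the CKLT domain, and checks that the scalar Wyner-Ziv rates of the decorrelated streams add up exactly to the vector rate.

```python
import numpy as np

# Invented toy: y observed at the compressing BS, z the decoder side
# information, both driven by the same user signal x ~ N(0, I), unit AWGN.
rng = np.random.default_rng(3)
t = 2
Hy = rng.standard_normal((2, t))
Hz = rng.standard_normal((2, t))
Sy, Sz, Syz = Hy @ Hy.T + np.eye(2), Hz @ Hz.T + np.eye(2), Hy @ Hz.T
Sy_z = Sy - Syz @ np.linalg.solve(Sz, Syz.T)       # Cov(y | z)
lam, U = np.linalg.eigh(Sy_z)                      # CKLT basis and eigenvalues
phis = np.array([0.3, 0.8])                        # per-stream noise (assumed)
# Vector compression noise aligned with the CKLT eigenvectors
Sq = U @ np.diag(phis) @ U.T
vector_rate = 0.5 * np.log2(np.linalg.det(Sy_z + Sq) / np.linalg.det(Sq))
# Scalar Wyner-Ziv rates of the decorrelated streams w_i = (U' y)_i
stream_rates = 0.5 * np.log2((lam + phis) / phis)
```

Because the compression noise commutes with the CKLT basis, both determinants factor over the eigenvalues, so `stream_rates.sum()` matches `vector_rate` exactly: compressing the streams independently costs nothing.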
V The Multiple-Base-Station Case
Consider now the central unit assisted by multiple cooperative BSs. The achievable rate follows (10) where, as previously, the objective function is not concave in the compression noises. To make it concave, we change variables, so that:
(17)  
Again, the feasible set is not convex. Our strategy to solve the optimization is the following: first, we show that the duality gap of the problem is zero. Then, we propose an iterative algorithm that solves the dual problem, thus solving the primal too. An interesting property of the dual problem is that the coupling constraint in (17) is decoupled [23, Chapter 5].
V-A The Dual Problem
Let the Lagrangian of (17) be defined on , and as:
(18)  
The dual function for follows [22, Section 5.1]:
(19) 
The solution of the dual problem is then obtained from
(20) 
Lemma 1
The duality gap for problems of the form of (17) that satisfy the time-sharing property is zero [28, Theorem 1]. The time-sharing property is defined as follows: let solutions of (17) be obtained for two different backhaul rates, and consider a convex combination of those rates. The property is satisfied if and only if the optimum value at the combined backhaul rate dominates the corresponding combination of the two solutions; that is, if the solution of (17) is concave with respect to the backhaul rate. It is well known that time-sharing of compressions cannot decrease the resulting distortion [27, Lemma 13.4.1], nor improve the mutual information obtained from the reconstructed vectors. Hence, the property holds for (17), and the duality gap is zero.
We then solve the dual problem in order to obtain the solution of the primal. First, consider maximization (19). As expected, the maximization cannot be solved in closed form. However, as the feasible set is the cartesian product of convex sets, a block coordinate ascent algorithm (also known as the non-linear Gauss-Seidel algorithm [29, Section II-C]) can be used to search for the maximum [23, Section 2.7]. The algorithm iteratively optimizes the function with respect to one block of variables while keeping the others fixed. It has previously been used to, e.g., solve the sum-rate problem of MIMO multiple access channels with individual and sum-power constraints [30][31]. We define it for our problem as:
(21) 
where is the iteration index. As shown in Theorem 3, the maximization (21) is uniquely attained.
Theorem 3
Consider the optimization (21) and the conditional covariance matrix (see Appendix A-A)
(22) 
with eigendecomposition . The optimization is uniquely attained at , where
(23) 
See Appendix E-A for the proof.
The Lagrangian is continuously differentiable, and the maximization (21) is uniquely attained. Hence, the limit point of the sequence of iterates is proven to converge to a local maximum [23, Proposition 2.7.1]. To demonstrate convergence to the global maximum, it would be necessary to show that the mapping is a block contraction (see [32, Section 3.1.2] for the definition) [32, Proposition 3.10]. Unfortunately, we were not able to demonstrate the contraction property on the Lagrangian, although simulation results always suggest global convergence of our algorithm.
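The mechanics of the block coordinate ascent can be illustrated on a small invented example; the strictly concave function below is NOT the Lagrangian of the paper, only a stand-in with the same structure (per-block maximizations available in closed form, iterated Gauss-Seidel style).

```python
# Toy non-linear Gauss-Seidel iteration on the invented concave objective
#   f(u, v) = -(u - 1)^2 - (v - 2)^2 - u * v,
# whose unique maximum sits at (u, v) = (0, 2).
def ascend(iters=50):
    u, v = 0.0, 0.0
    for _ in range(iters):
        # Each update maximizes f over one variable with the other fixed:
        u = 1.0 - v / 2.0     # argmax_u f(u, v): from df/du = -2(u-1) - v = 0
        v = 2.0 - u / 2.0     # argmax_v f(u, v): from df/dv = -2(v-2) - u = 0
    return u, v

u_star, v_star = ascend()
```

Because each per-block maximizer is unique, the iterates converge (here geometrically, the error shrinking by a factor of four per sweep) to a maximum of the objective, which for this concave toy is the global one; for the non-convex problem in the paper only the local-maximum guarantee carries over.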
Once the dual function is obtained through the Gauss-Seidel algorithm (assume hereafter that the algorithm has converged to the global maximum), it remains to minimize it over the multiplier. First, recall that the dual function is convex, as it is defined as the pointwise maximum of a family of affine functions [22]. Hence, to minimize it, we may use a subgradient approach as, e.g., that proposed by Yu in [31]. The subgradient search consists of following a search direction such that
(24) 
Such a search is proven to converge to the global minimum for diminishing step-size rules [29, Section II-B]. Considering the definition of the dual function, the following satisfies (24):
(25) 
Therefore, it is used to search for the optimum as:
(26) 
Consider now the initial value of the Lagrange multiplier. For such a multiplier, the optimum solution of (19) and the subgradient (25) follow (see Appendix E-B). Hence, following (26), the optimum value of the multiplier is strictly lower than one. Algorithm 1 takes all this into account in order to solve the dual problem, hence solving the primal too. As mentioned, we can only claim convergence of the algorithm to a local maximum.
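The subgradient search on the multiplier can be sketched on an invented scalar surrogate with the same structure ("maximize a log-rate subject to an aggregate constraint"); none of the numbers below come from the paper.

```python
import math

# Invented scalar surrogate.  Dual function:
#   g(mu) = max_{x >= 0} [ log(1 + x) - mu * (x - R) ],   with R = 3,
# whose inner maximizer is x(mu) = max(0, 1/mu - 1), so that by Danskin's
# theorem the (sub)gradient of g at mu is R - x(mu).
R = 3.0
mu = 1.0                                 # initial multiplier
for t in range(1, 5001):
    x = max(0.0, 1.0 / mu - 1.0)         # primal response to the current mu
    subgrad = R - x                      # descent direction for minimizing g
    step = 0.05 / math.sqrt(t)           # diminishing step-size rule
    mu = max(1e-3, mu - step * subgrad)  # projected update, mu kept positive
# At the optimum the constraint is tight: x* = R, i.e., mu* = 1 / (1 + R) = 0.25
```

The diminishing steps make the iterates settle at the multiplier for which the constraint is met with equality, mirroring how (26) drives the backhaul constraint in the dual of (17) to tightness.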
V-B Practical Implementation
In the network, Distributed Wyner-Ziv compression can be practically implemented using a simple Successive Wyner-Ziv (SWZ) approach [20][33, Theorem 3]. To describe it, let us recall that the optimum compression noises are obtained from Algorithm 1, and consider a given permutation (ordering) of the cooperative BSs. For such a permutation, the SWZ coding is defined as follows:

Parallel Compression: the first cooperative BS in the permutation compresses its received vector using a single-source Wyner-Ziv code with decoder side information (following Proposition 1), at a compression rate
(27) The conditional covariance is calculated in (53). In parallel, each remaining cooperative BS compresses its signal using a single-source Wyner-Ziv code with progressively richer decoder side information, at a rate
(28) In this case, the conditional covariance can be calculated from (56).

Successive Decompression: the central unit first recovers the codeword of the first BS in the permutation using its own received signal as side information; later, it successively recovers the remaining codewords, using the previously recovered ones as additional side information.
It is easy to check the optimality of the SWZ coding:
The second equality comes from the Markov chain in Proposition 2, and the third from the chain rule for mutual information; the fourth follows from the fact that the optimum noises satisfy the constraint in (10) with equality. Unfortunately, transform coding is not (generally) optimum for SWZ with more than two BSs, since the eigenvectors of the successive conditional covariances do not necessarily match.
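The chain-rule step of the optimality argument can be checked numerically. The sketch below (three 2-antenna BSs; all matrices and noise levels invented) computes the two SWZ rates with progressively richer side information and verifies that they add up exactly to the joint DWZ backhaul rate.

```python
import numpy as np

def cond_cov(S, a, b):
    """Covariance of block a given block b, from the joint covariance S."""
    Saa, Sab, Sbb = S[np.ix_(a, a)], S[np.ix_(a, b)], S[np.ix_(b, b)]
    return Saa - Sab @ np.linalg.solve(Sbb, Sab.T)

# Invented toy: y_m = H_m x + n_m at BS m, yhat_m = y_m + q_m for m = 2, 3.
rng = np.random.default_rng(4)
t = 2
H = [rng.standard_normal((2, t)) for _ in range(3)]      # H_1, H_2, H_3
phi2, phi3 = 0.4, 0.7                                    # compression noises
Hs = np.vstack(H)
# Joint covariance of [y_1, yhat_2, yhat_3] for x ~ N(0, I) and unit AWGN
S = Hs @ Hs.T + np.diag([1, 1, 1 + phi2, 1 + phi2, 1 + phi3, 1 + phi3])
i1, i2, i3 = [0, 1], [2, 3], [4, 5]
ld = lambda M: np.log2(np.linalg.det(M))
# SWZ: BS 2 coded against y_1; BS 3 against (y_1, yhat_2)
R2 = 0.5 * (ld(cond_cov(S, i2, i1)) - ld(phi2 * np.eye(2)))
R3 = 0.5 * (ld(cond_cov(S, i3, i1 + i2)) - ld(phi3 * np.eye(2)))
# Joint DWZ backhaul rate I(y_2, y_3; yhat_2, yhat_3 | y_1)
Rj = 0.5 * (ld(cond_cov(S, i2 + i3, i1))
            - ld(np.diag([phi2, phi2, phi3, phi3])))
```

Since the compression noises are independent of everything else, each SWZ rate equals the corresponding conditional mutual information term of the chain rule, so `R2 + R3` matches `Rj` exactly: the successive scheme spends no more backhaul than the joint DWZ code.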
VI The Multiple-User Scenario
In the previous sections, we considered a single user within the network. To complement the analysis, we hereafter study multiple (i.e., two) senders transmitting simultaneously. The users transmit two independent messages, mapped onto codewords drawn i.i.d. from Gaussian random vectors and not subject to optimization. Hence, now, the BSs receive:
(30) 
Here, denotes the MIMO channel between each user and BS. As previously, the signals at the cooperative BSs are compressed in a distributed manner using a DWZ code, and later sent to the central unit, which centralizes decoding. Using standard arguments, the set of transmission rates at which the messages can be reliably decoded is [27][14]:
(31) 
The union in (31) is explained by the fact that the compression codebooks might be arbitrarily chosen at the BSs. Notice that the boundary points of the region can be achieved using superposition coding (SC) at the users, successive interference cancellation (SIC) at the central unit, and (optionally) time-sharing (TS). Furthermore, as for the single-user case, the optimum conditional distributions at the boundary of the region can be proven to be Gaussian (we omit the proof due to space limitations). Therefore, the union in (31) can be restricted to compressed vectors obtained through additive Gaussian compression noise. That is:
(32) 
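One corner point of the two-user region can be sketched numerically under the SIC decoding order described above (all channels, dimensions and the compression-noise level are invented for the example): user 2 is decoded first treating user 1 as noise, then user 1 is decoded interference-free, and the two rates must sum to the joint mutual information.

```python
import numpy as np

# Invented two-user, two-BS toy with compressed cooperation: BS 1 forwards
# its raw observation, BS 2 adds Gaussian compression noise of variance phi.
rng = np.random.default_rng(5)
t = 2
A = rng.standard_normal((4, t))       # stacked channels of user 1 (2 BSs)
B = rng.standard_normal((4, t))       # stacked channels of user 2
phi = 0.5
# Effective noise at the decoder: AWGN at BS 1, AWGN + compression at BS 2
N = np.diag([1, 1, 1 + phi, 1 + phi])
ld = lambda M: np.log2(np.linalg.det(M))
S_full = A @ A.T + B @ B.T + N        # Cov(yhat), both users active
S_x2   = A @ A.T + N                  # Cov(yhat | x2), user 2 removed
R2 = 0.5 * (ld(S_full) - ld(S_x2))    # I(x2; yhat): user 1 treated as noise
R1 = 0.5 * (ld(S_x2) - ld(N))         # I(x1; yhat | x2): after SIC
R_sum = 0.5 * (ld(S_full) - ld(N))    # I(x1, x2; yhat)
```

The log-determinant ratios telescope, so `R1 + R2` equals `R_sum` exactly; swapping the SIC order yields the other corner point, and time-sharing between the two traces the dominant face of the region in (32).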