# The Finite State MAC with Cooperative Encoders and Delayed CSI

Ziv Goldfeld, Haim H. Permuter and Benjamin M. Zaidel

Manuscript received March 31, 2013; revised December 15, 2013; accepted July 18, 2014. The work was supported by the European Research Council (ERC) starting grant, ISF grant no. 684/11 and the IMOD. This paper was presented in part at the IEEE International Symposium on Information Theory 2012, Cambridge, MA, USA, July 2012, and in part at the 2012 IEEE 27th Convention of Electrical and Electronics Engineers in Israel, Eilat, Israel, November 2012. Ziv Goldfeld and Haim Permuter are with the Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel (e-mails: zgzg1984@gmail.com, haimp@bgu.ac.il). Benjamin M. Zaidel is an independent researcher (e-mail: benjamin.zaidel@gmail.com).

###### Abstract

In this paper, we consider the finite-state multiple access channel (MAC) with partially cooperative encoders and delayed channel state information (CSI). Here partial cooperation refers to the communication between the encoders via finite-capacity links. The channel states are assumed to be governed by a Markov process. Full CSI is assumed at the receiver, while at the transmitters, only delayed CSI is available. The capacity region of this channel model is derived by first solving the case of the finite-state MAC with a common message. Achievability for the latter case is established using the notion of strategies; however, we show that optimal codes can be constructed directly over the input alphabet. This results in a single codebook construction that is then leveraged to apply simultaneous joint decoding. Simultaneous decoding is crucial here because it circumvents the need to rely on the capacity region’s corner points, a task that becomes increasingly cumbersome with the growth in the number of messages to be sent. The common message result is then used to derive the capacity region for the case with partially cooperating encoders. Next, we apply this general result to the special case of the Gaussian vector MAC with diagonal channel transfer matrices, which is suitable for modeling, e.g., orthogonal frequency division multiplexing (OFDM)-based communication systems. The capacity region of the Gaussian channel is presented in terms of a convex optimization problem that can be solved efficiently using numerical tools. The region is derived by first presenting an outer bound on the general capacity region and then suggesting a specific input distribution that achieves this bound. Finally, numerical results are provided that give valuable insight into the practical implications of optimally using conferencing to maximize the transmission rates.

###### Keywords

Capacity region, Common message, Convex optimization, Cooperative encoders, Delayed CSI, Diagonal vector Gaussian Multiple-access channel, Finite-state channel, Multiple-access channel, Simultaneous decoding, Strategy letters.

## I Introduction

Temporal variations, a characteristic typical of wireless channels, may occur due to atmospheric changes, changes in the environment, the mobility of transmitters and/or receivers or time-varying intentional or unintentional interference. Since accurate channel state information (CSI) at both the transmitting and the receiving ends is crucial for efficient communications, measures are commonly incorporated in the communication protocol to enable channel state estimation. For example, the long term evolution (LTE) cellular communication standard relies on pilot signals transmitted at pre-scheduled time intervals and frequency slots to estimate the channel’s state [1]. Performed at the receiver, these estimations are then typically fed back to the transmitter, but obtaining perfect CSI at both ends of the channel in practical systems is a formidable challenge. More often than not, CSI is subject to channel estimation errors and feedback is not instantaneous due to some inevitable processing delay, and as a result, receivers and transmitters typically have access to only partial CSI. The impact of such partial CSI on the achievable performance, therefore, has attracted much attention in recent years. In the case of multiuser communication, performance is affected not only by channel characteristics, but also by interactions between the users. In particular, different forms of cooperation between the transmitting and receiving ends, a subject of growing interest in recent years (e.g., [2, 3]), may significantly enhance performance. This paper aims to investigate the combined impact of both partial CSI and cooperation. More specifically, we focus on a two-user finite state Markov multiple access channel (FSM-MAC), with partially cooperative encoders and delayed CSI, as illustrated in Fig. 1 and explained in the following text.

In the communication scenario under discussion, each of the two encoders wishes to send an independent private message through a time-varying MAC to the decoder. Delayed CSI is assumed to be available at the encoders, while full delayless CSI is assumed at the decoder. Different users may be subject to different CSI delays. It is further assumed that prior to each transmission block, the two encoders are allowed to hold a conference. More specifically, it is assumed that the encoders can communicate with each other over noise-free communication links of given capacities. We restrict the discussion to the case in which the conference held between the encoders is independent of the CSI.

The non-state-dependent MAC with partially cooperative encoders was first introduced by Willems [4], who also derived the capacity region for the discrete memoryless setting. Special cases of this channel model include that in which the encoders are ignorant of each other’s messages (i.e., the capacities of the communication links between them are both zero) and that in which the encoders fully cooperate (i.e., the capacities of the communication links are infinite). The first setting, where no conference is held, corresponds to the classical MAC, for which the capacity region was determined by Ahlswede [5] and Liao [6]. In contrast, in the second setting, where total cooperation is available, the encoders can act as one by fully sharing their private messages via the conference. The capacity region for this case is the part of the first quadrant below the so-called total cooperation line. This triangle-shaped region always contains the capacity region for the classical MAC.

In his proof of achievability for the conferencing MAC, Willems [4] introduced a coding scheme based on the capacity region for the MAC with a common message, derived by Slepian and Wolf in [7]. Willems showed that in order to achieve the capacity region, the encoders should use the cooperation link to share parts of their private messages and then use a coding scheme for the ordinary MAC with a common message. Although Willems’s model allows interactive communication between the encoders, it was shown both in [4] and later in [8] that a single round of communication between the encoders (referred to as a “pair of simultaneous monologues” in [4]) suffices to achieve optimality.

Additional multiuser settings that involve cooperation between users through communication links of finite capacities have been extensively treated in the literature. See, for example, [9] and [10] for studies of the MAC, [2] and [11, 12, 13, 14, 15, 16] for studies of the interference channel with cooperating nodes, [17] for the broadcast channel, [18] and [19] for cooperative relaying and [3] and [20] and references therein for cooperation in cellular architectures. A comprehensive survey of cooperation and its role in communication can be found in [21]. It is important to note, however, that in all of the above settings the channel was not assumed to be time-varying.

Multiuser settings that combine both time-varying channels and user cooperation are obviously of major interest as well. A Gaussian fading MAC with cooperating encoders that have access to delayless CSI was considered in [22] and in [23]. As in our case, these works assume that cooperation is allowed only before the CSI becomes available at the encoders. The case in which the CSI becomes available to the encoders prior to transmission is treated in [24], where a MAC with perfect noncausal CSI is considered. The coding scheme introduced in [24] uses conferencing to share parts of the messages as well as CSI.

The notion of modeling time-varying channels as state-dependent channels dates back to Shannon [25], who characterized the capacity of the state-dependent, memoryless point-to-point channel with independent and identically distributed (i.i.d.) states available causally at the encoder. To establish achievability, Shannon presented a code construction that relied on “strategies” (or “strategy letters”) [26], a notion we also exploit in this paper. Gelfand and Pinsker [27], and later Heegard and El Gamal [28], studied the case in which the encoder observes the channel states noncausally. In both [27] and [28] a single letter expression for the capacity is derived using random binning. In [29], Goldsmith and Varaiya considered a fading channel with perfect CSI at both the transmitter and the receiver. It was shown that in such a case, the optimal strategy is to employ waterfilling over time.

As was already stated, because perfect CSI is difficult to obtain in practical systems, models that involve partial or imperfect CSI have attracted a lot of attention in recent years. At first, different settings involving an i.i.d. state sequence with imperfect CSI were treated. Initially, various point-to-point channel scenarios with partial CSI were studied. Among others, the causal, noncausal, rate-limited and noisy cases were addressed [30, 31, 32]. Extension of the result to the MAC with rate-limited CSI can be found in [33]. In [34], the authors derive the capacity region for the MAC with asymmetric quantized CSI at the encoders, where the quantization models the imperfection in the channel state estimation (full CSI at the decoder is assumed). Later, in [35] Lapidoth and Steinberg provided an inner bound for the capacity region of the MAC with strictly causal CSI at the encoders. In contrast to the point-to-point setting, where strictly causal CSI regarding an i.i.d. state sequence does not increase capacity, the capacity region of the MAC with causal CSI is strictly larger than the corresponding region without CSI. Li et al. presented an improved inner bound for the same setting in [36]. A comprehensive monograph on channel coding in the presence of side information can be found in [37], where an i.i.d. state sequence is assumed. An information theoretic model for a single user channel involving delayed CSI and a state process that is no longer restricted to be memoryless and i.i.d. was first introduced by Viswanathan [38], who derived the capacity while assuming a FSM channel. This result was later generalized by Caire and Shamai in [26], where they addressed a point-to-point channel in which the CSIs at both encoder and decoder admit some general joint probability law. A general capacity formula, which relies on the notion of inf-information rate [39], is then provided for the case of state processes with memory.
The result is then shown to boil down to a single-letter characterization in the case in which perfect CSI is available to the receiver, the CSI at the transmitter is given by a deterministic function of the channel state, and the two processes are jointly stationary and ergodic. By an appropriate choice of the above deterministic function, the result for Viswanathan’s delayed CSI model [38] is obtained as a special case of the result in [26]. A generalization of the point-to-point results of [26] to the MAC was presented by Das and Narayan in [40]. The generality of the channel model therein leads to multiletter characterization of the capacity region in various settings, which unfortunately provides limited insight into practical encoding schemes for channel models in this framework.

Taking a practically oriented approach, we focus in this paper on a specific channel definition that leads to single-letter results. Following [38], we model temporal variations by means of a FSM channel [41, 42]. The channel state is determined on a per symbol basis and governed by the underlying FSM process. An important extension of this idea to the multiuser case was introduced by Basher et al. in [43], presenting the FSM-MAC with delayed CSI and non-cooperating encoders, i.e., where no conference is held (see also [44] for a related source coding analysis). In the proof of the capacity region for this model, achievability was established by employing a coding scheme based on rate-splitting and multiplexing-coding combined with successive decoding at the receiver. Successive decoding was used in [43] to demonstrate that the two corner points of the capacity region are achievable. The whole capacity region is then achievable via time-sharing. Although the setting in [43] constitutes a special case of the general model in [40], the main contribution of [43] is the single-letter characterization of the capacity region and the detailed construction of the coding scheme.

In the current paper, accounting for the availability of a conferencing link between the encoders, we take a different approach than that taken in [43]. We base the proof of achievability on the coding scheme for the MAC with a common message as presented in [4], and therefore, we start by deriving the capacity region for the FSM-MAC with a common message and the same CSI properties as in [43]. We thus provide a solution to what has been, until now, an unsolved problem. Next, using the achievable scheme for the common message setting, the achievability of the conferencing region is established. We note that the large number of corner points induced by the presence of an additional transmission rate (namely, the rate of the common message) renders the provision of an achievable coding scheme for the common message setting based on achieving the region’s corner points an awkward task. Moreover, the use of rate-splitting and multiplexing-coding when a common message is involved yields a rather complex coding scheme, which we sought to avoid.

Therefore, we present an alternative coding scheme that employs strategy letters in the code construction (cf., e.g., [25, 26] and [40]) and simultaneous decoding. However, unlike the case of Shannon’s classical result for the point-to-point channel with causal encoder CSI, here we show that optimal codes can be constructed directly over the input alphabet (as also shown for certain special cases in [40]). Namely, a single codebook is generated for each of the three messages over a super-alphabet that corresponds to the different realizations of the delayed CSI available at the encoders. At each time instance, a symbol that is correlated with the current available delayed CSI is selected by the encoders and transmitted to the channel. Thus, in contrast to previous works involving delayed CSI (cf., [38] and [43]), here rate-splitting is no longer required. The decoder then uses its access to full CSI (which deterministically defines the delayed state sequences as well) to reduce each codeword (originally constructed over a super-alphabet) to a sequence over the input alphabet and executes a simultaneous decoding scheme based on joint typicality. Indeed, one of the most significant contributions of our paper is this coding scheme for the MAC with a common message and delayed CSI. Not only does it avoid the unnecessary complexity of its rate-splitting and multiplexing counterpart and rely on a simpler codebook construction, it also achieves every possible point in the region rather than only the corner points. Furthermore, this two-user coding scheme is easily extendable to the case of multiple users with a single common message.

Based on the general results for the FSM-MAC with conferencing, we continue with the derivation of the capacity region for the special case of a vector Gaussian FSM-MAC with diagonal channel transfer matrices. This channel model can be used to represent an orthogonal frequency-division multiplexing (OFDM)-based communication system, employing single receive and transmit antennas, where the diagonal entries of the channel matrices represent the orthogonal sub-channels used by the OFDM scheme.

To derive the capacity region for the latter channel, we use a multivariate extension of a novel tool first derived in [45] (namely, a necessary and sufficient condition for a Gaussian triplet of random variables to satisfy a certain Markov relation), and demonstrate that Gaussian multivariate distributions maximize certain mutual information expressions under a Markovity constraint. The scalar version of this tool was employed by Lapidoth et al. [46] to provide an outer bound for the capacity region of the scalar Gaussian non-state-dependent MAC with conferencing encoders. Wigger and Kramer also used this tool in their solution for the capacity region of the three-user, non-state-dependent MIMO MAC with conferencing [47]. The need to use the tool from [45] stems from the fact that the input distribution of the conferencing channel must admit a certain Markovity constraint. For cases in which no Markov relation needs to be satisfied, the traditional approach to proving the optimality of Gaussian multivariate distributions involves employing either the Vector Max-Entropy Theorem (a direct extension of [48, Theorem 12.1.1]) or a conditional version of it. Here, however, this approach fails since replacing a non-Gaussian vector satisfying the Markovity condition by a Gaussian vector of the same covariance matrix may result in a Gaussian vector that violates the Markovity condition. To overcome this issue we use a sufficient and necessary condition on the (auto- and cross-) covariance matrices of the involved Gaussian random vectors for them to admit a Markov relation [49, Section 2, Theorem 1].

We note that although Gaussian input vectors are shown to be optimal in this setting, the original form of the capacity region involves a non-convex optimization problem. To circumvent this difficulty, new variables are introduced to convert the optimization problem into a convex problem that can then be solved using numerical tools such as CVX [50]. The capacity region for the corresponding scalar Gaussian channel can be immediately derived from the result for the vector channel setting and serves as an extension of the result in [46] to the state-dependent case. The capacity region of the vector Gaussian FSM-MAC with a common message and the same CSI properties can also be easily derived from the result for the conferencing channel by exploiting the strong correspondence between the two models and using a simple analogy.

To gain some insight into the practical implications of the results we conclude this paper with a specific example, namely, a scalar AWGN channel with two possible states (‘Good’ and ‘Bad’). Numerical results are included to demonstrate the impact of different channel parameters on the capacity region and the optimal input distribution. Our interpretation of interactions between the different parameters produces valuable insights.

The remainder of the paper is organized as follows. In Section II we describe the two communication models of interest – the FSM-MAC with a common message and delayed CSI and the FSM-MAC with partially cooperative encoders and delayed CSI. In Sections III and IV, we state the capacity results for the common message and conferencing models, respectively. Each result is followed by its proof. Section V follows with the definition of the vector Gaussian FSM-MAC with diagonal channel transfer matrices and the derivation of the maximization problem defining its capacity region. The regions for the corresponding common message model and the scalar setting are given as special cases. The two-state Gaussian example is discussed in this section as well. Finally, Section VI summarizes the main achievements and insights presented in this paper along with some possible future research directions and extensions.

## II Channel Models and Notation

In this paper, we investigate the capacity region of the FSM-MAC with partially cooperative encoders, full CSI at the decoder (receiver) and delayed CSI at the encoders (transmitters), as illustrated in Fig. 1. To this end, we first consider a different setting, which is the FSM-MAC with a common message and the same CSI properties, as depicted in Fig. 2. The derivation of the capacity region for the latter common message setting forms the basis for the achievability proof for the former setting where a conferencing link exists between the encoders. Since most definitions for both channels follow similar lines, we start by defining the common message setting and then extend the description for the setting of partially cooperative encoders.

We use the following notations. Matrices are denoted by nonitalicized capital letters, e.g., . Calligraphic letters denote sets, e.g., , while the cardinality of a set is denoted by . stands for the -fold Cartesian product of . An element of is denoted by , and its substrings as ; when , the subscript is omitted. We use the notation . Whenever the dimension is clear from the context, vectors (or sequences) are denoted by boldface letters, e.g., . Random variables are denoted by uppercase letter, e.g., , with similar conventions for random vectors. stands for the sequence of random variables , while stands for . The probability of an event is denoted by , while denotes conditional probability of given . Probability mass functions (PMFs) are denoted by the capital letter with a subscript that identifies the random variable and its possible conditioning. For example, for two jointly distributed random variables and , let , , and denote, respectively, the PMF of , the joint PMF of , and the conditional PMF of given . In particular, when and are discrete, represents the stochastic matrix whose elements are given by . We omit the subscripts if the arguments of the distribution are lower case versions of the random variables.

### II-A FSM-MAC with a Common Message and Delayed CSI

The FSM-MAC with a common message considered in this paper is illustrated in Fig. 2. The MAC setting consists of two senders and one receiver. Each sender chooses a pair of indices, , uniformly from the set , where denotes the common message and , , denotes the private message of the corresponding sender. The choices of , and are independent. The input to the channel from encoder is denoted by , and the output of the channel is denoted by .

At each instance of time, the FSM channel is assumed to be in one of a finite number of states . In each state, the channel is a discrete memoryless channel (DMC), with input alphabets and output alphabet . Let the random variable denote the channel state at time . Similarly, we denote by and the inputs and the output of the channel at time . The channel transition probability distribution at time depends on the state and the inputs at time , and it is given by . The channel output at any time is assumed to depend only on the channel inputs and state at time . Hence,

$$P(y_i|x_1^i,x_2^i,s^i)=P(y_i|x_{1,i},x_{2,i},s_i). \tag{1}$$

The state process, , is assumed to be an irreducible, aperiodic, finite-state, homogeneous and stationary Markov chain and is therefore ergodic. The state process is independent of the channel inputs and output when conditioned on the previous states, i.e.,

$$P(s_i|s^{i-1},x_1^{i-1},x_2^{i-1},y^{i-1})=P(s_i|s_{i-1}). \tag{2}$$

Furthermore, we assume that the state process is independent of the messages , and , i.e.,

$$P(s^n,m_0,m_1,m_2)=\prod_{i=1}^{n}P(s_i|s_{i-1})\,P(m_0)P(m_1)P(m_2). \tag{3}$$

We assume that full CSI is available at the decoder (i.e., the decoder knows at each time instance ). However, the encoders are only assumed to have access to delayed CSI, with delays and for Encoder 1 and Encoder 2, respectively. We let , , denote the channel state at time , and assume without loss of generality that . Now, let be the one-step state-transition probability matrix of the Markov process that governs the channel states, and let be its steady state probability distribution. The joint distribution of is stationary and is given by

$$\pi_d(S_i=s_l,S_{i-d}=s_j)=\pi(s_j)K^{d}(s_l,s_j), \tag{4}$$

where is the -th element of the d-step transition probability matrix of the Markov state process. To simplify the notation, we define the joint distribution of the random variables as the joint distribution of , i.e.,

$$P_{S,\tilde{S}_1,\tilde{S}_2}(s_l,s_j,s_v)=\pi(s_j)K^{d_1-d_2}(s_v,s_j)K^{d_2}(s_l,s_v), \tag{5}$$

where
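The joint laws in (4) and (5) can be evaluated numerically. The following is a minimal sketch, assuming (this indexing convention is ours, not the paper's) a row-stochastic matrix `K` with `K[j, l] = P(S_i = l | S_{i-1} = j)` and delays satisfying `d1 >= d2`; the function names and the two-state matrix are hypothetical illustrations.

```python
import numpy as np

def stationary_distribution(K):
    """Stationary distribution pi of a row-stochastic matrix K (pi @ K = pi)."""
    eigvals, eigvecs = np.linalg.eig(K.T)
    # Pick the eigenvector whose eigenvalue is closest to 1.
    v = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return v / v.sum()

def delayed_joint(K, d):
    """joint[j, l] = P(S_{i-d} = j, S_i = l) = pi[j] * (K^d)[j, l], cf. (4)."""
    pi = stationary_distribution(K)
    return pi[:, None] * np.linalg.matrix_power(K, d)

def triple_joint(K, d1, d2):
    """joint[l, j, v] = P(S = l, S~1 = j, S~2 = v), cf. (5); assumes d1 >= d2."""
    pi = stationary_distribution(K)
    K_gap = np.linalg.matrix_power(K, d1 - d2)  # from time i-d1 to time i-d2
    K_d2 = np.linalg.matrix_power(K, d2)        # from time i-d2 to time i
    return np.einsum("j,jv,vl->ljv", pi, K_gap, K_d2)

# Hypothetical two-state chain, used only for illustration.
K = np.array([[0.9, 0.1],
              [0.2, 0.8]])
joint = triple_joint(K, d1=3, d2=1)
```

Summing `joint` over its last axis recovers the pairwise law of the current state and the more delayed state, which is a quick sanity check on the chosen indexing.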

###### Definition 1 (Code Description).

code for the FSM-MAC with CSI at the decoder and delayed CSI at the encoders with delays and consists of:

1. Three sets of integers , and , referred to as the message sets.

2. Two encoding functions , . Each function is defined by means of a sequence of functions , , that depend only on the pair of messages , and the channel states up to time . The output of Encoder at time , , is given by

$$X_{j,i}=\begin{cases}f_{j,i}(M_0,M_j), & 1\le i\le d_j\\ f_{j,i}(M_0,M_j,S^{i-d_j}), & d_j+1\le i\le n.\end{cases} \tag{6}$$
3. A decoding function:

$$\psi:\mathcal{Y}^n\times\mathcal{S}^n\to\mathcal{M}_0\times\mathcal{M}_1\times\mathcal{M}_2. \tag{7}$$

The average probability of error for the code is given in (8) at the bottom of the page. We use standard definitions of achievability and of the capacity region [48]. Namely, a rate triplet is achievable for the FSM-MAC if there exists a sequence of codes with as . The capacity region is the closure of the set of achievable rates .
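The causal structure of the encoding rule in (6) can be sketched in code. This is an illustrative sketch only: the function `encode_with_delayed_csi` and the toy per-symbol maps are hypothetical and do not correspond to any coding scheme in the paper; they merely show that the symbol at time i may depend on the states only up to time i minus the delay.

```python
from typing import Callable, List, Sequence, Tuple

def encode_with_delayed_csi(
    m0: int,
    mj: int,
    states: Sequence[int],
    delay: int,
    f_no_csi: Callable[[int, int, int], int],
    f_csi: Callable[[int, int, int, Tuple[int, ...]], int],
) -> List[int]:
    """Produce X_{j,1..n} following the two cases of (6); times are 1-indexed."""
    n = len(states)
    x = []
    for i in range(1, n + 1):
        if i <= delay:
            # No CSI has reached the encoder yet.
            x.append(f_no_csi(i, m0, mj))
        else:
            # Only S^{i-delay} = (S_1, ..., S_{i-delay}) is visible.
            visible = tuple(states[: i - delay])
            x.append(f_csi(i, m0, mj, visible))
    return x

# Hypothetical toy maps over a binary alphabet.
def toy_no_csi(i, m0, mj):
    return (m0 + mj) % 2

def toy_csi(i, m0, mj, visible):
    return (m0 + mj + visible[-1]) % 2

x = encode_with_delayed_csi(1, 0, [0, 1, 1, 0, 1], 2, toy_no_csi, toy_csi)
```

With a delay of 2, changing the last two entries of `states` leaves `x` unchanged, which reflects the delayed-CSI constraint.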

### II-B FSM-MAC with Partially Cooperative Encoders and Delayed CSI

The FSM-MAC with partially cooperative encoders and delayed CSI is depicted in Fig. 1. The channel definition relies on Subsection II-A, while taking the common message set to be . Here, however, conferencing between the encoders is introduced under the assumption that conferencing links of fixed and finite capacities and exist between the encoders. Accordingly, the amount of information exchanged between the encoders during the conference is bounded by and . The conference is assumed to take place prior to the transmission of a codeword through the channel and consists of consecutive pairs of communications, simultaneously transmitted by the encoders. Each communication depends on the message to be transmitted by the sending encoder and previously received communications from the other encoder. We denote the communications transmitted from encoder to the other encoder by . Note that here the state process is also assumed to be independent of the conference communications, i.e.,

$$P(s^n,v_1^{\ell},v_2^{\ell})=P(s^n)P(v_1^{\ell},v_2^{\ell})=\prod_{i=1}^{n}P(s_i|s_{i-1})\,P(v_1^{\ell},v_2^{\ell}). \tag{9}$$

###### Definition 2 (Code Description).

A code for the FSM-MAC with CSI at the decoder, delayed CSI at the encoders with delays and , and conferencing links with capacities and consists of:

1. Two sets of integers and , referred to as the message sets.

2. Two encoders, where each encoder is completely described by an encoding function, , and a set of () communication functions, , (similar definitions were also used in [4]).

3. The encoding function, , maps the message , , and what was learned from the conference with the other encoder into channel codewords of length . Each function is defined by means of a sequence of functions that depend only on the message , the received communications from the other encoder in the conferencing stage, and the channel states up to time . We emphasize that since encoding occurs only after the conferencing stage has finished, each depends on all received communications.

4. Each of the two communication functions and , , maps the message (or , respectively) and the sequence of previously received communications from the other encoder (or , respectively), onto the -th communication (or , respectively). More specifically, the communications are defined as:

$$V_{1,i}=h_{1,i}(M_1,V_2^{i-1})\ ;\quad V_{2,i}=h_{2,i}(M_2,V_1^{i-1}). \tag{10}$$
5. The encoding function for Encoder satisfies

$$X_{1,i}=\begin{cases}f_{1,i}(M_1,V_2^{\ell}), & 1\le i\le d_1\\ f_{1,i}(M_1,V_2^{\ell},S^{i-d_1}), & d_1+1\le i\le n,\end{cases} \tag{11}$$

and the encoding function for Encoder is defined analogously (using the private message , the communications and the delay ).

6. The random variable , for and ranges over the finite alphabet . A conference is -permissible if the sets of communication functions are such that [4]:

$$\sum_{i=1}^{\ell}\log|\mathcal{V}_{1,i}|\le nC_{12}\ ;\quad \sum_{i=1}^{\ell}\log|\mathcal{V}_{2,i}|\le nC_{21}. \tag{12}$$
7. A decoding function:

$$\psi:\mathcal{Y}^n\times\mathcal{S}^n\to\mathcal{M}_1\times\mathcal{M}_2. \tag{13}$$

The average probability of error for the code is given by (14) at the bottom of the page. The achievable rates and the capacity region for this channel are defined analogously to their definitions in Section II-A.
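The permissibility condition in (12) is a simple budget check on the conference alphabets. A minimal sketch follows, assuming base-2 logarithms (rates in bits); the function name and the numbers in the example are hypothetical.

```python
import math
from typing import Sequence

def is_permissible(
    alphabet_sizes_1: Sequence[int],
    alphabet_sizes_2: Sequence[int],
    n: int,
    c12: float,
    c21: float,
) -> bool:
    """Check (12): sum_i log|V_{1,i}| <= n*C12 and sum_i log|V_{2,i}| <= n*C21."""
    bits_1 = sum(math.log2(size) for size in alphabet_sizes_1)
    bits_2 = sum(math.log2(size) for size in alphabet_sizes_2)
    return bits_1 <= n * c12 and bits_2 <= n * c21

# Encoder 1 sends a 4-ary and then a binary communication (3 bits in total);
# Encoder 2 sends a single binary communication (1 bit).
ok = is_permissible([4, 2], [2], n=8, c12=0.5, c21=0.125)
```

Here the budgets are 4 bits and 1 bit, so the conference is permissible; enlarging the alphabet of Encoder 2 to four letters would exceed its budget.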

## III The Capacity Region of the FSM-MAC with a Common Message and Delayed Transmitter CSI

In this section we state the capacity region of the FSM-MAC with a common message and delayed transmitter CSI, after which we present its proof.

###### Theorem 1.

The capacity region of the FSM-MAC with a common message, CSI at the decoder and asymmetrically delayed CSI at the encoders with delays and , such that , is the union of all sets of rate triplets satisfying:

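$$
\begin{aligned}
R_1 &\le I(X_1;Y|X_2,U,S,\tilde{S}_1,\tilde{S}_2) && \text{(15a)}\\
R_2 &\le I(X_2;Y|X_1,U,S,\tilde{S}_1,\tilde{S}_2) && \text{(15b)}\\
R_1+R_2 &\le I(X_1,X_2;Y|U,S,\tilde{S}_1,\tilde{S}_2) && \text{(15c)}\\
R_0+R_1+R_2 &\le I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2), && \text{(15d)}
\end{aligned}
$$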

where the union is over all joint distributions . The joint distribution of is specified in (5) and . Furthermore, the capacity region is convex.
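Each bound in (15) is a conditional mutual information, so for small alphabets the region can be evaluated numerically once a joint PMF is fixed. The sketch below is a generic helper, not taken from the paper: it computes I(X;Y|Z) in bits from a joint array `p[x, y, z]`, under the assumption that all conditioning variables (for example X2, U, S and the delayed states in (15a)) have been flattened into the single index `z`.

```python
import numpy as np

def conditional_mutual_information(p_xyz):
    """Return I(X;Y|Z) in bits for a joint PMF given as an array p[x, y, z]."""
    p = np.asarray(p_xyz, dtype=float)
    p_z = p.sum(axis=(0, 1), keepdims=True)   # shape (1, 1, Z)
    p_xz = p.sum(axis=1, keepdims=True)       # shape (X, 1, Z)
    p_yz = p.sum(axis=0, keepdims=True)       # shape (1, Y, Z)
    mask = p > 0                              # skip zero-probability terms
    ratio = (p * p_z)[mask] / (p_xz * p_yz)[mask]
    return float(np.sum(p[mask] * np.log2(ratio)))

# Toy check 1: Y = X is a uniform bit, independent of a uniform bit Z.
p_same = np.zeros((2, 2, 2))
for x in range(2):
    for z in range(2):
        p_same[x, x, z] = 0.25

# Toy check 2: X, Y and Z are independent uniform bits.
p_indep = np.full((2, 2, 2), 1 / 8)

i_same = conditional_mutual_information(p_same)
i_indep = conditional_mutual_information(p_indep)
```

The first toy PMF yields one bit and the second yields zero, as expected for a noiseless and a useless channel, respectively.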

### III-A Converse

We need to show that for every achievable rate triplet , there exists a joint distribution such that the inequalities in (15) are satisfied. Since is an achievable rate triplet, there exists a code with a probability of error that becomes arbitrarily small with the increase of the block length (see (8)). By Fano’s inequality,

$$H(M_0,M_1,M_2|Y^n,S^n)\le n(R_0+R_1+R_2)P_e^{(n)}+H(P_e^{(n)})\triangleq n\epsilon_n, \tag{16}$$

where clearly as . It therefore follows that

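$$
\begin{aligned}
H(M_1|Y^n,S^n) &\le H(M_0,M_1,M_2|Y^n,S^n)\le n\epsilon_n && \text{(17)}\\
H(M_2|Y^n,S^n) &\le H(M_0,M_1,M_2|Y^n,S^n)\le n\epsilon_n && \text{(18)}\\
H(M_1,M_2|Y^n,S^n) &\le H(M_0,M_1,M_2|Y^n,S^n)\le n\epsilon_n. && \text{(19)}
\end{aligned}
$$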

For the sake of brevity we focus here on the upper bound on , while noting that all other upper bounds in (15) can be analogously derived using the same auxiliary random variable definition. It now follows that

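$$
\begin{aligned}
nR_1 &= H(M_1)\\
&\overset{(a)}{\le} I(M_1;Y^n,S^n)+n\epsilon_n\\
&\overset{(b)}{=} I(M_1;Y^n|S^n)+n\epsilon_n\\
&\overset{(c)}{\le} I(M_1;Y^n|S^n,M_0,M_2)+n\epsilon_n\\
&\overset{(d)}{=} \sum_{i=1}^{n}I(M_1;Y_i|S^n,M_0,M_2,Y^{i-1})+n\epsilon_n\\
&\overset{(e)}{=} \sum_{i=1}^{n}\Big[H(Y_i|S^n,X_2^n,M_0,M_2,Y^{i-1})-H(Y_i|S^n,X_1^n,X_2^n,M_0,M_1,M_2,Y^{i-1})\Big]+n\epsilon_n\\
&\overset{(f)}{\le} \sum_{i=1}^{n}\Big[H(Y_i|X_{2,i},S_i,S_{i-d_1},S_{i-d_2},M_0,S^{i-d_1-1})-H(Y_i|X_{1,i},X_{2,i},S_i,S_{i-d_1},S_{i-d_2},M_0,S^{i-d_1-1})\Big]+n\epsilon_n\\
&\overset{(g)}{=} \sum_{i=1}^{n}I(X_{1,i};Y_i|X_{2,i},S_i,S_{i-d_1},S_{i-d_2},U_i)+n\epsilon_n && \text{(20)}
\end{aligned}
$$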

where:
(a) follows from (17);
(b) follows because $M_1$ and $S^n$ are independent;
(c) follows because $M_1$ and $(M_0,M_2)$ are independent given $S^n$ (first term) and since conditioning reduces entropy (second term);
(d) follows by the mutual information chain rule;
(e) follows because $X_1^n$ is a deterministic function of $(M_0,M_1,S^n)$ and $X_2^n$ is a deterministic function of $(M_0,M_2,S^n)$;
(f) follows since conditioning reduces entropy (first term), and because when conditioned on $(X_{1,i},X_{2,i})$ and $S_i$, the channel output at time $i$ is independent of the remaining conditioning variables (second term);
(g) follows by defining $U_i\triangleq(M_0,S^{i-d_1-1})$.
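Step (c) compresses two facts into one inequality. The expansion below is our own unpacking, consistent with the justification stated in (c) and assuming the message independence in (3); it is not quoted from the original derivation.

```latex
\begin{aligned}
I(M_1;Y^n|S^n)
  &= H(M_1|S^n) - H(M_1|Y^n,S^n) \\
  &= H(M_1|S^n,M_0,M_2) - H(M_1|Y^n,S^n)
     && \text{($M_1$ independent of $(M_0,M_2)$ given $S^n$)} \\
  &\le H(M_1|S^n,M_0,M_2) - H(M_1|Y^n,S^n,M_0,M_2)
     && \text{(conditioning reduces entropy)} \\
  &= I(M_1;Y^n|S^n,M_0,M_2).
\end{aligned}
```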

Note that the definition of the auxiliary random variable $U_i$ represents the common message and the common knowledge of the state sequence at time $i$ (except for $S_{i-d_1}$), which, in fact, encompasses all common information shared by the two encoders at this instant of time. We can therefore conclude that the rate $R_1$ must satisfy the following upper bound:

$$R_1\le\frac{1}{n}\sum_{i=1}^{n}I(X_{1,i};Y_i|X_{2,i},S_i,S_{i-d_1},S_{i-d_2},U_i)+\epsilon_n. \tag{21}$$

In a completely analogous manner it can be shown that

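$$
\begin{aligned}
R_2 &\le \frac{1}{n}\sum_{i=1}^{n}I(X_{2,i};Y_i|X_{1,i},S_i,S_{i-d_1},S_{i-d_2},U_i)+\epsilon_n && \text{(22)}\\
R_1+R_2 &\le \frac{1}{n}\sum_{i=1}^{n}I(X_{1,i},X_{2,i};Y_i|S_i,S_{i-d_1},S_{i-d_2},U_i)+\epsilon_n && \text{(23)}\\
R_0+R_1+R_2 &\le \frac{1}{n}\sum_{i=1}^{n}I(X_{1,i},X_{2,i};Y_i|S_i,S_{i-d_1},S_{i-d_2})+\epsilon_n. && \text{(24)}
\end{aligned}
$$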

The upper bounds in (21)-(24) can also be rewritten by introducing a new time sharing random variable that is uniformly distributed over the set . For example, the upper bound in (21) can be rewritten as

$$R_1\le I(Y_Q;X_{1,Q}|X_{2,Q},S_Q,S_{Q-d_1},S_{Q-d_2},U_Q,Q)+\epsilon_n. \tag{25}$$

By rewriting the rate bounds (22)-(24) in the same manner as (21) is rewritten into (25), it is clear that the obtained region is convex. This follows directly by the presence of the time sharing random variable in the conditioning of all the mutual information terms.
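The passage from the averaged bound (21) to the single-letter form (25) can be written out explicitly. The following is a sketch under the standard assumption (ours, since the set is not legible in the text above) that $Q$ is uniform on $\{1,\dots,n\}$ and independent of all other random variables.

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n} I(X_{1,i};Y_i \mid X_{2,i},S_i,S_{i-d_1},S_{i-d_2},U_i)
  &= \sum_{i=1}^{n} P(Q=i)\,
     I(X_{1,Q};Y_Q \mid X_{2,Q},S_Q,S_{Q-d_1},S_{Q-d_2},U_Q,Q=i) \\
  &= I(X_{1,Q};Y_Q \mid X_{2,Q},S_Q,S_{Q-d_1},S_{Q-d_2},U_Q,Q).
\end{aligned}
```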

Next, by denoting and , we get:

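$$
\begin{aligned}
R_1 &\le I(X_1;Y|X_2,U,S,\tilde{S}_1,\tilde{S}_2)+\epsilon_n && \text{(26a)}\\
R_2 &\le I(X_2;Y|X_1,U,S,\tilde{S}_1,\tilde{S}_2)+\epsilon_n && \text{(26b)}\\
R_1+R_2 &\le I(X_1,X_2;Y|U,S,\tilde{S}_1,\tilde{S}_2)+\epsilon_n && \text{(26c)}\\
R_0+R_1+R_2 &\le I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2)+\epsilon_n, && \text{(26d)}
\end{aligned}
$$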

where (26d) holds due to

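$$
\begin{aligned}
R_0+R_1+R_2 &\overset{(a)}{\le} I(X_{1,Q},X_{2,Q},U_Q;Y_Q|S_Q,S_{Q-d_1},S_{Q-d_2},Q)+\epsilon_n\\
&\overset{(b)}{\le} I(X_{1,Q},X_{2,Q},U_Q,Q;Y_Q|S_Q,S_{Q-d_1},S_{Q-d_2})+\epsilon_n\\
&\overset{(c)}{\le} I(X_1,X_2,U;Y|S,\tilde{S}_1,\tilde{S}_2)+\epsilon_n\\
&\overset{(d)}{=} I(X_1,X_2;Y|S,\tilde{S}_1,\tilde{S}_2)+\epsilon_n. && \text{(27)}
\end{aligned}
$$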

Here:
(a) and (b) follow from the fact that conditioning reduces entropy;
(c) follows from the definition of ;
(d) follows from the Markov relation , which is induced from the channel model.
Taking the limit as , one obtains the bounds as in (15).

To complete the proof of the converse, it is left to show that the following Markov relations hold:

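$$
\begin{aligned}
&U-\tilde{S}_1-(S,\tilde{S}_2) && \text{(28a)}\\
&X_1-(\tilde{S}_1,U)-(S,\tilde{S}_2) && \text{(28b)}\\
&X_2-(\tilde{S}_1,\tilde{S}_2,U)-(X_1,S). && \text{(28c)}
\end{aligned}
$$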

The proof of (28) is given in Appendix A.

### III-B Achievability

To establish achievability, we need to show that for a fixed , a fixed distribution

$$P_{U|\tilde{S}_1}P_{X_1|U,\tilde{S}_1}P_{X_2|U,\tilde{S}_1,\tilde{S}_2}, \tag{29}$$

and rates that satisfy the inequalities in (15), there exists a sequence of codes such that as .

Without loss of generality, we assume that the finite-state space is the set . By the underlying assumptions of the channel model, we take the delays to be fixed and finite integers. Moreover, throughout this proof we use the following notations. For an arbitrary finite set , we denote by a column vector of size with elements . As stated in Section II, sequences of length are denoted by bold lowercase letters, while random sequences are denoted by bold uppercase letters. Consider now the following encoding and decoding scheme.

#### III-B1 Codebook Generation

Generate a common message codebook that comprises codewords , , assembled from symbols from the super-alphabet , which are drawn in an i.i.d. manner. Each codeword is distributed according to the product distribution

$$P[\mathbf{T}_0=\mathbf{t}_0]=\prod_{i=1}^{n}P_{T_0}(t_{0,i}), \tag{30}$$

where , for (each can thus be treated as a column vector of size with elements in ordered by the natural order of the set ), and

$$P_{T_0}(t_0)=P[T_0=t_0]=\prod_{\tilde{s}_1\in\mathcal{S}}P_{U|\tilde{S}_1=\tilde{s}_1}(u_{\tilde{s}_1}|\tilde{s}_1), \tag{31}$$

where . Each codeword can hence be viewed as a matrix of dimension with elements in , where each row is associated with a different (delayed) state . Accordingly, we denote by the th element of the th symbol of the codeword .
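The super-alphabet construction in (30) and (31) can be illustrated with a short sketch. It assumes, for illustration only, that states and auxiliary letters are indexed from 0, that `p_u_given_s[s, u]` holds the conditional PMF of U given the delayed state s, and that `num_codewords` stands in for the size of the common message set; all names are hypothetical.

```python
import numpy as np

def generate_common_codebook(p_u_given_s, num_codewords, n, rng):
    """Draw codewords over the super-alphabet, following (30) and (31).

    codebook[m, s, i] is the element associated with delayed state s in the
    i-th symbol of codeword m. Rows are drawn independently, row s according
    to the conditional PMF p_u_given_s[s], i.i.d. over time.
    """
    num_states, u_size = p_u_given_s.shape
    codebook = np.empty((num_codewords, num_states, n), dtype=int)
    for s in range(num_states):
        codebook[:, s, :] = rng.choice(
            u_size, size=(num_codewords, n), p=p_u_given_s[s]
        )
    return codebook

def select_by_delayed_state(codeword, delayed_states):
    """At each time i, keep the row of the codeword indexed by the delayed state."""
    time_index = np.arange(codeword.shape[1])
    return codeword[np.asarray(delayed_states), time_index]

rng = np.random.default_rng(0)
# Degenerate conditional PMF, so the outcome is easy to verify by eye.
p_u_given_s = np.array([[1.0, 0.0],
                        [0.0, 1.0]])
codebook = generate_common_codebook(p_u_given_s, num_codewords=4, n=5, rng=rng)
u_sequence = select_by_delayed_state(codebook[0], [0, 1, 1, 0, 1])
```

Each codeword is thus a matrix with one row per delayed state, and the row selection mirrors how a party that knows the delayed state sequence reduces a codeword to an ordinary length-n sequence.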