
# A New and More General Capacity Theorem for the Gaussian Channel with Two-sided Input-Noise Dependent State Information

## Abstract

In this paper, a new and general version of the Gaussian channel in the presence of two-sided state information correlated with the channel input and noise is considered. By determining a general achievable rate for this channel and obtaining its capacity in a non-limiting case, we analyze the Gaussian version of the Cover-Chiang theorem, an open problem, mathematically and information-theoretically. Our capacity theorem, while including all previous theorems as special cases, explains situations that they cannot analyze; for example, the effect of the correlation between the side information and the channel input on the capacity, which cannot be analyzed with Costa's "writing on dirty paper" theorem. We also introduce a new idea: describing the concept of "cognition" of a communicating object (transmitter, receiver, relay and so on) about some variable (channel noise, interference and so on) by the information-theoretic concept of side information correlated with that variable and known by the object. According to our theorem, the channel capacity is an increasing function of the mutual information between the side information and the channel noise. Our channel and its capacity theorem therefore exemplify the cognition of the transmitter and receiver about the channel noise in the sense of this new description, and the capacity theorem admits interesting interpretations originating from this idea.

**Index Terms:** Gaussian channel capacity, correlated side information, two-sided state information, transmitter cognition, receiver cognition.

## 1 Introduction

Channels with side information have been actively studied since their introduction by Shannon [1]. Coding for computer memories with defective cells was studied by Kuznetsov and Tsybakov [2]. Gel'fand and Pinsker (GP) [3] determined the capacity of channels with channel side information (CSI) known non-causally at the transmitter. Heegard and El Gamal [4] obtained the capacity when the CSI is known only at the receiver. Cover and Chiang [5] extended these results to a general case where correlated two-sided state information is available at the transmitter and at the receiver. Costa [6] obtained an interesting result by carefully investigating the GP theorem for the Gaussian channel: he proved that the capacity of the Gaussian channel with an interference known at the transmitter is the same as the capacity of the interference-free channel. Many other important studies exist in the literature, e.g. [7, 8, 9]. The results for the single-user channel have been generalized to multi-user channels, at least in special cases [10, 11, 12, 13, 14, 15].

### Our Motivations

In this paper, we focus on the Gaussian channel in the presence of side information, with two major aims. First, we analyze the problem of the capacity of the Gaussian channel in the presence of two-sided state information, the Gaussian version of the Cover-Chiang theorem [5], mathematically and information-theoretically. Second, we try to present an improved information-theoretic description of the concept of "cognition" of the transmitter and/or receiver.

#### First motivation

In this paper, we try to analyze the Gaussian version of the Cover-Chiang unifying theorem [5]. The effect of side information at the transmitter in a Gaussian channel was first studied, in a special case, in Costa's "writing on dirty paper" [6]. Let us consider a Gaussian channel with side information known non-causally at the transmitter as depicted in Fig. 1. We denote the side information at the transmitter, the channel input, the channel output, the channel noise and the auxiliary random variable at the transmitter by $S_1$, $X$, $Y$, $Z$ and $U$, respectively. Moreover, it is assumed that $S_1$ and $Z$ are Gaussian random variables with powers $Q_1$ and $N$ respectively, and $X$ has the power constraint $E\{X^2\} \le P$.

Costa [6] shows that the capacity of this channel is, surprisingly, the same as the capacity of the channel without side information. An important assumption in Costa's theorem is that the definition of the channel places no restriction on the correlation between $X$ and $S_1$. However, Costa shows that the maximum rate is obtained when $X$ and $S_1$ are independent and $U$ is a linear function of $X$ and $S_1$. Hence, his theorem is only applicable to cases where $X$ and $S_1$ have the chance to be uncorrelated. A theorem that can handle the capacity of Gaussian channels with a specific correlation between $X$ and $S_1$ is therefore theoretically and practically important. One example of correlated input and side information is the cognitive interference channel, in which the transmitted sequence of one transmitter is a known interference for the other transmitter and the two sequences may be dependent on each other. Another example is a measurement system where the measuring signal may affect the system under measurement; this is equivalent to an interfering signal that depends on the original measuring signal.

Another related question concerns side information known non-causally at the receiver (if it exists, as in Fig. 2). The questions that now arise are: How does the receiver knowledge $S_2$, correlated with $(X, S_1)$, affect the channel capacity? And how much does the receiver information about $X$ and $S_1$, available through $S_2$, change the channel capacity?

Some communication scenarios in which the channel input and the side information may be correlated, and the related investigations, can be found in [9] and [16]. In [9], the problem of the optimum transmission rate under a minimum mutual-information requirement is investigated. Moreover, both [9] and [16] study Costa's "writing on dirty paper" problem where the side information is correlated with the channel input (our motivation), when only side information known at the transmitter exists. In another work, we have considered and solved the problem of the capacity of the Gaussian channel with two-sided state information in a limited case [17].

Moreover, by examining the Gaussian channel with two-sided state information that depends on the channel noise and the channel input, we try to solve the Gaussian version of the Cover-Chiang theorem [5] as an open problem.

#### Second motivation

One of the best-known and most important applications of channels with side information is describing, information-theoretically, the concept of "cognition" of the transmitter in communication scenarios. The side information in this description may be, for example, an interference about which the transmitter knows everything. Two questions arise about this description:

1) Knowledge about, or cognition of, something is usually expected to be "quantitative". For example, the cognition that the transmitter can acquire about the interference may be incomplete or partial. So one question is: how can we describe the "quantity" or "amount" of the transmitter's cognition? Investigations of channels with partial CSI try to answer this question, for example [18, 19, 20, 21, 22].

2) In a communication scenario, the transmitter may have knowledge about more than one variable in the channel. For example, in a cognitive interference channel, the transmitter may have knowledge about the interference originated by the other transmitter and, at the same time, about the channel noise. Hence, the other question is: how can we describe the cognition that the transmitter has about several variables?

In this paper, we propose describing the concept of "transmitter and/or receiver cognition about some variables" by side information that is available at the transmitter and/or receiver and is probabilistically dependent on those variables. Hence, side information known at the transmitter and correlated with a variable describes the transmitter's cognition about that variable, and the amount of this cognition increases as the correlation between the side information and the variable increases. To distinguish this meaning of "cognition" from the usual meaning widely used in the literature, it may be proper to use the word "re-cognition" (of the transmitter or receiver, about something) for it.

Hence, in a Gaussian channel in the presence of two-sided state information as depicted in Fig. 2, the side information $S_1$ known at the transmitter can be interpreted as the transmitter's re-cognition of the channel noise if $S_1$ is correlated with $Z$. Our first motivation can thus be seen not only as an effort to solve an important open problem but also, if solved, as an exemplification of this new description.

### Our Work

To address the above motivations, we define a Gaussian channel in the presence of two-sided state information where the channel input $X$, the side information $(S_1, S_2)$ and the channel noise $Z$ are arbitrarily correlated. Using the extension of the Cover-Chiang unifying theorem [5] to continuous alphabets, we prove a general achievable rate for the channel (lemma 1). Then, we obtain a general upper bound for the channel in the case that the channel input $X$, the side information $S_1$ and the channel noise $Z$ form the Markov chain $X \to S_1 \to Z$ (lemma 2), and we show the coincidence of the lower and upper bounds under this condition, thereby establishing our capacity theorem for the channel. Using our probabilistic description of the transmitter's "re-cognition", this condition can be explained as follows: if the whole re-cognition that the transmitter has about the channel noise is gained from the side information $S_1$, which is a meaningful and practically acceptable circumstance in our communication scenario, then the Markov chain $X \to S_1 \to Z$ must be satisfied. The obtained channel capacity can be expressed as an increasing function of the mutual information between the side information and the channel noise, i.e. $I(S_1, S_2; Z)$, which shows that our new description of the re-cognition of the transmitter and the receiver can be exemplified by our channel and its capacity.

### Paper Organization

This paper is organized as follows. In section II, we briefly review the Cover-Chiang and Gel'fand-Pinsker theorems and then scrutinize the Costa theorem. In section III, we define our Gaussian channel thoroughly, prove a general lower bound for it, and then obtain an upper bound for the mentioned case which coincides with the lower bound and is therefore the capacity of the channel. In section IV, we examine the proved capacity in special cases and interpret them; specifically, we explain how this capacity theorem can exemplify the new description of the "re-cognition" of the transmitter and/or receiver. Section V contains the conclusion. The proofs of the lower and upper bounds of the capacity, and of two lemmas used in our proofs, are given in the Appendix.

## 2 A Review of Previous Related Works

To clarify our approach in subsequent sections, we first briefly review the Cover-Chiang capacity theorem for channels with side information available at the transmitter and at the receiver. We then review the Gel'fand-Pinsker (GP) theorem, the special case of the Cover-Chiang theorem in which side information is known only at the transmitter. Finally, the Costa theorem ("writing on dirty paper"), which is the Gaussian version of the GP theorem, is investigated in depth.

### 2.1 Cover-Chiang Theorem

Fig. 3 shows a channel with side information known at the transmitter and at the receiver, where $X^n$ and $Y^n$ are the transmitted and received sequences respectively. The sequences $S_1^n$ and $S_2^n$ are the side information known non-causally at the transmitter and at the receiver respectively. The transition probability of the channel, $p(y \mid x, s_1, s_2)$, depends on the input $X$ and the side information $S_1$ and $S_2$. It can be shown that if the channel is memoryless and the sequence of pairs $(S_{1i}, S_{2i})$ is independent and identically distributed (i.i.d.) under $p(s_1, s_2)$, then the capacity of the channel is [5]:

$$C = \max_{p(u,x \mid s_1)} \left[ I(U; S_2, Y) - I(U; S_1) \right] \qquad (1)$$

where the maximum is over all distributions:

$$p(y, x, u, s_1, s_2) = p(y \mid x, s_1, s_2)\, p(u, x \mid s_1)\, p(s_1, s_2) \qquad (2)$$

and $U$ is an auxiliary random variable.

It is important to note that the Markov chains:

$$S_2 \longrightarrow S_1 \longrightarrow (U, X) \qquad (3)$$
$$U \longrightarrow (X, S_1, S_2) \longrightarrow Y \qquad (4)$$

are satisfied for all distributions in (2).

### 2.2 Gel’fand-Pinsker (GP) Theorem

This theorem is the special case of the Cover-Chiang theorem in which $S_2 = \emptyset$. According to the GP theorem [3]:

A memoryless channel with transition probability $p(y \mid x, s_1)$ and side information sequence $S_1^n$, i.i.d. with $p(s_1)$ and known non-causally at the transmitter, as depicted in Fig. 4, has the capacity

$$C = \max_{p(u,x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (5)$$

for all distributions:

$$p(y, x, u, s_1) = p(y \mid x, s_1)\, p(u, x \mid s_1)\, p(s_1) \qquad (6)$$

where $U$ is an auxiliary random variable.

### 2.3 Costa’s “Writing on Dirty Paper”

Costa [6] examined the Gaussian version of the channel with side information known at the transmitter (Fig. 1). As can be seen, the side information $S_1$ is considered as an additive interference at the receiver. Costa showed that the channel, surprisingly, has the capacity $\frac{1}{2}\log(1 + P/N)$, which is the same as the capacity of the channel with no interference. Costa derived this capacity by using the Gel'fand-Pinsker theorem extended to random variables with continuous alphabets. In this subsection, we first introduce Costa's assumptions and then present a proof of his theorem in a way that enables us to introduce our channel and develop our theorem in subsequent sections.

The channel is specified with properties C.1-C.3 below:

C.1 $S_1^n$ is a sequence of Gaussian i.i.d. random variables with distribution $\mathcal{N}(0, Q_1)$.

C.2 The transmitted sequence $X^n$ is assumed to have the power constraint $\frac{1}{n}\sum_{i=1}^{n} x_i^2 \le P$.

C.3 The output is given by $Y^n = X^n + S_1^n + Z^n$, where $Z^n$ is a sequence of white Gaussian noise with zero mean and power $N$, i.e. $Z \sim \mathcal{N}(0, N)$, independent of $X^n$ and $S_1^n$. The sequence $S_1^n$ is non-causally known at the transmitter.

It is readily seen that the distributions having the above three properties are of the form (6). We denote the set of all these distributions by $\mathcal{P}$. Although no restriction is imposed on the correlation between $X$ and $S_1$ in the Costa channel described above, in Costa's theorem the maximum rate corresponds to independent $X$ and $S_1$, with $U$ a linear combination of $X$ and $S_1$. We define $\mathcal{P}^*$ as the subset of $\mathcal{P}$ whose elements have the following properties in addition to properties C.1-C.3:

C.4 $X$ is a zero mean Gaussian random variable with the maximum average power $E\{X^2\} = P$, independent of $S_1$.

C.5 The auxiliary random variable takes the linear form $U = \alpha S_1 + X$.

It is clear that the distributions in $\mathcal{P}^*$ (described by C.1-C.5) and their marginal and conditional distributions are subsets of the corresponding distributions in $\mathcal{P}$ (described by C.1-C.3).

Achievable rate for the Costa channel: From (5), extended to discrete-time memoryless channels with continuous alphabets, we can obtain an achievable rate for the channel.

The capacity of the Costa channel can be written as:

$$C_{\text{Costa}} = \max_{p(u,x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (7)$$

where the maximum is over all distributions in $\mathcal{P}$. Since $\mathcal{P}^* \subset \mathcal{P}$, we have:

$$C_{\text{Costa}} \ge \max_{p^*(u,x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (8)$$
$$= \max_{p^*(u \mid x, s_1)\, p^*(x \mid s_1)} \left[ I(U; Y) - I(U; S_1) \right] \qquad (9)$$
$$= \max_{\alpha} \left[ I(U; Y) - I(U; S_1) \right] \qquad (10)$$

The expression in the last bracket is calculated for distributions in $\mathcal{P}^*$ described by C.1-C.5. Thus, defining $R(\alpha) \triangleq I(U; Y) - I(U; S_1)$, $\max_\alpha R(\alpha)$ is an achievable rate for the channel, and $R(\alpha)$ is calculated as:

$$R(\alpha) = \frac{1}{2}\log\left(\frac{P(P + Q_1 + N)}{P Q_1 (1-\alpha)^2 + N(P + \alpha^2 Q_1)}\right), \qquad (11)$$

and

$$\max_{\alpha} R(\alpha) = R(\alpha^*) = \frac{1}{2}\log\left(1 + \frac{P}{N}\right) \qquad (12)$$

where

$$\alpha^* = \frac{P}{P+N}. \qquad (13)$$

Both $\alpha^*$ and $R(\alpha^*)$ are independent of $Q_1$ and hence of $S_1$.
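The closed forms (11)-(13) can be checked numerically. The following Python sketch (not part of the original derivation; the values of $P$, $Q_1$ and $N$ are arbitrary) evaluates $R(\alpha)$ on a grid and verifies that the maximum is attained at $\alpha^* = P/(P+N)$ and equals the interference-free capacity $\frac{1}{2}\log(1 + P/N)$:

```python
import numpy as np

def costa_rate(alpha, P, Q1, N):
    """R(alpha) from (11): rate of Costa's scheme with U = alpha*S1 + X."""
    num = P * (P + Q1 + N)
    den = P * Q1 * (1 - alpha) ** 2 + N * (P + alpha ** 2 * Q1)
    return 0.5 * np.log(num / den)

P, Q1, N = 4.0, 10.0, 2.0                     # arbitrary example values
alphas = np.linspace(-1, 2, 30001)
rates = costa_rate(alphas, P, Q1, N)

alpha_star = P / (P + N)                      # optimal coefficient (13)
awgn = 0.5 * np.log(1 + P / N)                # interference-free capacity

assert abs(alphas[np.argmax(rates)] - alpha_star) < 1e-3   # maximum at alpha*
assert abs(costa_rate(alpha_star, P, Q1, N) - awgn) < 1e-12  # R(alpha*) = (12)
```

Note that neither `alpha_star` nor the maximal rate involves `Q1`, which is the essence of the "dirty paper" result.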

Converse part of the Costa theorem: From (5) we can also obtain an upper bound on the channel capacity. We have:

$$I(U;Y) - I(U;S_1) = -H(U \mid Y) + H(U \mid S_1) \qquad (14)$$
$$\le -H(U \mid Y, S_1) + H(U \mid S_1) \qquad (15)$$
$$= I(U; Y \mid S_1) \qquad (16)$$
$$\le I(X; Y \mid S_1) \qquad (17)$$

where inequality (15) follows from the fact that conditioning reduces entropy, and (17) follows from the Markov chain $U \to (X, S_1) \to Y$, which holds for all distributions of the form (6), including the distributions in the set $\mathcal{P}$. Hence we can write:

$$C_{\text{Costa}} = \max_{p(u,x \mid s_1)} \left[ I(U;Y) - I(U;S_1) \right] \qquad (18)$$
$$\le \max_{p(x \mid s_1)} \left[ I(X; Y \mid S_1) \right] \qquad (19)$$
$$= \max_{p(x \mid s_1)} \left[ H(Y \mid S_1) - H(Y \mid X, S_1) \right] \qquad (20)$$
$$= \max_{p(x \mid s_1)} \left[ H(X + Z \mid S_1) - H(Z \mid X, S_1) \right] \qquad (21)$$
$$\le \max_{p(x \mid s_1)} \left[ H(X + Z) - H(Z) \right] \qquad (22)$$
$$= \frac{1}{2}\log\left(1 + \frac{P}{N}\right), \qquad (23)$$

where inequality (22) is due to the fact that conditioning reduces entropy. The maximum in (22) is obtained when $X$ and $Z$ are jointly Gaussian with $E\{X^2\} = P$, because when the variance is limited, the Gaussian distribution maximizes the entropy. From (12) and (23) it is seen that the lower and upper bounds of the capacity coincide, and therefore the channel capacity equals $\frac{1}{2}\log(1 + P/N)$. It is also concluded that for the channel described by C.1-C.3, the optimum condition leading to the capacity is $U = \alpha^* S_1 + X$ with $X$ Gaussian and independent of $S_1$.∎

We can explain the Costa theorem further as follows. Consider $Y = X + S_1 + S_2 + Z$ with independent Gaussian interferences $S_1$ with power $Q_1$ and $S_2$ with power $Q_2$, and noise $Z$ with power $N$. If the transmitter knows nothing about the interference, then we treat $S_1 + S_2$ as noise and $C = \frac{1}{2}\log\left(1 + \frac{P}{Q_1 + Q_2 + N}\right)$. If $S_1$ is known at the transmitter, then we apply dirty paper coding against $S_1$ and $C = \frac{1}{2}\log\left(1 + \frac{P}{Q_2 + N}\right)$; and if $S_1$ and $S_2$ are both known at the transmitter, then $C = \frac{1}{2}\log\left(1 + \frac{P}{N}\right)$.

## 3 Capacity Theorem For The Gaussian Channel with Two-sided Input-Noise Dependent Side Information

In this section we introduce a Gaussian channel in the presence of two-sided state information correlated with the channel input and noise, and then present our capacity theorem for it. The theorem gives the capacity of the channel in the case that the channel input $X$, the side information $S_1$ and the channel noise $Z$ form the Markov chain $X \to S_1 \to Z$. With our new description of the transmitter's "re-cognition" of the channel noise, the probabilistic dependency between the side information $S_1$ and the channel noise $Z$ determines the cognition of the channel noise that the side information carries to the transmitter. Therefore, this Markov chain states that the transmitter acquires all its knowledge about the channel noise from the side information $S_1$ alone, which is practically meaningful and acceptable in our scenario. To prove the theorem, we obtain a general achievable rate for the channel (lemma 1), then a general upper bound in the mentioned case (lemma 2), and show the coincidence of these lower and upper bounds.

### 3.1 Definition of the Channel

As mentioned before, in a Gaussian channel with side information known at the transmitter defined by the set $\mathcal{P}$ with properties C.1-C.3 (the Costa channel), no restriction is imposed upon the correlation between the channel input $X$ and the side information $S_1$. As mentioned in section I, the capacity is only valid for channels in which $X$ and $S_1$ have the chance to be independent; specifically, the maximum rate is achieved when $X$ and $S_1$ are independent. Let $\mathcal{P}$ be partitioned into subsets $\mathcal{P}_\rho$ containing the distributions for which the correlation coefficient between $X$ and $S_1$ equals $\rho$, as depicted in Fig. 5. It is obvious that $\mathcal{P}^*$ (the set of distributions with properties C.1-C.5) is a subset of $\mathcal{P}_0$, and therefore the optimum distribution leading to the capacity of the Costa channel does not belong to the other partitions. We can therefore claim that the Costa theorem is not valid for channels defined with random variables in a partition $\mathcal{P}_\rho$ with $\rho \neq 0$.

Consider the Gaussian channel depicted in Fig. 2. The side information at the transmitter and at the receiver is considered as additive interference at the receiver. From the above discussion, and in view of the motivations mentioned in section I, our channel differs from Costa's in three respects:

1) In our channel, a specified correlation coefficient $\rho_{XS_1}$ between $X$ and $S_1$ exists.

2) To investigate the effect of side information known at the receiver, we suppose that in our channel there exists Gaussian side information $S_2$ known non-causally at the receiver which is correlated with both $X$ and $S_1$.

3) We allow the channel input $X$ and the side information $S_1$ and $S_2$ to be correlated with the channel noise $Z$.

Remark: It is important to note that, as we prove in lemma 3 in Appendix C, assuming the input random variable $X$ correlated with $S_1$, $S_2$ and $Z$ with specified correlation coefficients does not impose any restriction on $X$'s own distribution; the distribution of $X$ remains free to choose.

Considering the above differences, our channel is defined by the following properties GC.1-GC.4 (GC for General version of Costa) below:

GC.1 $S_1^n$ and $S_2^n$ are i.i.d. sequences with zero mean and jointly Gaussian distributions with powers $Q_1$ and $Q_2$ respectively (so we have $S_1 \sim \mathcal{N}(0, Q_1)$ and $S_2 \sim \mathcal{N}(0, Q_2)$).

GC.2 The output sequence is $Y^n = X^n + S_1^n + S_2^n + Z^n$, where $Z^n$ is a sequence of white Gaussian noise with zero mean and power $N$. The sequences $S_1^n$ and $S_2^n$ are non-causally known at the transmitter and at the receiver respectively.

GC.3 The random variables $(X, S_1, S_2, Z)$ have the covariance matrix $K$:

$$K = E\left\{\begin{bmatrix} X^2 & XS_1 & XS_2 & XZ \\ XS_1 & S_1^2 & S_1S_2 & S_1Z \\ XS_2 & S_1S_2 & S_2^2 & S_2Z \\ XZ & S_1Z & S_2Z & Z^2 \end{bmatrix}\right\} \qquad (24)$$
$$= \begin{bmatrix} \sigma_X^2 & \sigma_X\sigma_{S_1}\rho_{XS_1} & \sigma_X\sigma_{S_2}\rho_{XS_2} & \sigma_X\sigma_Z\rho_{XZ} \\ \sigma_X\sigma_{S_1}\rho_{XS_1} & \sigma_{S_1}^2 & \sigma_{S_1}\sigma_{S_2}\rho_{S_1S_2} & \sigma_{S_1}\sigma_Z\rho_{S_1Z} \\ \sigma_X\sigma_{S_2}\rho_{XS_2} & \sigma_{S_1}\sigma_{S_2}\rho_{S_1S_2} & \sigma_{S_2}^2 & \sigma_{S_2}\sigma_Z\rho_{S_2Z} \\ \sigma_X\sigma_Z\rho_{XZ} & \sigma_{S_1}\sigma_Z\rho_{S_1Z} & \sigma_{S_2}\sigma_Z\rho_{S_2Z} & \sigma_Z^2 \end{bmatrix} \qquad (25)$$

and therefore, in our channel, the Gaussian noise $Z$ is not necessarily independent of the additive interferences $S_1$ and $S_2$ and the input $X$. Moreover, $X$ is assumed to satisfy the constraint $E\{X^2\} = \sigma_X^2 \le P$. Except for $\sigma_X$, all other parameters in $K$ have fixed values specified for the channel and must be considered part of the definition of the channel.

GC.4 $(X, S_1, S_2)$ form the Markov chain $S_2 \to S_1 \to X$. As mentioned earlier, the Markov chain $S_2 \to S_1 \to (U, X)$ is satisfied by all distributions of the form (2) in the Cover-Chiang capacity theorem and is physically reasonable. Since that Markov chain results in the weaker Markov chain $S_2 \to S_1 \to X$, as proved in lemma 4 in Appendix D, this property implies that in the covariance matrix in (25) we have:

$$\rho_{XS_2} = \rho_{XS_1}\rho_{S_1S_2} \qquad (26)$$

It is readily seen that all distributions having properties GC.1-GC.4 are of the form (2). Therefore we can apply the extension of the Cover-Chiang theorem to random variables with continuous alphabets to our channel. We denote the set of all these distributions by $\mathcal{P}$ (again).

Remark: In the absence of $S_2$, and when $Z$ is independent of $(X, S_1)$, we can compare the capacity of our channel with that of the Costa channel and write:

$$C_{\text{Costa}} = \max_{\rho_{XS_1}} C_1 \Big|_{S_2 = 0}. \qquad (27)$$

where $C_1$ denotes the capacity of our channel when $Z$ is independent of $(S_1, S_2)$. Note that in this case, and when $\rho_{XS_1} = 0$, we have $\mathcal{P}^* \subset \mathcal{P}_0$, and therefore looking for the maximum rate over $\rho_{XS_1}$ leads to the maximum rate among all the partitions $\mathcal{P}_\rho$.

We will show that the optimum distribution resulting in the maximum transmission rate is obtained when $(X, S_1, S_2, Z, U)$ are jointly Gaussian and the auxiliary random variable $U$ is a linear combination of $X$ and $S_1$. We denote by $\mathcal{P}^*$ the set of distributions having properties GC.5 and GC.6 below in addition to properties GC.1-GC.4:

GC.5 The random variables $(X, S_1, S_2, Z)$ are jointly Gaussian and $X$ has zero mean and the maximum power, i.e. $E\{X^2\} = P$.

GC.6 As in the Costa theorem:

$$U = \alpha S_1 + X, \qquad (28)$$

where $X$ and $S_1$ are now correlated.

It is clear that the distributions in $\mathcal{P}^*$ (described by GC.1-GC.6) and their marginal and conditional distributions are subsets of the corresponding distributions in $\mathcal{P}$ (described by GC.1-GC.4).

As the final part of this subsection we introduce some definitions required for our capacity theorem:

Suppose $\hat{K}$ is the covariance matrix of random variables $(X, S_1, S_2, Z)$ having all the properties GC.1-GC.6; defining:

$$A_i = E\{X S_i\} = \sigma_X \sigma_{S_i} \rho_{XS_i}, \quad i = 1, 2 \qquad (29)$$
$$L_0 = E\{X Z\} = \sigma_X \sigma_Z \rho_{XZ} \qquad (30)$$
$$L_i = E\{S_i Z\} = \sigma_{S_i} \sigma_Z \rho_{S_iZ}, \quad i = 1, 2 \qquad (31)$$
$$B = E\{S_1 S_2\} = \sigma_{S_1} \sigma_{S_2} \rho_{S_1S_2} \qquad (32)$$

we can write $\hat{K}$, its determinant and its minors as:

$$\hat{K} = \begin{bmatrix} P & A_1 & A_2 & L_0 \\ A_1 & Q_1 & B & L_1 \\ A_2 & B & Q_2 & L_2 \\ L_0 & L_1 & L_2 & N \end{bmatrix}. \qquad (33)$$

$$D \triangleq \begin{vmatrix} P & A_1 & A_2 & L_0 \\ A_1 & Q_1 & B & L_1 \\ A_2 & B & Q_2 & L_2 \\ L_0 & L_1 & L_2 & N \end{vmatrix} \qquad (34)$$
together with the minors $d_{(\cdot)}$ of $\hat{K}$ built from the indicated entries, which appear in the expressions below. (35)

### 3.2 The Capacity of the Channel

#### Theorem

The Gaussian channel defined by properties GC.1-GC.4, when the channel input $X$, the side information $S_1$ and the channel noise $Z$ form the Markov chain $X \to S_1 \to Z$, has the capacity:

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\cdot\frac{(1 - \rho_{XS_1}^2)(1 - \rho_{S_1S_2}^2)}{d_{NP}}\right), \qquad (36)$$

where

$$d_{NP} = \begin{vmatrix} 1 & \rho_{S_1S_2} & \rho_{S_1Z} \\ \rho_{S_1S_2} & 1 & \rho_{S_2Z} \\ \rho_{S_1Z} & \rho_{S_2Z} & 1 \end{vmatrix} \qquad (37)$$
$$= 1 + 2\rho_{S_1S_2}\rho_{S_1Z}\rho_{S_2Z} - \rho_{S_1S_2}^2 - \rho_{S_1Z}^2 - \rho_{S_2Z}^2.$$

Proof of Theorem: To prove the theorem, we first prove a general achievable rate for the channel in lemma 1. Then, in lemma 2, we obtain an upper bound for the channel in the case that the transmitter acquires all its knowledge about the channel noise from the side information $S_1$, i.e., the Markov chain $X \to S_1 \to Z$ holds. We then show the coincidence of this upper bound with the lower bound of the capacity.

We note that under the condition of the theorem the Markov chain $X \to S_1 \to Z$ holds; since $X$, $S_1$ and $Z$ are jointly Gaussian, as we prove in lemma 4 in Appendix D, this Markov chain implies that

$$\rho_{XZ} = \rho_{XS_1}\rho_{S_1Z}. \qquad (38)$$
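Relation (38) can be illustrated with a quick Monte Carlo sketch (not part of the paper's proof; the coefficients 0.8 and 0.5 are arbitrary). Building $X$ and $Z$ from $S_1$ plus independent noises makes them conditionally independent given $S_1$, so $X \to S_1 \to Z$ holds by construction, and the product rule for the correlation coefficients emerges empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
# X -> S1 -> Z: X and Z each depend on S1 plus independent noise,
# hence they are conditionally independent given S1.
s1 = rng.standard_normal(n)
x = 0.8 * s1 + 0.6 * rng.standard_normal(n)               # rho_XS1 = 0.8
z = 0.5 * s1 + np.sqrt(0.75) * rng.standard_normal(n)     # rho_S1Z = 0.5

rho_xs1 = np.corrcoef(x, s1)[0, 1]
rho_s1z = np.corrcoef(s1, z)[0, 1]
rho_xz = np.corrcoef(x, z)[0, 1]
# (38): rho_XZ = rho_XS1 * rho_S1Z for jointly Gaussian Markov triples
assert abs(rho_xz - rho_xs1 * rho_s1z) < 5e-3
```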

#### Lemma 1. A General Lower Bound for the Capacity of the Channel

The capacity of the Gaussian channel defined with properties GC.1-GC.4 has the lower bound:

$$R_G = \frac{1}{2}\log\left(1 + \frac{\left[\sigma_X(1 - \rho_{XS_1}^2) - \sigma_Z(\rho_{XS_1}\rho_{S_1Z} - \rho_{XZ})\right]^2 (1 - \rho_{S_1S_2}^2)}{\sigma_Z^2\left((1 - \rho_{XS_1}^2)\, d_{NP} - (\rho_{XS_1}\rho_{S_1Z} - \rho_{XZ})^2 (1 - \rho_{S_1S_2}^2)\right)}\right) \qquad (39)$$

where $d_{NP}$ is defined in (37).

Proof: Appendix A contains the proof.

#### Lemma 2. Upper Bound for the Capacity of the Channel

The capacity of the Gaussian channel defined by properties GC.1-GC.4, when the channel input $X$, the side information $S_1$ and the channel noise $Z$ form the Markov chain $X \to S_1 \to Z$, has the upper bound in (36).

Proof: Appendix B contains the proof.

To complete the proof of the theorem, it is enough to compute the lower bound (39) when the Markov chain $X \to S_1 \to Z$ holds. Applying equation (38) to equation (39) shows the coincidence of the upper and lower bounds of the capacity in this case, and the proof is complete. ∎
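The coincidence step can be verified numerically. The Python sketch below (illustrative only; the parameter and correlation values are arbitrary but valid) evaluates the general lower bound (39) and the capacity (36), and confirms that they agree once $\rho_{XZ} = \rho_{XS_1}\rho_{S_1Z}$ from (38) is imposed:

```python
import numpy as np

def d_NP(r12, r1z, r2z):
    """Determinant (37) of the correlation matrix of (S1, S2, Z)."""
    return 1 + 2 * r12 * r1z * r2z - r12**2 - r1z**2 - r2z**2

def capacity(P, N, rx1, r12, r1z, r2z):
    """Capacity (36), valid under the Markov chain X -> S1 -> Z."""
    return 0.5 * np.log(1 + (P / N) * (1 - rx1**2) * (1 - r12**2)
                        / d_NP(r12, r1z, r2z))

def lower_bound(P, N, rx1, rxz, r12, r1z, r2z):
    """General achievable rate R_G of (39) (no Markov assumption)."""
    sx, sz = np.sqrt(P), np.sqrt(N)
    g = rx1 * r1z - rxz                   # vanishes under the Markov chain
    num = (sx * (1 - rx1**2) - sz * g) ** 2 * (1 - r12**2)
    den = N * ((1 - rx1**2) * d_NP(r12, r1z, r2z) - g**2 * (1 - r12**2))
    return 0.5 * np.log(1 + num / den)

P, N = 3.0, 1.5
rx1, r12, r1z, r2z = 0.5, 0.3, 0.4, 0.2
rxz = rx1 * r1z                           # impose (38)
assert abs(lower_bound(P, N, rx1, rxz, r12, r1z, r2z)
           - capacity(P, N, rx1, r12, r1z, r2z)) < 1e-12
```

With all correlations set to zero, `capacity` reduces to the interference-free value $\frac{1}{2}\log(1+P/N)$, as expected.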
Remark 1: It can be shown that for the variables $S_1$, $S_2$ and $Z$ with properties GC.1 and GC.4:

$$I(S_1, S_2; Z) = \frac{1}{2}\log\left(\frac{1 - \rho_{S_1S_2}^2}{d_{NP}}\right) \qquad (40)$$

and so the channel capacity (36) can be written as:

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\,(1 - \rho_{XS_1}^2)\exp\left(2\, I(S_1, S_2; Z)\right)\right), \qquad (41)$$

which is an increasing function of $I(S_1, S_2; Z)$.
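Remark 1 can be checked directly from Gaussian differential entropies. The sketch below (illustrative; the correlation values and $P$, $N$, $\rho_{XS_1}$ are arbitrary) computes $I(S_1,S_2;Z) = h(S_1,S_2) + h(Z) - h(S_1,S_2,Z)$, confirms the closed form (40) and the cofactor expansion (37), and verifies that (41) agrees with (36):

```python
import numpy as np

r12, r1z, r2z = 0.3, 0.4, 0.2
K = np.array([[1.0, r12, r1z],
              [r12, 1.0, r2z],
              [r1z, r2z, 1.0]])      # correlation matrix of (S1, S2, Z)
d_np = np.linalg.det(K)
# (37): cofactor expansion of d_NP
assert abs(d_np - (1 + 2*r12*r1z*r2z - r12**2 - r1z**2 - r2z**2)) < 1e-12

# I(S1,S2;Z) = h(S1,S2) + h(Z) - h(S1,S2,Z); the (2*pi*e) factors cancel
mi = 0.5 * np.log(np.linalg.det(K[:2, :2]) * K[2, 2] / d_np)
assert abs(mi - 0.5 * np.log((1 - r12**2) / d_np)) < 1e-12   # (40)

# (41) agrees with (36)
P, N, rx1 = 2.0, 1.0, 0.5
c36 = 0.5 * np.log(1 + (P / N) * (1 - rx1**2) * (1 - r12**2) / d_np)
c41 = 0.5 * np.log(1 + (P / N) * (1 - rx1**2) * np.exp(2 * mi))
assert abs(c36 - c41) < 1e-12
```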
Remark 2: The transmission rate in (36) can be reached by the encoding and decoding scheme presented in [5], modified for continuous Gaussian distributions.

## 4 Interpretations and Numerical Results of the Capacity Theorem

In the previous section, the capacity of the Gaussian channel with two-sided state information correlated with the channel input and noise was obtained. The capacity theorem is general except that the Markov chain $X \to S_1 \to Z$ must be satisfied. In this section we present some corollaries of the theorem. First, we examine the effect of the correlation between the side information and the channel input on the capacity. Second, we exemplify, through our capacity theorem, the new description of the "cognition" of a communicating object (here, the transmitter and/or receiver) about some feature of the channel (here, the channel noise).

### 4.1 The Effect of the Correlation between the Side Information and the Channel Input on the Capacity:

If we assume that the channel noise $Z$ is independent of $(S_1, S_2)$, then from (36) the capacity of the channel is:

$$C_1 = \frac{1}{2}\log\left(1 + \frac{P}{N}(1 - \rho_{XS_1}^2)\right) \qquad (42)$$

Corollary 1: From (27), $C_1$ reduces to the Costa capacity by maximizing it with $\rho_{XS_1} = 0$.

Corollary 2: It is seen that in the case that the side information is independent of the channel noise $Z$, the capacity of the channel is equal to the capacity when there is no interference ($S_1 = S_2 = 0$). In other words, in this case, the receiver can subtract the known $S_2$ from the received $Y$ without losing any worthwhile information.

Corollary 3: The correlation between $X$ and $S_1$ decreases the capacity of the channel. This can be explained as follows: by looking at $U = \alpha S_1 + X$ in our dirty-paper-like coding, mitigating the input-dependent interference effect also mitigates part of the input power's impact on the capacity, as seen in (42) through the factor $P(1 - \rho_{XS_1}^2)$.
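The monotonic loss described in Corollary 3 is immediate from (42); the following short Python check (arbitrary $P$ and $N$, not from the paper) confirms that $C_1$ equals Costa's capacity at $\rho_{XS_1} = 0$ and strictly decreases as the correlation grows:

```python
import numpy as np

P, N = 2.0, 1.0
rhos = np.linspace(0, 0.99, 100)
C1 = 0.5 * np.log(1 + (P / N) * (1 - rhos**2))       # (42)

assert abs(C1[0] - 0.5 * np.log(1 + P / N)) < 1e-12  # rho = 0: Costa capacity
assert np.all(np.diff(C1) < 0)                       # decreasing in rho_XS1
```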

As an extreme and interesting case, when $\rho_{XS_1} = 1$ (so that $X \propto S_1$), according to the usual Gaussian coding the capacity seems to be $\frac{1}{2}\log\left(1 + (\sigma_X + \sigma_{S_1})^2/N\right)$, which is the capacity when $X + S_1$ is transmitted and $Y$ is received. But as our theorem shows, the capacity is paradoxically zero, because the receiver, based on its information, ought to decode according to the dirty-paper-like coding. In DP-like coding, given the known sequence $S_{1,0}^n$, we must find an auxiliary sequence $U_0^n$ jointly typical with $S_{1,0}^n$ [6]. Joint typicality of $(U_0^n, S_{1,0}^n)$ is equivalent to:

$$\left|\left(U_0^n - \alpha^* S_{1,0}^n\right)^T S_{1,0}^n\right| \le \delta, \quad \delta \text{ small} \qquad (43)$$

where $T$ denotes transposition and $\alpha^*$ is computed according to (68). If $\rho_{XS_1} = 1$, no such $U_0^n$ exists: since $U_0^n - \alpha^* S_{1,0}^n = X_0^n \propto S_{1,0}^n$, we have (taking $\sigma_X = \sigma_{S_1}$ for simplicity)

$$\left|\left(U_0^n - \alpha^* S_{1,0}^n\right)^T S_{1,0}^n\right| = \left\|S_{1,0}^n\right\|^2 \qquad (44)$$

where $\left\|S_{1,0}^n\right\|$ is the norm of the given known sequence, and therefore (43) cannot hold. In other words, in this case, an encoding error occurs.

Fig. 6 shows the variation of the capacity with respect to $\rho_{XS_1}$ for fixed $P$ and $N$. It is seen that as the correlation between the channel input and the side information known at the transmitter increases, the channel capacity decreases. The maximum capacity is attained at $\rho_{XS_1} = 0$, which is Costa's capacity. Fig. 7 shows the capacity with respect to $P/N$ for five values of $\rho_{XS_1}$.

### 4.2 Exemplification of the Re-cognition of Transmitter and Receiver on the channel Noise:

#### Re-cognition

"Cognition" is an indispensable concept in communication. The assumption that an intelligent communicating object (transmitter, receiver, relay and so on) has some side knowledge about some features of the communication channel is a true and acceptable assumption. This extra information owned, for example, by the transmitter is described by "side information" known at the transmitter. In the usual description, the side information is considered the subject of cognition itself, for example the interference of another transmitter in a cognitive radio channel [23]. On the other hand, the assumption that the knowledge may be incomplete or imperfect is necessary in most communication scenarios. Descriptions of this incomplete cognition and the corresponding information-theoretic concept, i.e., partial side information, are found in the literature; for example, in [21] the imperfectly known interference is partitioned into one perfectly known part and one unknown part, and in [20] partial side information is considered as a version of the subject variable disturbed by noise.

We try here to present an alternative description of the concept of "cognition" in communication via the concept of side information. The essential property of this description is the separation of the subject of knowledge (for example interference, channel noise, fading coefficients and so on) from the side information that carries the knowledge to the intelligent agent (for example transmitter, receiver, relay and so on) and is known by it. This point of view is compatible with what happens in reality: we always acquire our knowledge about something indirectly, by knowing other things. What makes it possible to extract knowledge about one variable from another is the dependency between them. Every method of extracting such knowledge (estimation and so on) ultimately relies on this dependency. If the side information is independent of the variable, it is non-informative about it; and it is expected that increasing the dependency between them increases the possible knowledge of the agent about the variable.

To avoid confusion between this new description and the usual descriptions of cognition, we use the word "re-cognition" for it and define it as follows:

A communicating agent (transmitter, receiver, relay and so on) has "re-cognition" of some variable if the side information known by it has probabilistic dependency on that variable.

#### Exemplification

In the Gaussian channel defined and analyzed in the previous section, the side information is dependent on the channel noise, and therefore the transmitter and the receiver have re-cognition of the channel noise through $S_1$ and $S_2$ respectively. The capacity is proved under the Markovity constraint $X \to S_1 \to Z$. In light of the new description of re-cognition, this Markov chain simply means that the transmitter acquires all its re-cognition of the channel noise via the side information $S_1$, which is meaningful and acceptable.

Corollary 4: If $S_2 = 0$, the transmitter has re-cognition of the channel noise obtained through $S_1$ correlated with the noise. If there is no constraint on the correlation between $X$ and $S_1$, then $\rho_{XS_1} = 0$ maximizes the transmission rate, as mentioned in (27). Therefore, from (36) and (41), the capacity in this case is:

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\cdot\frac{1}{1 - \rho_{S_1Z}^2}\right) \qquad (45)$$

It is seen that more correlation between $S_1$ and $Z$ results in more re-cognition of the transmitter about the channel noise and higher capacity. The capacity tends to infinity as $|\rho_{S_1Z}| \to 1$, where the transmitter has perfect re-cognition of the channel noise.

Fig. 8 illustrates the capacity of the channel with respect to $\rho_{S_1Z}$, the correlation coefficient between the side information $S_1$ and the channel noise, for fixed $P/N$. It is seen that as the correlation increases (meaning that $S_1$ carries more re-cognition of the channel noise to the transmitter), the capacity increases. Fig. 9 shows the capacity of the channel with respect to $P/N$ for five values of $\rho_{S_1Z}$. Fig. 10 illustrates the capacity of the channel with respect to the mutual information $I(S_1, S_2; Z)$ for five values of $P/N$.

Corollary 5: If $S_1 = 0$, the receiver has re-cognition of the channel noise obtained through $S_2$ correlated with the noise. The capacity in this case is:

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\cdot\frac{1}{1 - \rho_{S_2Z}^2}\right) \qquad (46)$$

It is seen that more correlation between $S_2$ and $Z$ results in more re-cognition of the receiver about the channel noise and higher capacity. Perfect re-cognition takes place when $|\rho_{S_2Z}| \to 1$ and results in infinite capacity.

Corollary 6: If $\rho_{S_1S_2} = 0$ and there is no constraint on the correlation between $X$ and $S_1$, then $\rho_{XS_1} = 0$ maximizes the transmission rate, as mentioned in (27). Therefore the capacity of the channel is:

$$C = \frac{1}{2}\log\left(1 + \frac{P}{N}\cdot\frac{1}{1 - \rho_{S_1Z}^2 - \rho_{S_2Z}^2}\right). \qquad (47)$$

It is seen that when $\rho_{S_1Z}^2 + \rho_{S_2Z}^2 \to 1$, the capacity tends to infinity, even though neither the transmitter nor the receiver has perfect knowledge of the channel noise. In this case the transmitter and the receiver each have a share of the re-cognition of the channel noise, which together leads to totally mitigating the channel noise.
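Corollaries 4-6 all follow from (36) by zeroing the appropriate correlations. The Python sketch below (illustrative; all numerical values are arbitrary) checks the three closed forms (45)-(47) and the unbounded growth as $\rho_{S_1Z}^2 + \rho_{S_2Z}^2 \to 1$:

```python
import numpy as np

def capacity(P, N, rx1, r12, r1z, r2z):
    """General capacity (36); rho_XZ is implied by the Markov chain."""
    d_np = 1 + 2 * r12 * r1z * r2z - r12**2 - r1z**2 - r2z**2
    return 0.5 * np.log(1 + (P / N) * (1 - rx1**2) * (1 - r12**2) / d_np)

P, N = 2.0, 1.0
# Corollary 4: S2 absent, rho_XS1 = 0 -> (45)
r1z = 0.6
assert abs(capacity(P, N, 0, 0, r1z, 0)
           - 0.5 * np.log(1 + (P / N) / (1 - r1z**2))) < 1e-12
# Corollary 5: S1 absent -> (46)
r2z = 0.7
assert abs(capacity(P, N, 0, 0, 0, r2z)
           - 0.5 * np.log(1 + (P / N) / (1 - r2z**2))) < 1e-12
# Corollary 6: rho_S1S2 = 0 -> (47); the capacity grows without
# bound as r1z**2 + r2z**2 approaches 1
assert capacity(P, N, 0, 0, 0.6, 0.8 - 1e-6) > 5.0
```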

## 5 Conclusion

By investigating in full detail the Gaussian channel in the presence of two-sided, input- and noise-dependent state information, we obtained a general achievable rate for the channel and established a capacity theorem. This capacity theorem, first, demonstrates the impact on the capacity of the transmitter's and receiver's cognition, under a newly introduced interpretation; and second, it shows the effect on the channel capacity of the correlation between the channel input and the side information available at the transmitter and at the receiver. Whereas, as expected, the cognition of the transmitter and receiver increases the capacity, the correlation between the channel input and the side information known at the transmitter decreases it.

## 6 Appendix

### Appendix A.

The proof of Lemma 1: Using the extension of the Cover-Chiang capacity theorem (1) to random variables with continuous alphabets, the capacity of our channel can be written as:

$$C = \max_{p(u,x \mid s_1)} \left[ I(U; Y, S_2) - I(U; S_1) \right] \qquad (48)$$

where the maximum is over all distributions in $\mathcal{P}$ having properties GC.1-GC.4. Since $\mathcal{P}^* \subset \mathcal{P}$, we have:

$$C \ge \max_{p^*(u,x \mid s_1)} \left[ I(U; Y, S_2) - I(U; S_1) \right] \qquad (49)$$
$$= \max_{p^*(u \mid x, s_1)\, p^*(x \mid s_1)} \left[ I(U; Y, S_2) - I(U; S_1) \right] \qquad (50)$$
$$= \max_{\alpha} \left[ I(U; Y, S_2) - I(U; S_1) \right] \qquad (51)$$

where the expression in (51) is calculated for the distributions in $\mathcal{P}^*$ having properties GC.1-GC.6. Thus, defining $R(\alpha) \triangleq I(U; Y, S_2) - I(U; S_1)$, we have:

$$C \ge \max_{\alpha} R(\alpha) = R(\alpha^*), \qquad (52)$$

and therefore $R(\alpha^*)$ is a lower bound on the channel capacity. To compute $R(\alpha)$, we write:

$$I(U; Y, S_2) = H(U) + H(Y, S_2) - H(U, Y, S_2) \qquad (53)$$

and

$$I(U; S_1) = H(U) + H(S_1) - H(U, S_1). \qquad (54)$$

For $H(Y, S_2)$ we have:

$$H(Y, S_2) = \frac{1}{2}\log\left((2\pi e)^2 \det(\mathrm{cov}(Y, S_2))\right) \qquad (55)$$

where

$$\mathrm{cov}(Y, S_2) = [e_{ij}]_{2 \times 2} \qquad (56)$$

and

$$e_{11} = P + Q_1 + Q_2 + N + 2A_1 + 2A_2 + 2B + 2L_0 + 2L_1 + 2L_2,$$
$$e_{12} = e_{21} = A_2 + B + Q_2 + L_2 \quad \text{and} \quad e_{22} = Q_2 \qquad (57)$$

where $P$, the $Q_i$'s, $N$, the $A_i$'s, the $L_i$'s and $B$ are defined in the previous section. Therefore

$$\det(\mathrm{cov}(Y, S_2)) = d_{Q_1N} + d_{PN} + d_{PQ_1} + 2d_{L_0L_1} - 2d_{PL_1} - 2d_{Q_1L_0}, \qquad (58)$$

where the $d_{(\cdot)}$ terms are defined in (35).
For $H(U, Y, S_2)$ we have:

$$H(U, Y, S_2) = \frac{1}{2}\log\left((2\pi e)^3 \det(\mathrm{cov}(U, Y, S_2))\right) \qquad (59)$$

where

$$\mathrm{cov}(U, Y, S_2) = [e_{ij}]_{3 \times 3} \qquad (60)$$

and

$$e_{11} = P + \alpha^2 Q_1 + 2\alpha A_1, \quad e_{12} = e_{21} = P + (\alpha + 1)A_1 + \alpha Q_1 + \alpha B + \alpha L_1 + A_2 + L_0, \quad e_{13} = e_{31} = \alpha B + A_2, \quad e_{22} = P + Q$$