# Structure of optimal strategies for remote estimation over Gilbert-Elliott channel with feedback

## Abstract

We investigate remote estimation over a Gilbert-Elliott channel with feedback. We assume that the channel state is observed by the receiver and fed back to the transmitter with one unit delay. In addition, the transmitter gets ack/nack feedback for successful/unsuccessful transmission. Using ideas from team theory, we establish the structure of optimal transmission and estimation strategies and identify a dynamic program to determine optimal strategies with that structure. We then consider first-order autoregressive sources where the noise process has unimodal and symmetric distribution. Using ideas from majorization theory, we show that the optimal transmission strategy has a threshold structure and the optimal estimation strategy is Kalman-like.

## 1 Introduction

### 1.1 Motivation and literature overview

We consider a remote estimation system in which a sensor/transmitter observes a first-order Markov process and causally decides which observations to transmit to a remotely located receiver/estimator. Communication is expensive and takes place over a Gilbert-Elliott channel (which is used to model channels with burst erasures). The channel has two states: off state and on state. When the channel is in the off state, a packet transmitted from the sensor to the receiver is dropped. When the channel is in the on state, a packet transmitted from the sensor to the receiver is received without error. We assume that the channel state is causally observed at the receiver and is fed back to the transmitter with one-unit delay. Whenever there is a successful reception, the receiver sends an acknowledgment to the transmitter. The feedback is assumed to be noiseless.

At the time instances when the receiver does not receive a packet (either because the sensor did not transmit or because the transmitted packet was dropped), the receiver needs to estimate the state of the source process. There is a fundamental trade-off between communication cost and estimation accuracy. Transmitting all the time minimizes the estimation error but incurs a high communication cost; not transmitting at all minimizes the communication cost but incurs a high estimation error.

The motivation of remote estimation comes from networked control systems. The earliest instance of the problem was perhaps considered by Marschak [1] in the context of information gathering in organizations. In recent years, several variations of remote estimation have been considered. These include models that consider idealized channels without packet drops [2, 3, 4, 5, 6, 7, 8, 9] and models that consider channels with i.i.d. packet drops [10, 11].

The salient features of remote estimation are as follows:

1. **(F1)** The decisions are made sequentially.

2. **(F2)** The reconstruction/estimation at the receiver must be done with zero delay.

3. **(F3)** When a packet does get through, it is received without noise.

Remote estimation problems may be viewed as a special case of real-time communication [12, 13, 14, 15]. As in real-time communication, the key conceptual difficulty is that the data available at the transmitter and the receiver increase with time. Thus, the domains of the transmission and estimation functions grow with time.

To circumvent this difficulty one needs to identify sufficient statistics for the data at the transmitter and the data at the receiver. In the real-time communication literature, dynamic team theory (or decentralized stochastic control theory) is used to identify such sufficient statistics as well as to identify a dynamic program to determine the optimal transmission and estimation strategies. Similar ideas are also used in the remote-estimation literature. In addition, feature (F3) allows one to further simplify the structure of optimal transmission and estimation strategies. In particular, when the source is a first-order autoregressive process, majorization theory is used to show that the optimal transmission strategy is characterized by a threshold [5, 6, 7, 11, 10]: it is optimal to transmit when the instantaneous distortion due to not transmitting is greater than a threshold. The optimal thresholds can be computed either using dynamic programming [5, 6] or using renewal relationships [16, 10].

All of the existing literature on remote estimation considers either channels with no packet drops or channels with i.i.d. packet drops. In this paper, we consider packet drop channels with Markovian memory. We identify sufficient statistics at the transmitter and the receiver. When the source is a first-order autoregressive process, we show that threshold-based strategies are optimal but the threshold depends on the previous state of the channel.

### 1.2 The communication system

#### Source model

The source is a first-order time-homogeneous Markov process $\{X_t\}_{t \ge 0}$, $X_t \in \mathcal X$. For ease of exposition, in the first part of the paper we assume that $\mathcal X$ is a finite set. We will later argue that a similar approach works when $\mathcal X$ is a general measurable space. The transition probability matrix of the source is denoted by $P$, i.e., for any $x, y \in \mathcal X$,

$$P_{xy} \coloneqq \mathbb P(X_{t+1} = y \mid X_t = x).$$

#### Channel model

The channel is a Gilbert-Elliott channel [17, 18]. The channel state $\{S_t\}_{t \ge 0}$ is a binary-valued first-order time-homogeneous Markov process. We use the convention that $S_t = 0$ denotes that the channel is in the off state and $S_t = 1$ denotes that the channel is in the on state. The transition probability matrix of the channel state is denoted by $Q$, i.e., for $r, s \in \{0, 1\}$,

$$Q_{rs} \coloneqq \mathbb P(S_{t+1} = s \mid S_t = r).$$

The input alphabet of the channel is $\bar{\mathcal X} = \mathcal X \cup \{\mathfrak E\}$, where $\mathfrak E$ denotes the event that there is no transmission. The channel output alphabet is $\mathcal Y = \mathcal X \cup \{\mathfrak E_0, \mathfrak E_1\}$, where the symbols $\mathfrak E_0$ and $\mathfrak E_1$ are explained below. At time $t$, the channel input is denoted by $\bar X_t$ and the channel output is denoted by $Y_t$.

The channel is a channel with state. In particular, for any realization $(\bar x_{0:t}, s_{0:t}, y_{0:t})$ of $(\bar X_{0:t}, S_{0:t}, Y_{0:t})$, we have that

$$\mathbb P(Y_t = y_t \mid \bar X_{0:t} = \bar x_{0:t}, S_{0:t} = s_{0:t}) = \mathbb P(Y_t = y_t \mid \bar X_t = \bar x_t, S_t = s_t) \tag{1}$$

and

$$\mathbb P(S_t = s_t \mid \bar X_{0:t} = \bar x_{0:t}, S_{0:t-1} = s_{0:t-1}) = \mathbb P(S_t = s_t \mid S_{t-1} = s_{t-1}) = Q_{s_{t-1} s_t}. \tag{2}$$

Note that the channel output $Y_t$ is a deterministic function of the input $\bar X_t$ and the state $S_t$. In particular, for any $\bar x \in \bar{\mathcal X}$ and $s \in \{0, 1\}$, the channel output is given as follows:

$$y = \begin{cases} \bar x, & \text{if } \bar x \in \mathcal X \text{ and } s = 1 \\ \mathfrak E_1, & \text{if } \bar x = \mathfrak E \text{ and } s = 1 \\ \mathfrak E_0, & \text{if } s = 0. \end{cases}$$

This means that if there is a transmission (i.e., $\bar x \in \mathcal X$) and the channel is on (i.e., $s = 1$), then the receiver observes $\bar x$. However, if there is no transmission (i.e., $\bar x = \mathfrak E$) and the channel is on, then the receiver observes $\mathfrak E_1$; if the channel is off (i.e., $s = 0$), then the receiver observes $\mathfrak E_0$.
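As an illustration, the deterministic output map above takes only a few lines of code. This is our own sketch; the names `E`, `E0`, `E1`, `channel_output`, and `step_state` are ours, not the paper's.

```python
import random

E, E0, E1 = "E", "E0", "E1"  # no-transmission input; "off" / "silent but on" outputs

def channel_output(x_bar, s):
    """Deterministic output map of the Gilbert-Elliott channel:
    y = x_bar if a packet is sent while the channel is on (s = 1),
    E1 if nothing is sent while the channel is on, E0 whenever off."""
    if s == 0:
        return E0
    return E1 if x_bar == E else x_bar

def step_state(s, Q, rng=random):
    """Sample S_{t+1} from the two-state Markov chain with kernel Q,
    where Q[r][s] = P(S_{t+1} = s | S_t = r)."""
    return 1 if rng.random() < Q[s][1] else 0
```

Note that the receiver can always tell an erasure due to the channel (`E0`) apart from an intentional silence (`E1`), which is what makes the channel-state feedback informative.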

#### The transmitter

There is no need for channel coding in a remote-estimation setup. Instead, the role of the transmitter is to determine which source realizations need to be transmitted. Let $U_t \in \{0, 1\}$ denote the transmitter's decision. We use the convention that $U_t = 0$ denotes that there is no transmission (i.e., $\bar X_t = \mathfrak E$) and $U_t = 1$ denotes that there is transmission (i.e., $\bar X_t = X_t$).

Transmission is costly. Each time the transmitter transmits (i.e., $U_t = 1$), it incurs a cost of $\lambda$.

At time $t$, the receiver generates an estimate $\hat X_t$ of $X_t$. The quality of the estimate is determined by a distortion function $d \colon \mathcal X \times \mathcal X \to \mathbb R_{\ge 0}$.

### 1.3 Information structure and problem formulation

It is assumed that the receiver observes the channel state causally. Thus, the information available at the receiver is

$$I^2_t = \{S_{0:t}, Y_{0:t}\}.$$

The estimate $\hat X_t$ is chosen according to

$$\hat X_t = g_t(I^2_t) = g_t(S_{0:t}, Y_{0:t}), \tag{3}$$

where $g_t$ is called the estimation rule at time $t$. The collection $g \coloneqq (g_0, g_1, \dots)$ is called the estimation strategy.

It is assumed that there is one-step delayed feedback from the receiver to the transmitter. Thus, the information available at the transmitter is

$$I^1_t = \{X_{0:t}, U_{0:t-1}, S_{0:t-1}, Y_{0:t-1}\}.$$

The transmission decision $U_t$ is chosen according to

$$U_t = f_t(I^1_t) = f_t(X_{0:t}, U_{0:t-1}, S_{0:t-1}, Y_{0:t-1}), \tag{4}$$

where $f_t$ is called the transmission rule at time $t$. The collection $f \coloneqq (f_0, f_1, \dots)$ is called the transmission strategy.

The collection $(f, g)$ is called a communication strategy. The performance of any communication strategy $(f, g)$ is given by

$$J(f, g) = \mathbb E\Big[\sum_{t=0}^{T} \lambda U_t + d(X_t, \hat X_t)\Big], \tag{5}$$

where the expectation is taken with respect to the joint measure on all system variables induced by the choice of $(f, g)$.
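To make the trade-off in (5) concrete, the toy Monte-Carlo sketch below (our own construction: an AR(1) source, squared-error distortion, and illustrative parameter values) compares the two extreme strategies, always transmit and never transmit, with a predict-on-erasure estimator.

```python
import random

def run(transmit_always, lam=1.0, a=0.9, T=200, Q=((0.3, 0.7), (0.1, 0.9)), seed=1):
    """Total cost (5) along one sample path. Always transmitting pays the
    communication cost lam at every step; never transmitting pays distortion."""
    rng = random.Random(seed)
    x, x_hat, s, cost = 0.0, 0.0, 1, 0.0
    for _ in range(T + 1):
        u = 1 if transmit_always else 0
        s = 1 if rng.random() < Q[s][1] else 0   # Gilbert-Elliott channel state
        x_hat = x if (u == 1 and s == 1) else a * x_hat  # predict on erasure
        cost += lam * u + (x - x_hat) ** 2       # squared-error distortion
        x = a * x + rng.gauss(0.0, 1.0)          # AR(1) source update
    return cost
```

With these parameters, `run(True)` always pays at least $\lambda (T+1)$ in communication cost (plus distortion whenever the channel happens to be off), while `run(False)` pays no communication cost but accumulates strictly positive distortion.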

We are interested in the following optimization problem.

###### Problem

In the model described above, identify a communication strategy $(f^*, g^*)$ that minimizes the cost $J(f, g)$ defined in (5).

## 2 Main results

### 2.1 Structure of optimal communication strategies

Two types of structural results are established in the real-time communication literature: (i) establishing that part of the data at the transmitter is irrelevant and can be dropped without any loss of optimality; (ii) establishing that the common information between the transmitter and the receiver can be "compressed" using a belief state. Structural results of the first type were first established by Witsenhausen [12], while those of the second type were first established by Walrand and Varaiya [13].

We establish both types of structural results for remote estimation. First, we show that $(X_{0:t-1}, U_{0:t-1})$ is irrelevant at the transmitter (Lemma 2.1); then, we use the common information approach of [19] and establish a belief state for the common information between the transmitter and the receiver (Theorem 2.1).

###### Lemma 2.1

For any estimation strategy of the form (3), there is no loss of optimality in restricting attention to transmission strategies of the form

$$U_t = f_t(X_t, S_{0:t-1}, Y_{0:t-1}). \tag{6}$$

The proof idea is similar to [14]. We show that $(X_t, S_{0:t-1}, Y_{0:t-1})$ is a controlled Markov process controlled by $U_t$. See Section 3 for the proof.

Now, following [19], for any transmission strategy of the form (6) and any realization $(s_{0:t-1}, y_{0:t-1})$ of $(S_{0:t-1}, Y_{0:t-1})$, define $\varphi_t \colon \mathcal X \to \{0, 1\}$ as

$$\varphi_t(x) = f_t(x, s_{0:t-1}, y_{0:t-1}), \quad \forall x \in \mathcal X.$$

Furthermore, define conditional probability measures $\pi^1_t$ and $\pi^2_t$ on $\mathcal X$ as follows: for any $x \in \mathcal X$,

$$\pi^1_t(x) \coloneqq \mathbb P^f(X_t = x \mid S_{0:t-1} = s_{0:t-1}, Y_{0:t-1} = y_{0:t-1}),$$
$$\pi^2_t(x) \coloneqq \mathbb P^f(X_t = x \mid S_{0:t} = s_{0:t}, Y_{0:t} = y_{0:t}).$$

We call $\pi^1_t$ the pre-transmission belief and $\pi^2_t$ the post-transmission belief. Note that when $(S_{0:t}, Y_{0:t})$ are random variables, then $\pi^1_t$ and $\pi^2_t$ are also random variables, which we denote by $\Pi^1_t$ and $\Pi^2_t$.

For ease of notation, for any prescription $\varphi \colon \mathcal X \to \{0, 1\}$ and $i \in \{0, 1\}$, define the following:

• $B_i(\varphi) \coloneqq \{x \in \mathcal X : \varphi(x) = i\}$.

• For any probability distribution $\pi$ on $\mathcal X$ and any subset $A$ of $\mathcal X$, $\pi(A)$ denotes $\sum_{x \in A} \pi(x)$.

• For any probability distribution $\pi$ on $\mathcal X$, $\pi|_\varphi$ denotes the conditional distribution given no transmission, i.e., $\pi|_\varphi(x) = \mathds{1}\{\varphi(x) = 0\}\, \pi(x) / \pi(B_0(\varphi))$.

###### Lemma 2.2

Given any transmission strategy of the form (6):

1. there exists a function $F^1$ such that

$$\pi^1_{t+1} = F^1(\pi^2_t) = \pi^2_t P. \tag{7}$$

2. there exists a function $F^2$ such that

$$\pi^2_t = F^2(\pi^1_t, \varphi_t, y_t). \tag{8}$$

In particular,

$$\pi^2_t = \begin{cases} \delta_{y_t}, & \text{if } y_t \in \mathcal X \\ \pi^1_t|_{\varphi_t}, & \text{if } y_t = \mathfrak E_1 \\ \pi^1_t, & \text{if } y_t = \mathfrak E_0. \end{cases} \tag{9}$$

Note that in (7), we are treating $\pi^2_t$ as a row vector, and in (9), $\delta_{y_t}$ denotes the Dirac measure centered at $y_t$. The update equations (7) and (8) are standard non-linear filtering equations. See Section 3 for the proof.
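For a finite alphabet, the filtering updates (7)–(9) are straightforward to implement. The sketch below is ours: beliefs are plain lists indexed by source symbols, prescriptions are 0/1 lists, and the erasure symbols are encoded as the strings `"E0"` and `"E1"`.

```python
def F1(pi2, P):
    """Pre-transmission update (7): pi1_{t+1} = pi2_t * P (row vector times matrix)."""
    n = len(pi2)
    return [sum(pi2[x] * P[x][y] for x in range(n)) for y in range(n)]

def F2(pi1, phi, y):
    """Post-transmission update (9). phi[x] in {0, 1}; y is either a source
    symbol (an index), "E1" (silent, channel on), or "E0" (channel off)."""
    n = len(pi1)
    if y == "E0":                      # channel off: belief unchanged
        return list(pi1)
    if y == "E1":                      # silent and on: condition on phi(x) = 0
        mass = sum(pi1[x] for x in range(n) if phi[x] == 0)
        return [pi1[x] / mass if phi[x] == 0 else 0.0 for x in range(n)]
    return [1.0 if x == y else 0.0 for x in range(n)]  # received: Dirac at y
```

For example, with $P = \begin{pmatrix} 0.9 & 0.1 \\ 0.2 & 0.8 \end{pmatrix}$ and a post-transmission belief concentrated on the first symbol, `F1` returns the corresponding row of $P$.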

###### Theorem 2.1

In Problem 1.3, we have that:

1. Structure of optimal strategies: There is no loss of optimality in restricting attention to optimal transmission and estimation strategies of the form:

$$U_t = f^*_t(X_t, S_{t-1}, \Pi^1_t), \tag{10}$$
$$\hat X_t = g^*_t(\Pi^2_t). \tag{11}$$
2. Dynamic program: Let $\Delta(\mathcal X)$ denote the space of probability distributions on $\mathcal X$. Define value functions $V^1_t \colon \{0, 1\} \times \Delta(\mathcal X) \to \mathbb R$ and $V^2_t \colon \{0, 1\} \times \Delta(\mathcal X) \to \mathbb R$ as follows.

$$V^1_{T+1}(s, \pi^1) = 0, \tag{12}$$

and for $t \in \{T, \dots, 0\}$,

$$V^1_t(s, \pi^1) = \min_{\varphi \colon \mathcal X \to \{0, 1\}} \Big\{ \lambda\, \pi^1(B_1(\varphi)) + \pi^1(B_0(\varphi))\, W^0_t(\pi^1, \varphi) + \sum_{x \in B_1(\varphi)} \pi^1(x)\, W^1_t(\pi^1, \varphi, x) \Big\}, \tag{13}$$

$$V^2_t(s, \pi^2) = \min_{\hat x \in \mathcal X} \sum_{x \in \mathcal X} d(x, \hat x)\, \pi^2(x) + V^1_{t+1}(s, \pi^2 P), \tag{14}$$

where,

$$W^0_t(\pi^1, \varphi) = Q_{s0}\, V^2_t(0, \pi^1) + Q_{s1}\, V^2_t(1, \pi^1|_\varphi),$$
$$W^1_t(\pi^1, \varphi, x) = Q_{s0}\, V^2_t(0, \pi^1) + Q_{s1}\, V^2_t(1, \delta_x).$$

Let $\Psi_t(s, \pi^1)$ denote the arg min of the right-hand side of (13). Then, the optimal transmission strategy of the form (10) is given by

$$f^*_t(\cdot, s, \pi^1) = \Psi_t(s, \pi^1).$$

Furthermore, the optimal estimation strategy of the form (11) is given by

$$g^*_t(\pi^2) = \arg\min_{\hat x \in \mathcal X} \sum_{x \in \mathcal X} d(x, \hat x)\, \pi^2(x). \tag{15}$$

The proof idea is as follows. Once we restrict attention to transmission strategies of the form (6), the information structure is partial history sharing [19]. Thus, one can use the common information approach of [19] and obtain the structure of optimal strategies. See Section 3 for proof.

###### Remark 1

The first term in (13) is the expected communication cost, the second term is the expected cost-to-go when the transmitter does not transmit, and the third term is the expected cost-to-go when the transmitter transmits. The first term in (14) is the expected distortion and the second term is the expected cost-to-go.
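For a small alphabet, the minimization in (13) can be carried out by brute force, enumerating all $2^{|\mathcal X|}$ prescriptions $\varphi$. The sketch below is our own reading of the dynamic program: the next-stage value function `V2` is passed in as a callable, and the fallback used when $\pi^1(B_0(\varphi)) = 0$ (where $\pi^1|_\varphi$ is undefined) is an implementation convenience, not part of the theorem.

```python
from itertools import product

def best_prescription(s, pi1, lam, Q, V2):
    """Minimize lam*pi1(B1) + pi1(B0)*W0 + sum_{x in B1} pi1(x)*W1(x)
    over all prescriptions phi: X -> {0, 1}. Returns (value, phi)."""
    n = len(pi1)
    best_val, best_phi = float("inf"), None
    for phi in product((0, 1), repeat=n):
        mass0 = sum(pi1[x] for x in range(n) if phi[x] == 0)
        if mass0 > 0:   # conditional belief pi1|phi
            cond = [pi1[x] / mass0 if phi[x] == 0 else 0.0 for x in range(n)]
        else:           # pi1|phi undefined; unreachable branch, keep pi1
            cond = list(pi1)
        w0 = Q[s][0] * V2(0, pi1) + Q[s][1] * V2(1, cond)
        val = lam * (1.0 - mass0) + mass0 * w0
        for x in range(n):
            if phi[x] == 1:
                dirac = [1.0 if i == x else 0.0 for i in range(n)]
                w1 = Q[s][0] * V2(0, pi1) + Q[s][1] * V2(1, dirac)
                val += pi1[x] * w1
        if val < best_val:
            best_val, best_phi = val, phi
    return best_val, best_phi
```

As a sanity check, when the continuation value `V2` is identically zero and $\lambda > 0$, the all-silent prescription is optimal with value zero.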

###### Remark 2

Although the above model and result are stated for sources with finite alphabets, they extend naturally to general state spaces (including Euclidean spaces) under standard technical assumptions. See [20] for details.

### 2.2 Optimality of threshold-based strategies for autoregressive source

In this section, we consider a first-order autoregressive source $\{X_t\}_{t \ge 0}$, $X_t \in \mathbb R$, where the initial state $X_0 = 0$ and for $t \ge 0$, we have that

$$X_{t+1} = a X_t + W_t, \tag{16}$$

where $a \in \mathbb R$ and $\{W_t\}_{t \ge 0}$ is an i.i.d. process distributed according to a symmetric and unimodal distribution with probability density function $\mu$. Furthermore, the per-step distortion is given by $d(X_t - \hat X_t)$, where $d(\cdot)$ is an even function that is increasing on $\mathbb R_{\ge 0}$. The rest of the model is the same as before.

For the above model, we can further simplify the result of Theorem 2.1. See Section 4 for the proof.

###### Theorem 2.2

For a first-order autoregressive source with symmetric and unimodal disturbance,

1. Structure of optimal estimation strategy: The optimal estimation strategy is given as follows: $\hat X_0 = 0$, and for $t > 0$,

$$\hat X_t = \begin{cases} a \hat X_{t-1}, & \text{if } Y_t \in \{\mathfrak E_0, \mathfrak E_1\} \\ Y_t, & \text{if } Y_t \in \mathbb R. \end{cases} \tag{17}$$
2. Structure of optimal transmission strategy: There exist threshold functions $k_t \colon \{0, 1\} \to \mathbb R_{\ge 0}$ such that the following transmission strategy is optimal:

$$f_t(X_t, S_{t-1}, \Pi^1_t) = \begin{cases} 1, & \text{if } |X_t - a \hat X_{t-1}| \ge k_t(S_{t-1}) \\ 0, & \text{otherwise.} \end{cases} \tag{18}$$
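The strategy pair in Theorem 2.2 is easy to simulate. The following sketch (our own parameter values; squared-error distortion) runs the threshold rule (18) with a channel-state-dependent threshold together with the Kalman-like estimator (17), and returns the average per-step cost.

```python
import random

def simulate(k_off, k_on, a=0.9, lam=1.0, T=500, Q=((0.3, 0.7), (0.1, 0.9)), seed=7):
    """Average per-step cost of the threshold strategy with thresholds
    k(S_{t-1}=0) = k_off and k(S_{t-1}=1) = k_on (time-invariant for simplicity)."""
    rng = random.Random(seed)
    x, x_hat, s_prev, cost = 0.0, 0.0, 1, 0.0
    for _ in range(T + 1):
        k = k_on if s_prev == 1 else k_off        # threshold depends on S_{t-1}
        u = 1 if abs(x - a * x_hat) >= k else 0   # transmission rule (18)
        s = 1 if rng.random() < Q[s_prev][1] else 0
        x_hat = x if (u == 1 and s == 1) else a * x_hat   # estimator (17)
        cost += lam * u + (x - x_hat) ** 2
        x, s_prev = a * x + rng.gauss(0.0, 1.0), s
    return cost / (T + 1)
```

Intuitively, since a burst-erasure channel is more likely to stay off after being off, one may expect the optimal threshold after an off slot to differ from the one after an on slot; this simulator can be used to compare candidate pairs `(k_off, k_on)` numerically.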

###### Remark 3

As long as the receiver can distinguish between the events $Y_t = \mathfrak E_0$ (i.e., $S_t = 0$) and $Y_t = \mathfrak E_1$ (i.e., $S_t = 1$ and $U_t = 0$), the structure of the optimal estimator does not depend on the channel state information at the receiver.

###### Remark 4

It can be shown that under the optimal strategy, $\Pi^2_t$ is symmetric and unimodal around $\hat X_t$ and, therefore, $\Pi^1_{t+1}$ is symmetric and unimodal around $a \hat X_t$. Thus, the transmission and estimation strategies in Theorem 2.2 depend on the pre- and post-transmission beliefs only through their means.

###### Remark 5

Recall that the distortion function $d(\cdot)$ is even and increasing on $\mathbb R_{\ge 0}$. Therefore, the condition $|X_t - a \hat X_{t-1}| \ge k_t(S_{t-1})$ can be written as $d(X_t - a \hat X_{t-1}) \ge d(k_t(S_{t-1}))$. Thus, the optimal strategy is to transmit if the per-step distortion due to not transmitting is greater than a threshold.

## 3 Proof of the structural results

### 3.1 Proof of Lemma 2.1

Arbitrarily fix the estimation strategy $g$ and consider the best response strategy at the transmitter. We will show that $(X_t, S_{0:t-1}, Y_{0:t-1})$ is an information state at the transmitter.

Given any realization $(x_{0:t}, u_{0:t}, s_{0:t}, y_{0:t})$ of the system variables, define $i^1_t \coloneqq (x_{0:t}, u_{0:t-1}, s_{0:t-1}, y_{0:t-1})$ and $\tilde\imath^1_t \coloneqq (x_t, s_{0:t-1}, y_{0:t-1})$. Now, for any variable, we use a breve accent (e.g., $\breve x_{t+1}$) to denote a generic realization of the corresponding random variable. Then,

$$\begin{aligned}
\mathbb P(\breve\imath^1_{t+1} \mid i^1_t, u_t) &= \mathbb P(\breve x_{t+1}, \breve s_t, \breve y_t, \breve\imath^1_t \mid x_{0:t}, s_{0:t-1}, y_{0:t-1}, u_{0:t}) \\
&\stackrel{(a)}{=} \mathbb P(\breve x_{t+1}, \breve s_t, \breve y_t, \breve\imath^1_t \mid x_{0:t}, \bar x_{0:t}, s_{0:t-1}, y_{0:t-1}, u_{0:t}) \\
&\stackrel{(b)}{=} \mathbb P(\breve x_{t+1} \mid x_t)\, \mathbb P(\breve y_t \mid \bar x_t, \breve s_t)\, \mathbb P(\breve s_t \mid s_{t-1})\, \mathds{1}\{\breve\imath^1_t = \tilde\imath^1_t\} \\
&= \mathbb P(\breve\imath^1_{t+1} \mid \tilde\imath^1_t, u_t)
\end{aligned} \tag{19}$$

where we have added $\bar x_{0:t}$ in the conditioning in $(a)$ because $\bar X_\tau$ is a deterministic function of $(X_\tau, U_\tau)$, and $(b)$ follows from the source and the channel models. By marginalizing (19), we get that for any $\breve\imath^2_t$, we have

$$\mathbb P(\breve\imath^2_t \mid i^1_t, u_t) = \mathbb P(\breve\imath^2_t \mid \tilde\imath^1_t, u_t). \tag{20}$$

Now, let $c(X_t, U_t, \hat X_t) \coloneqq \lambda U_t + d(X_t, \hat X_t)$ denote the per-step cost. Recall that $\hat X_t = g_t(I^2_t)$. Thus, by (20), we get that

$$\mathbb E[c(X_t, U_t, \hat X_t) \mid i^1_t, u_t] = \mathbb E[c(X_t, U_t, \hat X_t) \mid \tilde\imath^1_t, u_t]. \tag{21}$$

Eq. (19) shows that $\{(X_t, S_{0:t-1}, Y_{0:t-1})\}_{t \ge 0}$ is a controlled Markov process controlled by $\{U_t\}_{t \ge 0}$. Eq. (21) shows that $(X_t, S_{0:t-1}, Y_{0:t-1})$ is sufficient for performance evaluation. Hence, by Markov decision theory [21], there is no loss of optimality in restricting attention to transmission strategies of the form (6).

### 3.2 Proof of Lemma 2.2

Consider

$$\pi^1_{t+1}(x_{t+1}) = \mathbb P(x_{t+1} \mid s_{0:t}, y_{0:t}) = \sum_{x_t \in \mathcal X} \mathbb P(x_{t+1} \mid x_t)\, \mathbb P(x_t \mid s_{0:t}, y_{0:t}) = \sum_{x_t \in \mathcal X} P_{x_t x_{t+1}}\, \pi^2_t(x_t) = [\pi^2_t P](x_{t+1}), \tag{22}$$

which is the expression for $F^1$ in (7).

For $F^2$, we consider the three cases of $y_t$ separately. For $y_t \in \mathcal X$, we have

$$\pi^2_t(x) = \mathbb P(X_t = x \mid s_{0:t}, y_{0:t}) = \mathds{1}\{x = y_t\}. \tag{23}$$

For $y_t \in \{\mathfrak E_0, \mathfrak E_1\}$, we have

$$\pi^2_t(x) = \mathbb P(X_t = x \mid s_{0:t}, y_{0:t}) = \frac{\mathbb P(X_t = x, y_t, s_t \mid s_{0:t-1}, y_{0:t-1})}{\mathbb P(y_t, s_t \mid s_{0:t-1}, y_{0:t-1})}. \tag{24}$$

Now, when $y_t = \mathfrak E_1$, we have that

$$\mathbb P(x_t, y_t, s_t \mid s_{0:t-1}, y_{0:t-1}) = \mathbb P(y_t \mid x_t, \varphi_t(x_t), s_t)\, Q_{s_{t-1} s_t}\, \pi^1_t(x_t) \stackrel{(a)}{=} \begin{cases} Q_{s_{t-1} 1}\, \pi^1_t(x_t), & \text{if } \varphi_t(x_t) = 0 \text{ and } s_t = 1 \\ 0, & \text{otherwise,} \end{cases} \tag{25}$$

where $(a)$ is obtained from the channel model. Substituting (25) in (24) and canceling $Q_{s_{t-1} 1}$ from the numerator and the denominator, we get (recall that this is for the case when $y_t = \mathfrak E_1$),

$$\pi^2_t(x) = \frac{\mathds{1}\{\varphi_t(x) = 0\}\, \pi^1_t(x)}{\pi^1_t(B_0(\varphi_t))}. \tag{26}$$

Similarly, when $y_t = \mathfrak E_0$, we have that

$$\mathbb P(x_t, y_t, s_t \mid s_{0:t-1}, y_{0:t-1}) = \mathbb P(y_t \mid x_t, \varphi_t(x_t), s_t)\, Q_{s_{t-1} s_t}\, \pi^1_t(x_t) \stackrel{(b)}{=} \begin{cases} Q_{s_{t-1} 0}\, \pi^1_t(x_t), & \text{if } s_t = 0 \\ 0, & \text{otherwise,} \end{cases} \tag{27}$$

where $(b)$ is obtained from the channel model. Substituting (27) in (24) and canceling $Q_{s_{t-1} 0}$ from the numerator and the denominator, we get (recall that this is for the case when $y_t = \mathfrak E_0$),

$$\pi^2_t(x) = \pi^1_t(x). \tag{28}$$

By combining (23), (26) and (28), we get (9).

### 3.3 Proof of Theorem 2.1

Once we restrict attention to transmission strategies of the form (6), the information structure is partial history sharing [19]. Thus, one can use the common information approach of [19] and obtain the structure of optimal strategies.

Following [19], we split the information available at each agent into a "common information" and "local information". Common information is the information available to all decision makers in the future; the remaining data at the decision maker is the local information. Thus, at the transmitter, the common information is $\{S_{0:t-1}, Y_{0:t-1}\}$ and the local information is $X_t$. Similarly, at the receiver, the common information is $\{S_{0:t}, Y_{0:t}\}$ and the local information is empty. When the transmitter makes a decision, the state (sufficient for input-output mapping) of the system is $(X_t, S_{t-1})$; when the receiver makes a decision, the state of the system is $(X_t, S_t)$. By [19, Proposition 1], we get that the sufficient statistic for the common information at the transmitter is

$$\Theta^1_t(x, s) = \mathbb P(X_t = x, S_{t-1} = s \mid S_{0:t-1}, Y_{0:t-1}),$$

and the sufficient statistic for the common information at the receiver is

$$\Theta^2_t(x, s) = \mathbb P(X_t = x, S_t = s \mid S_{0:t}, Y_{0:t}).$$

Note that $\Theta^1_t$ is equivalent to $(S_{t-1}, \Pi^1_t)$ and $\Theta^2_t$ is equivalent to $(S_t, \Pi^2_t)$, because $S_{t-1}$ and $S_t$ are measurable with respect to the respective common information. Therefore, by [19, Theorem 2], there is no loss of optimality in restricting attention to transmission strategies of the form (10) and estimation strategies of the form

$$\hat X_t = g_t(S_t, \Pi^2_t). \tag{29}$$

Furthermore, the dynamic program of Theorem 2.1 follows from [19, Theorem 3].

Note that the right-hand side of (14) implies that $g^*_t$ does not depend on $S_t$. Thus, instead of (29), we can restrict attention to estimation strategies of the form (11). Furthermore, the optimal estimation strategy is given by (15).

## 4 Proof of optimality of threshold-based strategies for autoregressive source

### 4.1 A change of variables

Define a process $\{Z_t\}_{t \ge 0}$ as follows: $Z_0 = 0$ and for $t \ge 1$,

$$Z_t = \begin{cases} a Z_{t-1}, & \text{if } Y_t \in \{\mathfrak E_0, \mathfrak E_1\} \\ Y_t, & \text{if } Y_t \in \mathbb R. \end{cases}$$

Note that $Z_t$ is a function of $Y_{0:t}$. Next, define processes $\{E_t\}$, $\{E^+_t\}$, and $\{\hat E_t\}$ as follows:

$$E_t \coloneqq X_t - a Z_{t-1}, \qquad E^+_t \coloneqq X_t - Z_t, \qquad \hat E_t \coloneqq \hat X_t - Z_t.$$

The processes $\{E_t\}$ and $\{E^+_t\}$ are related as follows: $E_0 = 0$, $E^+_0 = 0$, and for $t \ge 0$,

$$E^+_t = \begin{cases} E_t, & \text{if } Y_t \in \{\mathfrak E_0, \mathfrak E_1\} \\ 0, & \text{if } Y_t \in \mathbb R, \end{cases} \qquad \text{and} \qquad E_{t+1} = a E^+_t + W_t.$$

Since $X_t - \hat X_t = E^+_t - \hat E_t$, we have that $d(X_t - \hat X_t) = d(E^+_t - \hat E_t)$.

It turns out that it is easier to work with the processes $\{E_t\}$, $\{E^+_t\}$, and $\{\hat E_t\}$ rather than $\{X_t\}$ and $\{\hat X_t\}$.
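The stated relations between $\{E_t\}$ and $\{E^+_t\}$ can be checked numerically. The sketch below is ours (the 50% success probability is an arbitrary choice); it simulates the source and the process $\{Z_t\}$ and asserts both identities along a sample path.

```python
import random

def check(a=0.8, T=100, seed=3):
    """Verify E+_t = X_t - Z_t and E_{t+1} = a*E+_t + W_t along one path."""
    rng = random.Random(seed)
    x, z_prev, e = 0.0, 0.0, 0.0   # X_0 = 0, Z "before time 0" = 0, E_0 = 0
    for _ in range(T):
        received = rng.random() < 0.5          # Y_t in R iff packet got through
        z = x if received else a * z_prev      # Z_t update
        e_plus = 0.0 if received else e        # stated relation for E+_t
        assert abs(e_plus - (x - z)) < 1e-9    # E+_t = X_t - Z_t
        w = rng.gauss(0.0, 1.0)
        x, e, z_prev = a * x + w, a * e_plus + w, z   # E_{t+1} = a*E+_t + W_t
        assert abs(e - (x - a * z_prev)) < 1e-9       # E_{t+1} = X_{t+1} - a*Z_t
    return True
```

The check confirms that the error process carries exactly the information the receiver is missing: $Z_t$ is computable from the channel outputs alone, and the pair $(Z_t, E^+_t)$ recovers $X_t$.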

Next, redefine the pre- and post-transmission beliefs in terms of the error process. With a slight abuse of notation, we still denote the (probability densities of the) pre- and post-transmission beliefs as $\pi^1_t$ and $\pi^2_t$. In particular, $\pi^1_t$ is the conditional pdf of $E_t$ given $(s_{0:t-1}, h_{0:t-1})$ and $\pi^2_t$ is the conditional pdf of $E^+_t$ given $(s_{0:t}, h_{0:t})$.

Let $H_t$ denote whether the transmission was successful or not. In particular,

$$H_t = \begin{cases} \mathfrak E_0, & \text{if } Y_t = \mathfrak E_0 \\ \mathfrak E_1, & \text{if } Y_t = \mathfrak E_1 \\ 1, & \text{if } Y_t \in \mathbb R. \end{cases}$$

We use $h_t$ to denote the realization of $H_t$. Note that $H_t$ is a deterministic function of $U_t$ and $S_t$.

The time-evolutions of $\pi^1_t$ and $\pi^2_t$ are similar to Lemma 2.2. In particular, we have

###### Lemma 4.1

Given any transmission strategy of the form (6):

1. there exists a function $F^1$ such that

$$\pi^1_{t+1} = F^1(\pi^2_t). \tag{30}$$

In particular,

$$\pi^1_{t+1} = \begin{cases} \tilde\pi^2_t \star \mu, & \text{if } y_t \in \{\mathfrak E_0, \mathfrak E_1\} \\ \mu, & \text{if } y_t \in \mathbb R, \end{cases} \tag{31}$$

where $\tilde\pi^2_t$, given by $\tilde\pi^2_t(e) = \frac{1}{|a|}\, \pi^2_t(e/a)$, is the conditional probability density of $a E^+_t$, $\mu$ is the probability density function of $W_t$, and $\star$ is the convolution operation.

2. there exists a function $F^2$ such that

$$\pi^2_t = F^2(\pi^1_t, \varphi_t, h_t). \tag{32}$$

In particular,

$$\pi^2_t = \begin{cases} \delta_0, & \text{if } h_t = 1 \\ \pi^1_t|_{\varphi_t}, & \text{if } h_t = \mathfrak E_1 \\ \pi^1_t, & \text{if } h_t = \mathfrak E_0. \end{cases} \tag{33}$$
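On a uniform grid, the convolution step in (31) can be approximated with a discrete convolution. The sketch below is ours and, for simplicity, takes $a = 1$ so that the scaling step $\tilde\pi^2 = \pi^2$ is trivial.

```python
import numpy as np

def density_update(pi2, mu, de):
    """Approximate pi1_{t+1} = pi2 * mu (convolution of densities) on a
    uniform grid with spacing de; the result lives on the enlarged grid."""
    out = np.convolve(pi2, mu) * de      # Riemann-sum approximation of the integral
    return out / (out.sum() * de)        # renormalize to integrate to 1
```

Convolving a symmetric unimodal density with itself keeps it symmetric and unimodal (and doubles its spread), which is the property the majorization argument of Section 4 relies on.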

The key difference between Lemmas 2.2 and 4.1 (and the reason that we work with the error process $\{E_t\}$ rather than $\{X_t\}$) is that the function $F^2$ in (32) depends on $h_t$ rather than $y_t$. Consequently, the dynamic program of Theorem 2.1 is now given by

$$V^1_{T+1}(s, \pi^1) = 0, \tag{34}$$

and for $t \in \{T, \dots, 0\}$,

$$V^1_t(s, \pi^1) = \min_{\varphi \colon \mathbb R \to \{0, 1\}} \Big\{ \lambda\, \pi^1(B_1(\varphi)) + \pi^1(B_0(\varphi))\, W^0_t(\pi^1, \varphi) + \pi^1(B_1(\varphi))\, W^1_t(\pi^1, \varphi) \Big\}, \tag{35}$$

$$V^2_t(s, \pi^2) = D(\pi^2) + V^1_{t+1}(s, F^1(\pi^2)), \tag{36}$$

where,

$$W^0_t(\pi^1, \varphi) = Q_{s0}\, V^2_t(0, \pi^1) + Q_{s1}\, V^2_t(1, \pi^1|_\varphi),$$
$$W^1_t(\pi^1, \varphi) = Q_{s0}\, V^2_t(0, \pi^1) + Q_{s1}\, V^2_t(1, \delta_0),$$
$$D(\pi^2) = \min_{\hat e \in \mathbb R} \int_{\mathbb R} d(e - \hat e)\, \pi^2(e)\, de.$$

Again, note that due to the change of variables, the expression for $W^1_t$ does not depend on the transmitted symbol. Consequently, the expression for (35) is simpler than that in Theorem 2.1.

### 4.2 Symmetric unimodal distributions and their properties

A probability density function $\pi$ on the reals is said to be symmetric and unimodal (SU) around $c \in \mathbb R$ if for any $e \in \mathbb R$, $\pi(c + e) = \pi(c - e)$, and $\pi$ is non-decreasing in the interval $(-\infty, c]$ and non-increasing in the interval $[c, \infty)$.

Given $c \in \mathbb R$, a prescription $\varphi$ is called threshold-based around $c$ if there exists $k \in \mathbb R_{\ge 0}$ such that

$$\varphi(e) = \begin{cases} 1, & \text{if } |e - c| \ge k \\ 0, & \text{if } |e - c| < k. \end{cases}$$

Let $\mathcal F(c)$ denote the family of all threshold-based prescriptions around $c$.

Now, we state some properties of symmetric and unimodal distributions.

###### Property 1

If $\pi$ is SU($c$), then

$$c \in \arg\min_{\hat e \in \mathbb R} \int_{\mathbb R} d(e - \hat e)\, \pi(e)\, de.$$

For $c = 0$, the above property is a special case of [5, Lemma 12]. The result for general $c$ follows from a change of variables.
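The property can also be checked numerically on a discretized example. The sketch below is ours: it grids a symmetric unimodal density around a point `c` and searches for the best estimate under an even, increasing distortion.

```python
import numpy as np

def best_estimate(c=1.5, d=lambda e: e ** 2):
    """Grid search for argmin_{e_hat} sum d(e - e_hat) * pi(e) when pi is SU(c)."""
    grid = np.linspace(c - 6.0, c + 6.0, 1201)   # symmetric grid around c
    pi = np.exp(-0.5 * (grid - c) ** 2)          # an SU(c) density, up to scaling
    pi = pi / pi.sum()
    costs = [float((d(grid - e_hat) * pi).sum()) for e_hat in grid]
    return grid[int(np.argmin(costs))]
```

For both the squared-error distortion and, say, a quartic one, the minimizing estimate coincides with the center $c$, as the property asserts.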

###### Property 2

If $\pi$ is SU($c$) and $\varphi \in \mathcal F(c)$, then for any $h_t \in \{\mathfrak E_0, \mathfrak E_1, 1\}$, $F^2(\pi, \varphi, h_t)$ is SU.

###### Proof:

We prove the result for each $h_t$ separately. Recall the update of $\pi^2_t$ given by (33). For $h_t = 1$, $\pi^2_t = \delta_0$ and hence is SU. For $h_t = \mathfrak E_1$, $\pi^2_t = \pi^1_t|_{\varphi_t}$; if $\varphi_t \in \mathcal F(c)$, then