# Remote estimation over a packet-drop channel with Markovian state

Jhelum Chakravorty and Aditya Mahajan

A preliminary version of this paper was presented at the 2016 IFAC Workshop on Distributed Estimation and Control in Networked Systems (NecSys), the 2017 International Symposium on Information Theory (ISIT), and the 2017 American Control Conference (ACC). The authors are with the Department of Electrical and Computer Engineering, McGill University, QC, Canada. Email: {jhelum.chakravorty@mail., aditya.mahajan@}mcgill.ca. This research was funded through NSERC Discovery Accelerator Grant 493011.

###### Abstract

We investigate a remote estimation problem in which a transmitter observes a Markov source and chooses the power level to transmit it over a time-varying packet-drop channel. The channel is modeled as a channel with Markovian state where the packet drop probability depends on the channel state and the transmit power. A receiver observes the channel output and the channel state and estimates the source realization. The receiver also feeds back the channel state and an acknowledgment for successful reception to the transmitter. We consider two models for the source: finite state Markov chains and first-order autoregressive processes. For the first model, using ideas from team theory, we establish the structure of optimal transmission and estimation strategies and identify a dynamic program to determine optimal strategies with that structure. For the second model, we assume that the noise process has a unimodal and symmetric distribution. Using ideas from majorization theory, we show that the optimal transmission strategy is symmetric and monotonic and the optimal estimation strategy is like a Kalman filter. Consequently, when there are a finite number of power levels, the optimal transmission strategy may be described using thresholds that depend on the channel state. Finally, we propose a simulation-based approach (Renewal Monte Carlo) to compute the optimal thresholds and optimal performance and illustrate the algorithm with an example.

Remote estimation, real-time communication, renewal theory, symmetric and quasi-convex value and optimal strategies, stochastic approximation

## I Introduction

### I-A Motivation and literature overview

Networked control systems are distributed systems where plants, sensors, controllers, and actuators are interconnected via a communication network. Such systems arise in a variety of applications such as IoT (Internet of Things), smart grids, vehicular networks, and robotics. One of the fundamental problems in networked control systems is remote estimation: how should a sensor (which observes a stochastic process) transmit its observations to a receiver (which estimates the state of the stochastic process) when there is a constraint on communication, either in terms of communication cost or communication rate?

In this paper, we consider a remote estimation system as shown in Fig. 1. The system consists of a sensor and an estimator connected over a time-varying wireless fading channel. The sensor observes a Markov process and chooses the power level to transmit its observation to the remote estimator. Communication is noisy and the transmitted packet may get dropped according to a probability that depends on the channel state and the power level. When the packet is dropped the receiver generates an estimate of the state of the source according to previously received packets. The objective is to choose power control and estimation strategies to minimize a weighted sum of transmission power and estimation error.

Several variations of the above model have been considered in the literature. Models with noiseless communication channels have been considered in [1, 2, 3, 4, 5, 6]. Since the channel is noiseless, these papers assume that there are only two power levels: power level 0, which corresponds to not transmitting; and power level 1, which corresponds to transmitting. Under slightly different modeling assumptions, these papers identify the structure of optimal transmission and estimation strategies for first-order autoregressive sources with unimodal noise and for higher order autoregressive sources with orthogonal dynamics and isotropic Gaussian noise. It is shown that the optimal transmission strategy is threshold-based, i.e., the sensor transmits whenever the current error is greater than a threshold. It is also shown that the optimal estimation strategy is like a Kalman filter: when the receiver receives a packet, the estimate is the received symbol; when it does not receive the packet, the estimate is the one-step prediction based on the previous symbol. Quite surprisingly, these results show that there is no advantage in trying to extract information about the source realization from the choice of the power levels. The transmission strategy at the sensor is also called event-triggered communication because the sensor transmits when the event ‘error is greater than a threshold’ is triggered. Models with i.i.d. packet-drop channels are considered in [7, 8, 9], where it is assumed that the transmitter has two power levels: on or off. Remote estimation over an additive noise channel is considered in [10].

In this paper, we consider a remote estimation problem over a packet-drop channel with Markovian state. We assume that the receiver observes the channel state and feeds it back to the transmitter with one-step delay. Preliminary results for this model are presented in [11], where attention was restricted to a binary state channel with two input power values (ON or OFF). In the current paper, we consider an arbitrary number of channel states and power levels. A related paper is [12], in which remote estimation over packet-drop channels with Markovian state is considered. It is assumed that the sensor and the receiver know the channel state. It is shown that optimal estimation strategies are like a Kalman filter. A detailed comparison with [12] is presented in Section V-A.

Several approaches for computing the optimal transmission strategies have been proposed in the literature. For noiseless channels, these include dynamic programming based approaches [4, 5, 13], approximate dynamic programming based approaches [14], and renewal theory based approaches [15]. It is shown in [16] that for event-triggered scheduling, the posterior density follows a generalized closed skew normal (GCSN) distribution. For Markovian channels (when the state is not observed), a change of measure technique to evaluate the performance of an event-triggered scheme is presented in [17]. In this paper, we present a renewal theory based Monte Carlo approach for computing the optimal thresholds. A preliminary version of the results was presented in [9] for a channel with i.i.d. packet drops.

### I-B Contributions

In this paper, we investigate team optimal transmission and estimation strategies for remote estimation over time varying packet-drop channels. We consider two models for the source: finite state Markov source and first order autoregressive source (over either integers or reals). Our main contributions are as follows.

1. For finite sources, we identify sufficient statistics for both the transmitter and the receiver and obtain a dynamic programming decomposition to compute optimal transmission and estimation strategies.

2. For autoregressive sources, we identify qualitative properties of optimal transmission and estimation strategies. In particular, we show that the optimal estimation strategy is like a Kalman filter and the optimal transmission strategy depends only on the current source realization and the previous channel state (and does not depend on the receiver’s belief of the source). Furthermore, when the channel state is stochastically monotone (see Assumption III-B for the definition), then for any value of the channel state, the optimal transmission strategy is symmetric and quasi-convex in the source realization. Consequently, when the power levels are finite, the optimal transmission strategy is threshold-based, where the thresholds depend only on the previous channel state.

3. We show that the above qualitative properties extend naturally to infinite horizon models.

4. For infinite horizon models, we present a Renewal Theory based Monte-Carlo algorithm to evaluate the performance of any threshold-based strategy. We then combine it with a simultaneous perturbation based stochastic approximation algorithm to provide an algorithm to compute the optimal thresholds. We illustrate our results with a numerical example of a remote estimation problem with a transmitter with two power levels and a Gilbert-Elliott erasure channel.

5. We show that the problem of transmitting over one of several available i.i.d. packet-drop channels (at a constant power level) can be considered as a special case of our model. We show that there exist thresholds , such that it is optimal to transmit over channel if the error state . See Sec. V-C for details.

### I-C Notation

We use uppercase letters to denote random variables (e.g, , , etc), lowercase letters to denote their realizations (e.g., , , etc.). , and denote respectively the sets of integers, of non-negative integers and of positive integers. Similarly, , and denote respectively the sets of reals, of non-negative reals and of positive reals. For any set , let denote its indicator function, i.e., is if , else . denotes the cardinality of set . denotes the space of probability distributions of . For any vector , denotes the -th component of . For any vector and an interval of , means that equals if ; equals if ; and equals if . Given a Borel subset and a density , we use the notation . For any vector , denotes the derivative with respect to .

### I-D The communication system

We consider a remote estimation system shown in Fig. 1. The different components of the system are explained below.

#### I-D1 Source model

The source is a first-order time-homogeneous Markov chain , . We consider two models for the source.

• Finite state Markov source. In this model, we assume that is a finite set and denote the state transition matrix by , i.e., for any , .

• First-order autoregressive source. In this model, we assume that is either or . The initial state and for , the source evolves as

 $$X_{t+1} = a X_t + W_t, \tag{1}$$

where $\{W_t\}_{t \ge 0}$ is an i.i.d. noise sequence and each $W_t$ is distributed according to a symmetric and unimodal distribution. (Footnote 1: With a slight abuse of notation, when the source is real-valued, we consider this distribution to be specified by a probability density function, and when the source is integer-valued, by a probability mass function.)

#### I-D2 Channel model

The channel is a packet-drop channel with state. The state process , is a first-order time-homogeneous Markov chain with transition probability matrix . We assume that is finite. This is a standard model for time-varying wireless channels [18, 19].

The input alphabet of the channel is $\mathcal{X}$ and the output alphabet is $\mathcal{X} \cup \{\mathfrak{E}\}$, where the symbol $\mathfrak{E}$ denotes that no packet was received. At time $t$, the channel output is denoted by $Y_t$.

The packet drop probability depends on the input power , where is the set of allowed power levels. We assume that is a subset of and is either a finite set of the form or an interval of the form , i.e., is uncountable. When , it means that the transmitter does not send a packet. In particular, for any realization of , we have

 $$\mathds{P}(S_t = s_t \mid X_{0:t} = x_{0:t}, S_{0:t-1} = s_{0:t-1}, U_{0:t} = u_{0:t}) = \mathds{P}(S_t = s_t \mid S_{t-1} = s_{t-1}) = Q_{s_{t-1} s_t}, \tag{2}$$

and

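 $$\mathds{P}(Y_t = y_t \mid X_{0:t} = x_{0:t}, S_{0:t} = s_{0:t}, U_{0:t} = u_{0:t}) = \begin{cases} 1 - p(s_t, u_t), & \text{if } y_t = x_t \\ p(s_t, u_t), & \text{if } y_t = \mathfrak{E} \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$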

where is the probability that a packet transmitted with power level when the channel is in state is dropped. We assume that the set of the channel states is an ordered set where a larger state means a better channel quality. Then, for all , is (weakly) decreasing in  with and . Furthermore, we assume that for all , is decreasing in .
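The channel model in (2) and (3) can be simulated in a few lines. The following is a minimal sketch, not the authors' implementation: the two-state transition matrix `Q`, the function `drop_prob`, and the helper names are hypothetical and chosen only so that the drop probability equals 1 at power 0 and decreases in both the channel state and the power level.

```python
import random

def step_channel(prev_state, Q, rng):
    """Sample S_t from row Q[prev_state] of the transition matrix, as in (2)."""
    r = rng.random()
    cum = 0.0
    for nxt, prob in enumerate(Q[prev_state]):
        cum += prob
        if r < cum:
            return nxt
    return len(Q) - 1  # guard against floating-point round-off

def channel_output(x, state, power, drop_prob, rng):
    """Return x if the packet gets through, or None (erasure) otherwise, as in (3)."""
    if rng.random() < drop_prob(state, power):
        return None
    return x

# Hypothetical two-state channel: 0 = bad, 1 = good (illustrative numbers only).
Q = [[0.7, 0.3],
     [0.1, 0.9]]

def drop_prob(state, power):
    # Assumed form: certain drop at power 0, decreasing in state and in power.
    if power == 0:
        return 1.0
    return (0.5 if state == 0 else 0.1) / power
```

Here an erasure is represented by `None`, which plays the role of the "no packet received" symbol.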

### I-E The decision makers and the information structure

There are two decision makers in the system—the transmitter and the receiver. At time , the transmitter chooses the transmit power while the receiver chooses an estimate . Let and denote the information sets at the transmitter and the receiver respectively.

The transmitter observes the source realization . In addition, there is one-step delayed feedback from the receiver to the transmitter. (Footnote 2: Feedback of the acknowledgment requires 1 bit to indicate whether the packet was received or not; feedback of the channel state requires additional bits.) Thus, the information available at the transmitter is

 $$I^1_t = \{X_{0:t}, U_{0:t-1}, S_{0:t-1}, Y_{0:t-1}\}.$$

The transmitter chooses the transmit power according to

 $$U_t = f_t(I^1_t) = f_t(X_{0:t}, U_{0:t-1}, S_{0:t-1}, Y_{0:t-1}), \tag{4}$$

where is called the transmission rule at time . The collection for all time is called the transmission strategy.

The receiver observes and, in addition, observes the channel state . Thus, the information available at the receiver is

 $$I^2_t = \{S_{0:t}, Y_{0:t}\}.$$

The receiver chooses the estimate according to

 $$\hat X_t = g_t(I^2_t) = g_t(S_{0:t}, Y_{0:t}), \tag{5}$$

where is called the estimation rule at time . The collection for all time is called the estimation strategy.

The collection is called a communication strategy.

### I-F The performance measures and problem formulation

At each time , the system incurs two costs: a transmission cost and a distortion or estimation error . Thus, the per-step cost is

 $$c(X_t, U_t, \hat X_t) = \lambda(U_t) + d(X_t, \hat X_t).$$

We assume that is (weakly) increasing in with and . For the autoregressive source model, we assume that the distortion is given by , where is even and quasi-convex with .

We are interested in the following optimization problems:

###### Problem (Finite horizon)

In the model described above, identify a communication strategy that minimizes the total cost given by

 $$J_T(f,g) \coloneqq \mathds{E}\Big[\sum_{t=0}^{T-1} c(X_t, U_t, \hat X_t)\Big]. \tag{6}$$

###### Problem (Infinite horizon)

In the model described above, given a discount factor , identify a communication strategy that minimizes the total cost given as follows:

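1. For $\beta \in (0,1)$,

 $$J_\beta(f,g) = (1-\beta)\, \mathds{E}\Big[\sum_{t=0}^{\infty} \beta^t c(X_t, U_t, \hat X_t)\Big]. \tag{7}$$
2. For $\beta = 1$,

 $$J_1(f,g) = \lim_{T \to \infty} \frac{1}{T}\, \mathds{E}\Big[\sum_{t=0}^{T-1} c(X_t, U_t, \hat X_t)\Big]. \tag{8}$$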

###### Remark 1

In the above model, it has been assumed that whenever the transmitter transmits (i.e., ), it sends the source realization uncoded. This is without loss of generality because the channel input alphabet is the same as the source alphabet and the channel is symmetric. For such models, coding does not improve performance [20].

Both the finite-horizon and the infinite-horizon problems stated above are decentralized stochastic control problems. The main conceptual difficulty in solving such problems is that the information available to the decision makers, and hence the domain of their strategies, grows with time, making the optimization problem combinatorial. One can circumvent this issue by identifying a suitable information state at the decision makers that does not grow with time. In the following section, we discuss one such method to establish the structural results.

## II Main results for finite state Markov sources

### II-A Structure of optimal communication strategies

We establish two types of structural results. First, we use a person-by-person approach to show that $(X_{0:t-1}, U_{0:t-1})$ is irrelevant at the transmitter (Lemma II-A); then, we use the common information approach of [21] and establish a belief state for the common information between the transmitter and the receiver (Theorem II-A).

###### Lemma

For any estimation strategy of the form (5), there is no loss of optimality in restricting attention to transmission strategies of the form

 $$U_t = f_t(X_t, S_{0:t-1}, Y_{0:t-1}). \tag{9}$$

The proof proceeds by establishing that the process is a controlled Markov process controlled by . See Appendix A for details.

For any strategy of the form (9) and any realization of , define as

 $$\varphi_t(x) = f_t(x, s_{0:t-1}, y_{0:t-1}), \quad \forall x \in \mathcal{X}.$$

Furthermore, define conditional probability measures  and on as follows: for any ,

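 $$\begin{aligned} \pi^1_t(x) &\coloneqq \mathds{P}^f(X_t = x \mid S_{0:t-1} = s_{0:t-1}, Y_{0:t-1} = y_{0:t-1}), \\ \pi^2_t(x) &\coloneqq \mathds{P}^f(X_t = x \mid S_{0:t} = s_{0:t}, Y_{0:t} = y_{0:t}). \end{aligned}$$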

We call the pre-transmission belief and the post-transmission belief. Note that when are random variables, then and are also random variables (taking values in ), which we denote by and .

For the ease of notation, define as follows:

 $$B(\pi^1, s_t, \varphi) \coloneqq \mathds{P}(Y_t = \mathfrak{E} \mid S_{0:t} = s_{0:t}, Y_{0:t-1} = y_{0:t-1}) = \sum_{x_t \in \mathcal{X}} \pi^1(x_t)\, p(s_t, \varphi(x_t)). \tag{10}$$

Furthermore, define as follows:

 $$\pi^1|_{\varphi,s}(x) \coloneqq \frac{\pi^1(x)\, p(s, \varphi(x))}{B(\pi^1, s, \varphi)}. \tag{11}$$

Then, using Bayes' rule one can show the following:

###### Lemma

Given any transmission strategy of the form (9):

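1. there exists a function $F^1$ such that

 $$\pi^1_{t+1} = F^1(\pi^2_t) = \pi^2_t P. \tag{12}$$
2. there exists a function $F^2$ such that

 $$\pi^2_t = F^2(\pi^1_t, s_t, \varphi_t, y_t) = \begin{cases} \delta_{y_t}, & \text{if } y_t \in \mathcal{X} \\ \pi^1_t|_{\varphi_t, s_t}, & \text{if } y_t = \mathfrak{E}. \end{cases} \tag{13}$$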

Note that in (12), we are treating as a row-vector and in (13), denotes a Dirac measure centered at . The update equations (12) and (13) are standard non-linear filtering equations. See supplementary material for proof.
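The filtering equations (10) to (13) translate directly into code for a finite source. The sketch below assumes that source states are indexed `0, ..., n-1`, that beliefs are plain lists, that a prescription `phi` is a list mapping each source state to a power level, and that an erasure is represented by `None`; the function names are illustrative and not taken from the paper.

```python
def predict(post, P):
    """Equation (12): pre-transmission belief at t+1 is the row vector post * P."""
    n = len(P)
    return [sum(post[i] * P[i][j] for i in range(n)) for j in range(n)]

def erasure_prob(pre, s, phi, p):
    """Equation (10): probability that the packet is dropped in channel state s."""
    return sum(pre[x] * p(s, phi[x]) for x in range(len(pre)))

def update(pre, s, phi, y, p):
    """Equation (13): post-transmission belief given the channel output y."""
    n = len(pre)
    if y is not None:
        # Packet received: the belief collapses to a Dirac measure at y.
        return [1.0 if x == y else 0.0 for x in range(n)]
    # Packet dropped: reweight by the drop probabilities, as in (11).
    # (Assumes the erasure has positive probability, so b > 0.)
    b = erasure_prob(pre, s, phi, p)
    return [pre[x] * p(s, phi[x]) / b for x in range(n)]
```

Note that an erasure is itself informative: source states for which the prescription uses a low power level become more likely after a drop.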

###### Theorem

In Problem I-F with finite state Markov source, we have that:

1. Structure of optimal strategies: There is no loss of optimality in restricting attention to transmission and estimation strategies of the form:

 $$U_t = f^*_t(X_t, S_{t-1}, \Pi^1_t), \tag{14}$$

 $$\hat X_t = g^*_t(\Pi^2_t). \tag{15}$$
2. Dynamic program: Let denote the space of probability distributions on . Define value functions and as follows: for any ,

 $$V^1_{T+1}(\pi^1_t, s_t) = 0, \tag{16}$$

and for

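 $$V^1_t(\pi^1_t, s_t) = \min_{\varphi_t \colon \mathcal{X} \to \mathcal{U}} \big\{ \Lambda(\pi^1_t, \varphi_t) + H_t(x_t, \pi^1_t, s_t, \varphi_t) \big\}, \tag{17}$$

 $$V^2_t(\pi^2_t, s_t) = \min_{\hat x \in \mathcal{X}} D(\pi^2_t, \hat x) + V^1_{t+1}(\pi^2_t P, s_t), \tag{18}$$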

where

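 $$\begin{aligned} \Lambda(\pi^1, \varphi) &\coloneqq \sum_{x \in \mathcal{X}} \lambda(\varphi(x))\, \pi^1(x), \\ H_t(x, \pi^1, s, \varphi) &\coloneqq B(\pi^1, s, \varphi)\, V^2_t(\delta_x, s) + \big(1 - B(\pi^1, s, \varphi)\big)\, V^2_t(\pi^1|_{\varphi,s}, s), \\ D(\pi^2, \hat x) &\coloneqq \sum_{x \in \mathcal{X}} d(x, \hat x)\, \pi^2(x). \end{aligned}$$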

Let denote the arg min of the right hand side of (17) and . Then, the optimal transmission strategy is given by

 $$f^*_t(\cdot, s, \pi^1_t) = \Psi_t(s, \pi^1_t)$$

and the optimal estimation strategy is given by .

The proof follows from the common information approach [21]. See Appendix B for details.

###### Remark 2

The first term in (17) is the expected communication cost, the second term is the expected cost-to-go. The first term in (18) is the expected distortion and the second term is the expected cost-to-go.

###### Remark 3

In (17) we use instead of for the following reasons. Let denote the set of functions from to , which is equal to (since is finite). When is finite, is also finite and thus we can use in (17). When is uncountable, is a product of compact sets and hence is compact and thus we can use in (17).

###### Remark 4

Note that the dynamic program in Theorem II-A is similar to a dynamic program for a partially observable Markov Decision Process (POMDP) with finite state space and finite or uncountable action space (see Remark 3). Thus, the dynamic program can be extended to infinite horizon discounted cost model after verifying standard assumptions. However, doing so does not provide any additional insight, so we do not present infinite horizon results for this model. We will do so for the autoregressive source model later in the paper, where we provide an algorithm to find the optimal time-homogeneous strategy for infinite horizon criteria.

## III Main results for autoregressive sources

### III-A Structure of optimal strategies for finite horizon model

We start with a change of variables. Define a process as follows: and for ,

 $$Z_t = \begin{cases} a Z_{t-1}, & \text{if } Y_t = \mathfrak{E} \\ Y_t, & \text{if } Y_t \in \mathcal{X}. \end{cases}$$

Next, define processes , , which we call the error processes and as follows:

 $$E_t \coloneqq X_t - a Z_{t-1}, \quad E^+_t \coloneqq X_t - Z_t, \quad \hat E_t \coloneqq \hat X_t - Z_t.$$

The processes and are related as follows: , , and for ,

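 $$E^+_t = \begin{cases} E_t, & \text{if } Y_t = \mathfrak{E} \\ 0, & \text{if } Y_t \in \mathcal{X} \end{cases} \quad \text{and} \quad E_{t+1} = a E^+_t + W_t. \tag{19}$$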

The above dynamics may be rewritten as

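 $$E_{t+1} = \begin{cases} a E_t + W_t, & \text{if } Y_t = \mathfrak{E} \\ W_t, & \text{if } Y_t \neq \mathfrak{E}. \end{cases} \tag{20}$$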

Since , we have that . Thus, with this change of variables, the per-step cost may be written as .

Note that is a deterministic function of . Hence, at time , is measurable at the transmitter and thus is measurable at the transmitter. Moreover, at time , is measurable at the receiver.
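The change of variables can be checked numerically. The sketch below tracks the source, the process $Z_t$, and the error process along one sample path, and asserts at every step that the recursion (19) agrees with the definition $E_{t+1} = X_{t+1} - a Z_t$. It assumes zero initial conditions for the source and for $Z$ (labeled in the code); the function name and the `received` flag sequence are hypothetical.

```python
import math
import random

def simulate_errors(a, horizon, received, noise, rng):
    """Track X_t, Z_t and E_t; received[t] says whether Y_t = X_t or an erasure."""
    x, z_prev = 0.0, 0.0                 # assumption: X_0 = 0 and Z_{-1} = 0
    e = x - a * z_prev                   # E_0
    errors = []
    for t in range(horizon):
        errors.append(e)
        z = x if received[t] else a * z_prev   # definition of Z_t
        e_plus = x - z                         # post-transmission error E_t^+
        w = noise(rng)
        x_next = a * x + w                     # source dynamics (1)
        e_next = a * e_plus + w                # recursion (19)
        # The recursion must agree with the definition E_{t+1} = X_{t+1} - a Z_t.
        assert math.isclose(e_next, x_next - a * z, rel_tol=1e-9, abs_tol=1e-9)
        x, z_prev, e = x_next, z, e_next
    return errors
```

When every packet is received, the returned errors are just the delayed noise samples, which matches the second case of (20).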

###### Lemma

For any transmission and estimation strategies of the form (9) and (5), there exists an equivalent transmission and estimation strategy of the form:

 $$U_t = \tilde f_t(E_t, S_{0:t-1}, Y_{0:t-1}), \tag{21}$$

 $$\hat X_t = \tilde g_t(S_{0:t}, Y_{0:t}) + Z_t. \tag{22}$$

Moreover, for any transmission and estimation strategies of the form (21)–(22), there exist transmission and estimation strategies of the form (9) and (5) that are equivalent.

The proof is given in Appendix C.

An implication of Lemma III-A is that we may assume that the transmitter transmits and the receiver estimates

 $$\hat E_t = \hat X_t - Z_t = \tilde g_t(S_{0:t}, Y_{0:t}).$$

For this model, we can further simplify the structures of optimal transmitter and estimator as follows.

###### Theorem

In Problem I-F with first-order autoregressive source, we have that:

1. Structure of optimal estimation strategy: At each time , there is no loss of optimality in choosing the estimates as

 $$\hat E_t = 0,$$

or, equivalently, choosing the estimates as: , and for ,

 $$\hat X_t = \begin{cases} a \hat X_{t-1}, & \text{if } Y_t = \mathfrak{E} \\ Y_t, & \text{if } Y_t \in \mathds{R} \end{cases} \tag{23}$$
2. Structure of optimal transmission strategy: There is no loss of optimality in restricting attention to transmission strategies of the form

 $$U_t = \tilde f_t(E_t, S_{t-1}). \tag{24}$$
3. Dynamic programming decomposition: Recursively define the following value functions: for any and ,

 $$J_{T+1}(e,s) = 0, \tag{25}$$

and for $t \in \{T, \dots, 0\}$,

 $$J_t(e,s) = \min_{u \in \mathcal{U}} \bar H_t(e,s,u), \tag{26}$$

where

 $$\bar H_t(e,s,u) = \lambda(u) + \sum_{s' \in \mathcal{S}} Q_{ss'}\, p(s',u)\, d(e) + \mathds{E}\big[J_{t+1}(E_{t+1}, S_t) \mid E_t = e, S_{t-1} = s, U_t = u\big].$$

Let denote the arg min of the right hand side of (26). Then the transmission strategy is optimal.

See Appendix D for the proof.
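For an integer-valued source, the recursion (25) and (26) can be carried out by backward induction once the error is restricted to a finite grid. The sketch below is an illustration under stated assumptions rather than the authors' algorithm: it clips the error to $\{-N, \dots, N\}$ (a truncation not present in the model), takes an integer gain `a`, and represents the noise distribution as a dictionary `noise_pmf`. All argument names are hypothetical.

```python
def solve_finite_horizon(T, a, Q, p, lam, d, powers, noise_pmf, N):
    """Backward induction for (25)-(26) on the truncated grid {-N, ..., N}.

    Assumes an integer gain a and integer-valued noise so that errors stay on the grid.
    """
    states = range(len(Q))
    grid = range(-N, N + 1)
    clip = lambda e: max(-N, min(N, e))               # truncation (an approximation)
    J = {(e, s): 0.0 for e in grid for s in states}   # J_{T+1} = 0
    policy = []
    for t in range(T, -1, -1):
        # cont[e, s2]: expected next value when the post-transmission error is e
        # and the current channel state is s2.
        cont = {(e, s2): sum(pr * J[clip(a * e + w), s2] for w, pr in noise_pmf.items())
                for e in grid for s2 in states}
        J_new, f_t = {}, {}
        for e in grid:
            for s in states:
                best_u, best = None, float("inf")
                for u in powers:
                    h = lam(u)
                    for s2 in states:
                        drop = p(s2, u)
                        # Drop: pay d(e) and keep error e; success: error resets to 0.
                        h += Q[s][s2] * (drop * (d(e) + cont[e, s2])
                                         + (1 - drop) * cont[0, s2])
                    if h < best:
                        best_u, best = u, h
                J_new[e, s], f_t[e, s] = best, best_u
        J = J_new
        policy.insert(0, f_t)
    return J, policy
```

The function returns the value function at time 0 and the list of transmission rules for times 0 through T; on small examples the computed rules can be inspected to see the even, threshold-like shape discussed in the next subsection.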

### III-B Monotonicity and quasi-convexity of the optimal solution

For autoregressive sources we can establish monotonicity and quasi-convexity of the optimal solution. To that end, let us assume the following.

###### Assumption

The channel transition matrix is stochastically monotone, i.e., for all such that and for any ,

 $$\sum_{k=\ell+1}^{n} Q_{ik} \ge \sum_{k=\ell+1}^{n} Q_{jk}.$$
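The assumption can be verified mechanically for a given transition matrix. The sketch below assumes that the inequality is required for every pair of rows with $i > j$ (the usual definition of stochastic monotonicity, taken here as an assumption because the index conditions are not legible in this copy) and uses 0-based indices, so the tail sum over $k > \ell$ becomes a slice.

```python
def is_stochastically_monotone(Q, tol=1e-12):
    """Check that upper-tail sums of row i dominate those of row j whenever i > j."""
    n = len(Q)
    for j in range(n):
        for i in range(j + 1, n):
            for l in range(n):
                # Tail mass of states l, l+1, ..., n-1 starting from rows i and j.
                if sum(Q[i][l:]) < sum(Q[j][l:]) - tol:
                    return False
    return True
```

For example, a two-state channel whose good state is more likely to stay good than the bad state is to become good passes this check.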

###### Theorem

For any , we have the following:

1. For all , is even and quasi-convex in .

Furthermore, under Assumption III-B,

2. For every , is decreasing in .

3. For every , the transmission strategy is even and quasi-convex in .

Sufficient conditions under which the value function and the optimal strategy are even and quasi-convex are identified in [22, Theorem 1]. Properties 1 and 3 follow because the above model satisfies these sufficient conditions. Property 2 follows from standard stochastic monotonicity arguments. The details are presented in the supplementary material.

An immediate consequence of Theorem III-B is the following:

###### Corollary

Suppose that Assumption III-B is satisfied and is a finite set given by . For any , define (Footnote 3: Note that and Theorem III-B implies for any .)

 $$k^{(i)}_t(s) \coloneqq \inf\{e \in \mathds{R}_{\ge 0} : \tilde f_t(e,s) = u^{(i)}\}.$$

For ease of notation, define .

Then, the optimal strategy is a threshold based strategy given as follows: for any , and ,

 $$\tilde f_t(e,s) = u^{(i)}. \tag{27}$$
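A threshold-based strategy of this kind amounts to a sorted-list lookup. The sketch below uses an indexing convention that is an assumption made for illustration: `thresholds[s]` holds the nondecreasing thresholds for power levels 1 through m in channel state `s`, and the chosen level is the number of thresholds that do not exceed the absolute error.

```python
from bisect import bisect_right

def threshold_policy(e, s, thresholds, powers):
    """Pick the power level whose threshold interval contains |e|.

    thresholds[s] is a nondecreasing list [k1, ..., km] for channel state s and
    powers = [0, u1, ..., um]; the level index is the number of thresholds <= |e|.
    """
    return powers[bisect_right(thresholds[s], abs(e))]
```

Because only the absolute error enters the lookup, the resulting strategy is even in the error and nondecreasing in its magnitude, which is consistent with the structure stated in the corollary.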

#### Some remarks

1. It can be shown that under the optimal strategy, is symmetric and unimodal () (see Definition D-B) around and, therefore, is around . Thus, the transmission and estimation strategies in Theorem III-A depend on the pre- and post-transmission beliefs only through their means.

2. Since the distortion function is even and quasi-convex, we can write the threshold conditions

 $$k^{(i-1)}_t(s) \le |e|$$

in (27) as

 $$d(k^{(i-1)}_t(s)) \le d(e)$$

Thus, if we define distortion levels , then we can say that the optimal strategy is to transmit at power level if .

3. When , the update of the optimal estimate is the same as the update equation of the Kalman filter. For this reason, we refer to the estimation strategy (23) as a Kalman-filter like estimator.
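The estimator (23) is a one-line update. In the sketch below, an erasure is represented by `None` and the function name is illustrative.

```python
def kalman_like_estimate(prev_estimate, y, a):
    """Equation (23): keep the received symbol, otherwise predict one step ahead."""
    return a * prev_estimate if y is None else y

def run_estimator(outputs, a, initial_estimate=0.0):
    """Apply (23) along a sequence of channel outputs (None marks a dropped packet)."""
    estimates, est = [], initial_estimate   # the initial estimate is an assumption
    for y in outputs:
        est = kalman_like_estimate(est, y, a)
        estimates.append(est)
    return estimates
```

Between successful receptions the estimate simply decays or grows geometrically with the gain `a`, and it is reset to the source value whenever a packet arrives.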

### III-C Generalization to infinite horizon model

Given a communication strategy , let and denote respectively the expected distortion and expected transmitted power when the system starts in state , i.e., for ,

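 $$\begin{aligned} D^{(f,g)}_\beta(e,s) &\coloneqq (1-\beta)\, \mathds{E}^{(f,g)}\Big[\sum_{t=0}^{\infty} \beta^t d(E_t) \,\Big|\, E_0 = e, S_{-1} = s\Big], \\ P^{(f,g)}_\beta(e,s) &\coloneqq (1-\beta)\, \mathds{E}^{(f,g)}\Big[\sum_{t=0}^{\infty} \beta^t \lambda(U_t) \,\Big|\, E_0 = e, S_{-1} = s\Big], \end{aligned}$$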

and for ,

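 $$\begin{aligned} D^{(f,g)}_1(e,s) &\coloneqq \lim_{T \to \infty} \frac{1}{T}\, \mathds{E}^{(f,g)}\Big[\sum_{t=0}^{T-1} d(E_t) \,\Big|\, E_0 = e, S_0 = s\Big], \\ P^{(f,g)}_1(e,s) &\coloneqq \lim_{T \to \infty} \frac{1}{T}\, \mathds{E}^{(f,g)}\Big[\sum_{t=0}^{T-1} \lambda(U_t) \,\Big|\, E_0 = e, S_0 = s\Big]. \end{aligned}$$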

Then, the performance of the strategy when the system starts in state is given by

 $$J^{(f,g)}_\beta(e,s) \coloneqq D^{(f,g)}_\beta(e,s) + P^{(f,g)}_\beta(e,s).$$

The structure of optimal estimator, as established in Theorem III-A, continues to hold for the infinite horizon setup as well. Thus, we can restrict attention to Kalman-filter like estimator given by (23) and look at the problem of finding the best response transmission strategy. This is a single agent stochastic control problem. If the per-step distortion is unbounded, then we need the following assumption—which implies that there exists a strategy whose performance is bounded—for the infinite horizon problem to be meaningful.

###### Assumption

Let denote the transmission strategy that always transmits at power level and denote the Kalman-filter like strategy given by (23). Then, for given , and for all and , .

Assumption III-C is always satisfied if is bounded. For , and , the condition is sufficient for Assumption III-C to hold (see [23, Theorem 8] and [24, Corollary 12]). Similar sufficient conditions are given in [25, Theorem 1] for vector-valued Markov source processes with a Markovian packet-drop channel.

We now state the main theorem of this section.

###### Theorem

In Problem I-F with first-order autoregressive processes under Assumption III-C, we have that

1. Structure of optimal estimation strategy: The time-homogeneous strategy , where is given by (23), is optimal.

2. Structure of optimal transmission strategy: There is no loss of optimality in restricting attention to time-homogeneous transmission strategies of the form

 $$U_t = \tilde f_\beta(E_t, S_{t-1}).$$
3. Dynamic programming decomposition: For , let be the smallest bounded solution of the following fixed point equation: for all and ,

 $$J_\beta(e,s) = \min_{u \in \mathcal{U}} \bar H_\beta(e,s,u), \tag{28}$$

where

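 $$\bar H_\beta(e,s,u) = (1-\beta)\lambda(u) + \sum_{s' \in \mathcal{S}} Q_{ss'}\, p(s',u)\, d(e) + \beta\, \mathds{E}\big[J_\beta(E_{t+1}, S_t) \mid E_t = e, S_{t-1} = s, U_t = u\big].$$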

Let denote the arg min of the right hand side of (28). Then the transmission is optimal.

4. Results for : Let be any limit point of as . Then, is an optimal strategy for Problem I-F with .

The proof is given in Appendix E.

###### Remark 5

We are not asserting that the dynamic program (28) has a unique fixed point. To make such an assertion, we would need to check the sufficient conditions for Banach fixed point theorem. These conditions [26] are harder to check than the sufficient conditions (P1)–(P3) of Proposition E-B that we verify in Appendix E.

###### Corollary

The monotonicity properties of Theorem III-B hold for the infinite horizon value function and transmission strategy as well.

An immediate consequence of Corollary III-C is the following:

###### Corollary

Suppose that Assumption III-B is satisfied and is a finite set given by . For any , define (Footnote 4: Note that and Corollary III-C implies for any .)

 $$k^{(i)}_\beta(s) \coloneqq \inf\{e \in \mathds{R}_{\ge 0} : \tilde f_\beta(e,s) = u^{(i)}\}.$$

For ease of notation, define .

Then, the optimal strategy is a threshold based strategy given as follows: for any , and ,

 $$\tilde f_\beta(e,s) = u^{(i)}. \tag{29}$$

## IV Computing optimal thresholds for autoregressive sources with finite actions

Suppose the power levels are finite and given by

 $$\mathcal{U} = \{0, u^{(1)}, \dots, u^{(m)}\}, \quad m \in \mathds{Z}_{>0},$$

with and . From Corollary III-C, we know that the optimal strategy for Problem I-F is a time-homogeneous threshold-based strategy of the form (27). Let denote the thresholds and denote the strategy (29). In this section, we first derive formulas for computing the performance of a general threshold-based strategy of the form (27) and then propose a stochastic approximation based algorithm to identify the optimal thresholds.

It is conceptually simpler to work with a post-decision model where the pre-decision state is $E_t$ and the post-decision state $E^+_t$ is given by (19). The timeline of the various system variables is shown in Fig. 2. In this model, the per-step cost is given by $\lambda(U_t) + d(E^+_t)$. (Footnote 5: From Theorem III-A, we have that $\hat E_t = 0$. Thus, the distortion at time $t$ is $d(E^+_t)$.)

### IV-A Performance of an arbitrary threshold-based strategy

For , pick a reference channel state . Given an arbitrary threshold-based strategy , suppose the system starts in state and follows strategy . Then, the process is a Markov process. Let and for let

 $$\tau^{(n)} \coloneqq \inf\{t > \tau^{(n-1)} : (E^+_{t-1}, S_{t-1}) = (0, s^\circ)\}$$

denote the stopping times when the Markov process revisits . We say that the Markov process regenerates at times and refer to the interval as the -th regenerative cycle.

Define the following:

• : the expected cost during a regenerative cycle, i.e.,

 $$L^{(k)}_\beta \coloneqq \mathds{E}\Big[\sum_{t=0}^{\tau^{(1)}-1} \beta^t \big(\lambda(U_t) + d(E^+_t)\big) \,\Big|\, E^+_{-1} = 0, S_{-1} = s^\circ\Big]. \tag{30}$$
• : the expected time during a regenerative cycle, i.e.,

 $$M^{(k)}_\beta \coloneqq \mathds{E}\Big[\sum_{t=0}^{\tau^{(1)}-1} \beta^t \,\Big|\, E^+_{-1} = 0, S_{-1} = s^\circ\Big]. \tag{31}$$

Using ideas from renewal theory, we have the following.

###### Theorem

For any , the performance of threshold-based strategy is given by

 $$C^{(k)}_\beta \coloneqq C_\beta(f^{(k)}_\beta, g^*) = \frac{L^{(k)}_\beta}{M^{(k)}_\beta}. \tag{32}$$

See Appendix F for the proof.
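The ratio in (32) suggests a simple simulation-based estimator: sample independent regenerative cycles, average the discounted cost and the discounted length separately, and divide. The sketch below is a hedged illustration of that idea and not the authors' code; the function names, the keyword-argument interface, and the `max_len` safeguard are assumptions. The policy is any function of the pre-transmission error and the previous channel state, for example a threshold rule.

```python
import random

def sample_cycle(policy, a, Q, p, lam, d, noise, beta, s_ref, rng, max_len=10**6):
    """Simulate one regenerative cycle; return its discounted cost and discounted length."""
    e_plus, s = 0.0, s_ref                 # start from (E^+_{-1}, S_{-1}) = (0, s_ref)
    cost = length = 0.0
    disc = 1.0
    for _ in range(max_len):
        e = a * e_plus + noise(rng)        # pre-transmission error E_t
        u = policy(e, s)                   # power chosen from (E_t, S_{t-1})
        s = rng.choices(range(len(Q)), weights=Q[s])[0]   # channel state S_t
        dropped = rng.random() < p(s, u)
        e_plus = e if dropped else 0.0     # post-transmission error E_t^+, as in (19)
        cost += disc * (lam(u) + d(e_plus))
        length += disc
        disc *= beta
        if e_plus == 0.0 and s == s_ref:   # the process is back at (0, s_ref)
            break
    return cost, length

def renewal_monte_carlo(n_cycles, **kw):
    """Estimate C = L / M in (32) by averaging over independent cycles."""
    samples = [sample_cycle(**kw) for _ in range(n_cycles)]
    L = sum(c for c, _ in samples) / n_cycles
    M = sum(m for _, m in samples) / n_cycles
    return L / M
```

Averaging numerator and denominator separately, rather than averaging per-cycle ratios, is what matches the renewal-reward form of (32).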

### IV-B Necessary condition for optimality

In order to find the optimal threshold, we first observe the following.

###### Lemma

For any , and are differentiable with respect to . Consequently, is also differentiable.

The proof of Lemma IV-B follows from first principles using an argument similar to that in the supplementary material for [15].

Let , and denote the derivatives of , and respectively. Then, a necessary condition for optimality is the following.

###### Proposition

A necessary condition for thresholds to be optimal is that , where

 N(k∗)β\coloneqqM(k)β∇