Real-Time Remote Estimation with Hybrid ARQ in Wireless Networked Control
Abstract
Real-time remote estimation is critical for mission-critical applications including industrial automation, the smart grid and the tactile Internet. In this paper, we propose a hybrid automatic repeat request (HARQ)-based real-time remote estimation framework for linear time-invariant (LTI) dynamic systems. Considering the estimation quality of such a system, there is a fundamental trade-off between the reliability and freshness of the sensor’s measurement transmission. We formulate a new problem to optimize the sensor’s online transmission control policy for static and Markov fading channels, which depends on both the current estimation quality of the remote estimator and the current number of retransmissions of the sensor, so as to minimize the long-term remote estimation mean squared error (MSE). This problem is non-trivial. In particular, it is challenging to derive the condition, in terms of the communication channel quality and the LTI system parameters, that ensures a bounded long-term estimation MSE. We derive a sufficient condition for the existence of a stationary and deterministic optimal policy that stabilizes the remote estimation system and minimizes the MSE. Also, we prove that the optimal policy has a switching structure, and accordingly derive a low-complexity suboptimal policy. Numerical results show that the proposed optimal policy significantly improves the performance of the remote estimation system compared to the conventional non-HARQ policy.
I Introduction
Real-time remote estimation is critical for networked control applications such as industrial automation, the smart grid, vehicle platooning, drone swarming, immersive virtual reality (VR) and the tactile Internet [2]. For such real-time applications, high-quality remote estimation of the states of dynamic processes over unreliable links is a major challenge. The sensor’s sampling policy, the estimation scheme at a remote receiver, and the communication protocol for state-information delivery between the sensor and the receiver should be designed jointly.
To enable the optimal design of wireless remote estimation, the performance metric for the remote estimation system needs to be selected properly. For some applications, the model of the dynamic process under monitoring is unknown and the receiver is not able to estimate the current state of the process based on the previously received states, i.e., a state-monitoring-only scenario [3]. In this scenario, the performance metric is the age-of-information (AoI), which reflects how old the freshest received sensor measurement is, counted from the moment that measurement was generated at the sensor [3]. However, in practice, most dynamic processes are time-correlated, and the state-changing rules can be known by the receiver to some extent. Therefore, the receiver can estimate the current state of the process based on the previously received measurements and the model of the dynamic process (see, e.g., [4, 5]), especially when the transmission of the packet that carries the current sensor measurement fails or is delayed. In this sense, the estimation mean squared error (MSE) is the appropriate performance metric.
From a communication protocol design perspective, we naturally ask: does a sensor need retransmission for mission-critical real-time remote estimation? Retransmission is required by conventional communication systems, where non-real-time backlogged data must be perfectly delivered to the receivers. Energy-constrained remote estimation systems and those with a low sampling rate can also benefit from retransmissions, see, e.g., [6] and [7]. It was shown in [8] that retransmissions cannot improve the performance of a mission-critical real-time remote estimation system that is constrained by neither energy nor sampling rate, as it is a waste of a transmission opportunity to transmit an out-of-date measurement instead of the current one. However, this is true only when a retransmission has the same success probability as a new transmission, e.g., with the standard automatic repeat request (ARQ) protocol. Note that a hybrid ARQ (HARQ) protocol, e.g., with a chase combining (CC) or incremental redundancy (IR) scheme, is able to effectively increase the successful detection probability of a retransmission by combining multiple copies from previously failed transmissions [9]. Therefore, a HARQ protocol has the potential to improve the performance of real-time remote estimation.
In this paper, we introduce HARQ into real-time remote estimation systems and optimally design the sensor’s transmission policy to minimize the estimation MSE. Note that there is a fundamental trade-off between the reliability and freshness of the sensor’s measurement transmission. When a failed transmission occurs, the sensor can either retransmit the previous old measurement, so that the receiver can obtain a more reliable old measurement, or transmit a new but less reliable measurement. The main contributions of the paper are summarized as follows:

We propose a novel HARQ-based real-time remote estimation system for time-correlated random processes, where the sensor makes online decisions to send a new measurement or retransmit the previously failed one, depending on both the current estimation quality of the receiver and the current number of retransmissions of the sensor.

We formulate the problem of optimizing the sensor’s transmission control policy so as to maximize the long-term performance of the receiver in terms of the average MSE, for both static and Markov fading channels. Since it is not clear whether the long-term average MSE can be bounded or not, we derive an elegant sufficient condition, in terms of the transmission reliability provided by the HARQ protocol and the parameters of the process of interest, which ensures that an optimal policy exists and stabilizes the remote estimation system.

We derive a structural property of the optimal policy, i.e., the optimal policy is a switching-type policy, and give an easy-to-compute suboptimal policy. Our numerical results show that the proposed HARQ-based optimal and suboptimal transmission control policies significantly improve the system performance compared to the conventional non-HARQ policy, under practical system parameter settings.
The remainder of this paper is organized as follows. Section II presents the proposed HARQ-based remote estimation system. Section III analyzes the HARQ-based transmission-control policy and formulates the optimal transmission control problem. Sections IV and V investigate the optimal transmission control policies for the static and Markov channels, respectively. Section VI numerically presents the optimal, suboptimal and benchmark policies for both static and Markov channels, and their average MSE performance. Finally, Section VII concludes the paper.
Notations: is the probability of the event . is the expectation of the random variable . is the matrix transpose operator. is the sum of the vector ’s elements. is the trace operator. denotes the diagonal matrix with the diagonal elements . and denote the sets of positive and nonnegative integers, respectively. denotes the matrix with identical element .
II System Model
We consider a basic system setting in which a smart sensor periodically samples, pre-estimates and sends its local estimate of a dynamic process to a remote receiver through a wireless link with packet dropouts, as illustrated in Fig. 1.
II-A Dynamic Process Modeling
We consider a general discrete LTI model for the dynamic process as (see e.g., [4, 10, 11])
(1)  
where the discrete time steps are determined by the sensor’s sampling period , is the process state vector, is the state transition matrix, is the measurement vector of the smart sensor attached to the process, is the measurement matrix (note that the measurement matrix need not be full rank [12]; as illustrated in Fig. 1, the state is a two-dimensional (2D) signal while the measurement is one-dimensional, and after Kalman filtering we have a 2D estimate), and are the process and measurement noise vectors, respectively. We assume and are independent and identically distributed (i.i.d.) zero-mean Gaussian processes with corresponding covariance matrices and , respectively. The initial state is zero-mean Gaussian with covariance matrix . To avoid trivial problems, we assume that , where is the maximum squared eigenvalue of [13].
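As an illustrative aside, the LTI dynamics (1) can be simulated directly; the parameter values below (a 2-D state with a rank-deficient scalar measurement, matching the Fig. 1 illustration) are assumptions chosen for this sketch, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters: 2-D state, scalar (rank-deficient) measurement.
A = np.array([[1.0, 0.1], [0.0, 1.0]])  # state transition matrix
C = np.array([[1.0, 0.0]])              # measurement matrix (not full rank)
Q = 0.01 * np.eye(2)                    # process-noise covariance
R = np.array([[0.04]])                  # measurement-noise covariance

def step(x):
    """One step of the LTI model (1): x' = A x + w, y = C x' + v."""
    w = rng.multivariate_normal(np.zeros(2), Q)
    x_next = A @ x + w
    v = rng.multivariate_normal(np.zeros(1), R)
    y = C @ x_next + v
    return x_next, y

x = rng.multivariate_normal(np.zeros(2), np.eye(2))  # zero-mean Gaussian initial state
for _ in range(5):
    x, y = step(x)
```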
II-B State Estimation at the Smart Sensor
Since the sensor’s measurements are noisy, the smart sensor with sufficient computation and storage capacity is required to estimate the state of the process, , using a Kalman filter [10, 11], which gives the minimum estimation MSE, based on the current and previous raw measurements:
(2a)  
(2b)  
(2c)  
(2d)  
(2e) 
where is the identity matrix, is the a priori state estimate, is the a posteriori state estimate at time , is the Kalman gain, and and represent the a priori and a posteriori error covariances at time , respectively. The first two equations present the prediction steps, while the last three equations present the update steps [12]. Note that is the output of the Kalman filter at time , i.e., the pre-filtered measurement of , with the estimation error covariance .
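The five-step recursion (2a)–(2e), i.e., prediction followed by update, can be sketched as a single function; this is a minimal generic sketch, not the paper's code:

```python
import numpy as np

def kalman_step(x_post, P_post, y, A, C, Q, R):
    """One Kalman-filter iteration: prediction (a priori) then update (a posteriori)."""
    # Prediction steps (the first two equations of (2))
    x_prior = A @ x_post                       # a priori state estimate
    P_prior = A @ P_post @ A.T + Q             # a priori error covariance
    # Update steps (the last three equations of (2))
    S = C @ P_prior @ C.T + R                  # innovation covariance
    K = P_prior @ C.T @ np.linalg.inv(S)       # Kalman gain
    x_post = x_prior + K @ (y - C @ x_prior)   # a posteriori state estimate
    P_post = (np.eye(len(x_prior)) - K @ C) @ P_prior  # a posteriori error covariance
    return x_post, P_post
```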
As we focus on the effect of communication protocols on the stability and quality of the remote estimation, we assume that the local estimation is stable as follows [10, 11].
Assumption 1.
II-C Wireless Channel
We consider both a static channel and a finite-state time-homogeneous Markov fading channel. For the static channel, the channel power gain does not change with time, i.e., , . For the Markov channel, the channel power gain remains constant during the th time slot and changes slot by slot, where . We assume that the Markov channel has states, i.e., , and . The probability of transition from state to state is , and the matrix of channel state transition probabilities is given as
(3) 
We assume that the channel state information is available at both the sensor and the receiver, see e.g. [14] and the references therein.
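As a hedged illustration (the parameter values below are assumptions, not from the paper), a finite-state Markov fading channel with a transition matrix as in (3) can be sampled as follows:

```python
import numpy as np

def simulate_channel(M, gains, T, s0=0, seed=0):
    """Sample T slots of a finite-state time-homogeneous Markov fading channel.

    M: row-stochastic transition matrix as in (3); gains: power gain per state.
    """
    rng = np.random.default_rng(seed)
    s, path = s0, []
    for _ in range(T):
        path.append(gains[s])               # gain held constant within the slot
        s = rng.choice(len(gains), p=M[s])  # slot-by-slot state transition
    return path
```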
II-D HARQ-Based Communication
The sensor’s estimate is quantized into bits and then coded into a packet with symbols, where the symbol duration is and is the coding rate. We assume that the packet length is equal to the sampling period, i.e., . In other words, the sensor performs the next sampling once the current measurement-carrying packet has been delivered to the receiver. Thus, there exists a unit packet-transmission delay between the sensor and the receiver. For example, the sensor’s measurement at the beginning of time slot , , is filtered and sent to the receiver before time slot . It is assumed that the sensor and the receiver are perfectly synchronized.
The acknowledgment/negative-acknowledgment (ACK/NACK) message is fed back from the receiver to the sensor perfectly and without delay when packet detection succeeds/fails. If an ACK is received by the sensor, it sends a new (pre-filtered) measurement in the next time slot. If a NACK is received, the sensor can decide whether to retransmit the unsuccessfully transmitted measurement using its ARQ protocol or to send a new measurement. We introduce a binary variable to indicate successful or failed packet detection in time slot .
For the standard ARQ protocol, the receiver discards the failed packets, and the sensor simply resends the previously failed packet if a retransmission is required. Thus, the successful packet detection probability at each time is independent of the current number of retransmissions.
For a HARQ protocol, the receiver buffers the incorrectly received packets, and the detection of a retransmitted packet utilizes all the buffered related packets. In the CC-HARQ case, the sensor resends the previously failed packet if a retransmission is required, and the receiver optimally combines (i.e., by the maximal ratio combining method) all the previously received replicas of the packet of the same message and makes a detection. In the IR-HARQ case, each retransmitted packet is an incremental redundancy of the same message, and the receiver treats the sequence of all the buffered replicas as a long codeword to detect the transmitted message.
Given the channel power gains, the probability that the message cannot be detected within transmission attempts starting from time slot is given by [15, 16] as
(4)  
(5) 
where is the event of failed detection within transmissions in time slot , is the signal-to-noise ratio at the receiver with unit channel power gain, and the approximation (5) is based on results from finite-blocklength information theory for the AWGN channel, see, e.g., [15] and [16].
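Although the paper's exact expression (5) is not reproduced above, the standard finite-blocklength normal approximation for the AWGN channel conveys the idea; the sketch below (rates in nats per channel use, with CC-HARQ modeled by summing the per-replica SNRs under maximal ratio combining) is an assumption-laden illustration, not the paper's formula:

```python
import math

def q_func(x):
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def packet_error_prob(snr, n, r):
    """Finite-blocklength normal approximation for the AWGN channel.

    n: blocklength (symbols); r: coding rate in nats per channel use.
    """
    cap = math.log(1.0 + snr)               # capacity (nats per use)
    disp = 1.0 - 1.0 / (1.0 + snr) ** 2     # channel dispersion
    return q_func(math.sqrt(n / disp) * (cap - r))

def cc_harq_error_prob(snrs, n, r):
    """CC-HARQ after several attempts: MRC of all replicas sums the received SNRs."""
    return packet_error_prob(sum(snrs), n, r)
```

Note how the error probability falls with each combined replica, which is exactly the reliability gain that a retransmission offers over a fresh transmission.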
II-E State Estimation at the Receiver
We assume that the latest sensor estimate available at the receiver at the beginning of time slot , i.e., , was generated at the beginning of time slot . Therefore, the receiver-side AoI at the beginning of time slot can be defined as [3]
(6) 
and .
As the latest available sensor estimate was generated steps earlier, the receiver needs to estimate the current state based on the dynamic process model (1). The receiver’s MSE-optimal estimator at the beginning of time slot is given as [4]
(7) 
and the corresponding estimation error covariance is
(8)  
(9) 
where (9) is obtained by substituting (7) and (1) into (8), , when , and . Note that takes values from a countably infinite set [13], i.e., .
The receiver’s estimation MSE at the beginning of time slot is . Note that the operator is monotonically increasing with respect to (w.r.t.) , i.e., if and (see Lemma 3.1 in [13]).
Remark 1.
From (9), the estimation MSE is a non-linear function of the AoI, and thus the AoI can also be treated as the estimation quality indicator.
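To make Remark 1 concrete: with the one-step map h(X) = A X Aᵀ + Q from (9), the receiver MSE is the trace of the τ-fold composition applied to the sensor-side error covariance. A small sketch (the matrix values are assumptions) shows the MSE growing with the AoI for an unstable process:

```python
import numpy as np

def h(X, A, Q):
    """One-step error-covariance map in (9): h(X) = A X A^T + Q."""
    return A @ X @ A.T + Q

def mse_of_aoi(tau, P_bar, A, Q):
    """Receiver estimation MSE after tau slots without a delivery: Tr(h^tau(P_bar))."""
    X = P_bar
    for _ in range(tau):
        X = h(X, A, Q)
    return float(np.trace(X))
```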
II-F Performance Metric
The longterm average MSE of the dynamic process is defined as
(10) 
where is the limit superior operator.
III Optimal Transmission Control: Analysis and Problem Formulation
For the standard ARQ, as the chance of successful detection of a new transmission and that of a retransmission are the same, the optimal policy is to always transmit the current sensor estimate, i.e., a non-retransmission policy [8]. For a HARQ protocol, the probability of successful packet detection in time slot depends on the number of consecutive transmission attempts of the original message and the experienced channel conditions. Since a new transmission is less reliable than a retransmission, there exists an inherent trade-off between retransmitting the previously failed local state estimate with a higher success probability, and sending the current state estimate with a lower success probability. Therefore, when a packet detection error occurs, the sensor needs to optimally decide whether to retransmit or to start a new transmission.
III-A Transmission-Control Policy
Let be the sensor’s decision variable at time , as illustrated in Fig. 1. If , the sensor sends the new measurement to the receiver in time slot ; otherwise, it retransmits the unsuccessfully transmitted measurement. It is clear that , if the packet transmitted in time slot was successful.
Let denote the number of consecutive transmission attempts before time slot . As it only depends on the sensor’s transmission-control policy, it has the updating rule
(11) 
where . From the definition of the estimation quality indicator (6), the updating rule of is given as
(12) 
where . As the estimation quality indicator depends on the current number of transmission attempts and also the control policy, plugging (11) into (12), we further have
(13) 
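The displayed update rules (11)–(13) did not survive extraction above, so the following sketch is a reconstruction consistent with the unit packet-transmission delay and the surrounding text: a retransmission extends the attempt counter, a failure ages the estimate by one slot, and a success delivers a measurement whose age equals the length of the transmission burst. Treat it as an assumption, not the paper's exact equations:

```python
def next_state(tau, rho, a, success):
    """Hypothetical reconstruction of the updates (11)-(13).

    tau: estimation quality indicator (AoI); rho: consecutive transmission
    attempts; a = 1 retransmits the old packet, a = 0 sends a fresh one.
    """
    rho_next = rho + 1 if a == 1 else 1  # (11): a retransmission extends the burst
    if success:
        # with unit transmission delay, a delivered packet carries a measurement
        # generated rho_next slots ago (one slot ago if it was a new transmission)
        tau_next = rho_next
    else:
        tau_next = tau + 1               # (12): the receiver's estimate ages by one slot
    return tau_next, rho_next
```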
III-B Packet Error Probability with Online Transmission Control
If the sensor decides to transmit a new measurement in time slot , i.e., , the packet error probability in time slot is obtained directly from (4) as
(14) 
If a retransmission decision has been made, i.e., , the packet error probability based on (4) can be obtained as
(15)  
(16)  
(17) 
In the Markov channel scenario, we assume that the packet error probability in (17) is a function of the current channel power gain and a state indicator of the previously experienced channel states, which does not rely on the order of the channel states (this assumption is in line with the approximation in (5)). To be specific, we define , where is the number of occurrences of the channel state with channel power gain during time slots to . In other words, is a sorted counter of the relevant historical channel states, and . By introducing the channel state index , i.e., , is updated as
(18) 
where .
In the static channel scenario, i.e., a special case of the Markov channel scenario, as the channel power gains are identical, the packet error probability in (19) can be rewritten as a function of the current number of transmission attempts as
(20) 
As the packet error probability of a retransmission is smaller than that of a new transmission under the same channel condition, we have the following inequalities for the Markov and static channel scenarios, respectively:
(21) 
and
(22) 
where is the packet error probability of a new transmission with the channel power gain in the Markov channel scenario, and is the packet error probability of a new transmission in the static channel scenario.
For the Markov channel, the largest packet error rate of a retransmission with channel power gain is defined as
(23) 
For the static channel, the largest packet error rate of a retransmission is defined as
(24) 
III-C Problem Formulation
The sensor’s transmission control policy is defined as the sequence , where is the control action in time slot . In what follows, we optimize the sensor’s policy such that the longterm estimation error is minimized, i.e.,
(25) 
It is possible that the long-term estimation error is unbounded no matter how we choose the policy, e.g., if the channel quality is always bad or the dynamic process (1) changes rapidly. Therefore, it is also important to investigate the condition, in terms of the transmission reliability and the dynamic process parameters, under which the remote estimation system can be stabilized, i.e., the long-term estimation MSE can be bounded.
IV Optimal Policy: Static Channel
In this section, we investigate the optimal transmission control policy for the static channel.
IV-A MDP Formulation
From (9), (11) and (13), the estimation MSE and the states and depend only on the previous action and states, i.e., , and . Therefore, the online decision problem (25) can be formulated as a discrete-time Markov decision process (MDP) as follows.
1) The state space is defined as , where the number of transmission attempts, , should be no larger than the estimation quality indicator (i.e., the AoI), , from the definition. The state of the MDP at time is .
2) The action space is defined as . A policy is a mapping from states to actions, i.e., . Recall that the action at time , , indicates a new transmission or a retransmission .
3) The state transition function characterizes the probability that the state transitions from state at time to at time under action at time . As the transition is time-homogeneous, we can drop the time index here. Let and denote the current and next state, respectively. Based on the packet error probability (20) and the state updating rules (11) and (13), we have the state transition function
(26) 
4) The one-stage (instantaneous) cost, i.e., the estimation MSE based on (9), is a function of the current state:
(27) 
which is independent of the action.
Therefore, problem (25) is equivalent to the classical average-cost optimization problem of the MDP. Assuming the existence of a stationary and deterministic optimal policy, we can effectively solve the MDP problem using standard methods such as the relative value iteration algorithm [17, Chapter 8].
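For completeness, a generic relative value iteration routine for an average-cost MDP with a state-only cost, as used here, might look as follows; this is a textbook sketch in the spirit of [17], not tied to this paper's specific state space:

```python
import numpy as np

def relative_value_iteration(P, c, tol=1e-9, max_iter=100_000):
    """Relative value iteration for an average-cost MDP.

    P: (num_actions, S, S) transition matrices; c: per-state cost, which is
    action-independent as in (27). Returns the average cost and a greedy policy.
    """
    S = c.shape[0]
    h = np.zeros(S)
    for _ in range(max_iter):
        q = np.einsum('aij,j->ai', P, h)    # expected value-to-go per action
        Tv = c + np.min(q, axis=0)          # Bellman operator
        h_new = Tv - Tv[0]                  # subtract a reference state to keep h bounded
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    g = Tv[0]                               # average cost per stage
    policy = np.argmin(np.einsum('aij,j->ai', P, h), axis=0)
    return g, policy
```

In practice the countably infinite state space must first be truncated to a finite one, as Section VI does for the numerical evaluation.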
IV-B Optimal Policy: Condition of Existence
Since the cost function grows exponentially with the state , it is possible that the long-term average cost of a HARQ-based transmission control policy, , over the state space cannot be bounded, i.e., the remote estimation system is unstable. We give the following sufficient condition for the existence of an optimal policy with a bounded long-term estimation MSE.
Theorem 1.
Proof.
See Appendix A. ∎
Remark 2.
From Theorem 1, it is clear that the optimal policy exists if we have a good channel condition and a good HARQ scheme, which guarantee high retransmission reliability (i.e., a small and hence a small ), or if the dynamic process does not change quickly and is thus easy to estimate (i.e., a small ).
IV-C Optimal Policy: The Structure
We show that the optimal policy has a switching structure as follows.
Theorem 2.
The optimal policy of problem (25) is a switching-type policy, i.e., (i) if , then ; (ii) if , then , where is any positive integer.
Proof.
See Appendix B. ∎
In other words, under the optimal policy, the two-dimensional state space is divided into two regions by a curve, and the decision actions of the states within each region are the same, as illustrated in Fig. 3.
Remark 3.
Note that the switching structure can help save storage space in online implementation, since the smart sensor only needs to store the switching-boundary states rather than the actions over the entire state space. At each time, the sensor simply compares the current state with the boundary states to give the optimal decision.
IV-D Optimal Policy: A Special Case
We consider the high-SNR scenario, where retransmissions can have ultra-low packet error probabilities. Therefore, we assume that a retransmission is always successful in the high-SNR scenario, and the optimal policy always exists by Theorem 1.
Due to the successful retransmissions, it can be noted from (26) that the states in with and are transient. Also, since a successful retransmission must be followed by a new transmission, the states in with and are transient, and the state has the action of new transmission. Furthermore, due to the switching structure of the optimal policy in Theorem 2, we set a policy-switching threshold for the states with , where the states choose the action of retransmission, while the states with choose the action of new transmission. Then, it is easy to see that the states with and are transient. Finally, the countably infinite state space is reduced to a finite state space, as illustrated in Fig. 3. Only the state has the action , and the other states have the action .
Therefore, is the key design parameter to be optimized. The policy optimization problem over the state space is thus transformed into a one-dimensional problem. By calculating the stationary distribution of the states for a given threshold, the average cost can be obtained, and we have the following result.
Proposition 1.
In the high-SNR scenario, the minimum long-term average MSE of the static channel is given as
(29) 
where is the optimal policy-switching threshold.
In Proposition 1, the optimal threshold can be obtained numerically by a linear search, yielding the minimum estimation MSE.
IV-E Suboptimal Policy
The optimal policy of the MDP problem in Sec. IV-A does not have a closed-form expression allowing low-complexity computation. Besides, since the MDP problem has infinitely many states, it has to be approximated by a truncated MDP problem with finitely many states for numerical evaluation and solved offline. Therefore, we propose an easy-to-compute suboptimal policy, namely the myopic policy that makes its decision simply to minimize the expected instantaneous cost.
Based on (26) and (27), the expected next-step cost given the current state and action can be derived as
(30)  
Then, we have
(31)  
Using the inequality (22), if and only if satisfies
(32) 
Thus, we have the following result.
Proposition 2.
A suboptimal policy of problem (25) is
(33) 
It can be proved that the suboptimal policy in Proposition 2 is also a switching-type policy. Due to the simplicity of the suboptimal policy, which, unlike the optimal policy, does not require any iterations to compute, it can be applied as an online decision algorithm. In Sec. VI, we will show that the performance of the suboptimal policy is close to that of the optimal one for practical system parameters. A detailed computation-complexity analysis will also be given in Sec. VI.
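A hedged sketch of such a one-step-lookahead rule is given below; since the displayed expressions (30)–(33) are not reproduced above, the state-update and error-probability inputs here are assumptions in the spirit of Proposition 2 rather than its exact form:

```python
import numpy as np

def expected_next_mse(tau, rho, a, eps_new, eps_retx, mse):
    """Expected next-step MSE for action a (0: new transmission, 1: retransmission).

    eps_new: error prob. of a new transmission; eps_retx(rho): error prob. of a
    retransmission after rho prior attempts; mse(tau): MSE as a function of the
    AoI. All three inputs are assumptions standing in for (20), (22) and (27).
    """
    if a == 0:
        eps, tau_ok = eps_new, 1              # a fresh measurement, one slot old on delivery
    else:
        eps, tau_ok = eps_retx(rho), rho + 1  # an old measurement, rho + 1 slots stale
    return eps * mse(tau + 1) + (1.0 - eps) * mse(tau_ok)

def myopic_action(tau, rho, eps_new, eps_retx, mse):
    """One-step-lookahead (myopic) rule in the spirit of Proposition 2."""
    costs = [expected_next_mse(tau, rho, a, eps_new, eps_retx, mse) for a in (0, 1)]
    return int(np.argmin(costs))
```

The rule exhibits the expected switching behavior: for a stale estimate (large AoI) a reliable retransmission wins, while for a fresh estimate the sensor prefers a new transmission.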
V Optimal Policy: Markov Channel
In this section, we investigate the sensor’s optimal transmission control policy in the Markov channel.
V-A MDP Formulation
We also formulate the problem as an MDP.
1) The state space is defined as .
2) The action space is defined as .
3) Let and denote the current and next state, respectively. The transition probability can be written as
(34) 
where .
4) The one-stage cost is given in (27).
V-B Optimal Policy: Condition of Existence
Inspired by the static channel scenario, we derive the following condition under which the long-term average MSE can be bounded.
Theorem 3.
Proof.
See Appendix C. ∎
Remark 4.
It is interesting to see that when retransmissions have very high reliability, i.e., , the eigenvalues of the matrix approach zero; thus, the left-hand side of (35) is much less than one and the remote estimation system can be stabilized.
The stability regions of a two-state Markov channel in terms of and with different are illustrated in Fig. 5, where . We see that a larger results in a smaller stability region.
V-C Optimal Policy: The Structure
The optimal policy in the Markov channel also has a switching structure in the state space.
Theorem 4.
(i) if , then ;
(ii) if , then , where is any positive integer.
Proof.
The proof is similar to that of Theorem 2 and is omitted due to the space limitation. ∎
V-D Optimal Policy: A Special Case
For the high-SNR scenario, we assume that a retransmission is always successful. Thus, the state transition probability (34) does not depend on the individual elements of the historical channel-state vector , and we can simply merge the states in by into the state to reduce the state space.
Similar to the static channel scenario, the state space of the optimal policy can be further reduced to , and the optimal policy for states is , where , while the other states have the action , as illustrated in Fig. 5. Unlike the static channel scenario, the optimal policy for the state Markov channel has a set of parameters, i.e., , to be optimally designed.
We can reorder the three-dimensional states into a state (column) vector, , where the states and are the th and th elements of , respectively. Using the state transition probability (34) and the transition rule of the special case as illustrated in Fig. 5, the matrix of state transition probabilities can be written as
(37) 
where is the Kronecker product operator, is the th column of defined in (3), and is the
(38) 
(39) 
Based on the stochastic matrix (37), we can calculate the steady-state distribution for a given set of policy-switching parameters. By numerically optimizing , we have the following result.
Proposition 3.
In the high-SNR scenario, the minimum long-term average MSE of the Markov channel is given as
(40) 
where and is a null-space vector of with non-negative values, and here is the by identity matrix.
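The steady-state distribution used in Proposition 3 can be computed numerically as a non-negative null-space vector; a minimal sketch for a generic stochastic matrix (not the specific structured matrix (37)) is:

```python
import numpy as np

def stationary_distribution(P):
    """Steady-state distribution pi of a row-stochastic matrix P.

    Solves pi P = pi with sum(pi) = 1 via least squares, i.e., a non-negative
    null-space vector of (P^T - I) normalized to sum to one.
    """
    S = P.shape[0]
    M = np.vstack([P.T - np.eye(S), np.ones((1, S))])  # append the normalization row
    b = np.concatenate([np.zeros(S), [1.0]])
    pi, *_ = np.linalg.lstsq(M, b, rcond=None)
    return pi
```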
VI Numerical Results
VI-A Delay-Optimal Policy: A Benchmark
We also consider a delay-optimal policy based on the HARQ protocol in [18] as the benchmark for the proposed optimal policy. We use the average AoI to measure the delay of the system. Therefore, similar to the MSE-optimization problem (25), the delay optimization problem is formulated as
(41) 
This problem can also be converted to an MDP problem with the same state space, action space and state transition function as presented in Sec. IV-A. The one-stage cost in terms of delay is
(42) 
VI-B Simulation and Policy Comparison
In the remainder of this section, we present numerical results for the optimal policies for the static and Markov channels in Sec. IV and Sec. V, respectively, and their performance. Also, we numerically compare these optimal policies with the benchmark policy in Sec. VI-A. Unless otherwise stated, we consider CC-HARQ, and we set dB, , , , , , , and thus , .
The packet error probabilities for a new transmission and a retransmission of the CC/IR-HARQ protocol are obtained by substituting the approximation (5) into (14) and (17), respectively. We use the relative value iteration algorithm of an MDP toolbox [19] to solve the MDP problems in Sections IV, V and VI-A.
VI-B1 Static Channel
Policy Comparison. To solve the MDP problem with an infinite state space, the unbounded state space is truncated to to enable the evaluation. We set the channel power gain . Using Theorem 1, we can verify that the CC/IR-HARQ-based optimal policy exists. Fig. 7 shows the different policies within the truncated state space. In Fig. 7(a), we see that, in line with Theorem 2, the optimal policy of the CC-HARQ protocol is a switching-type one, where the actions of the states close to the states with are equal to zero, i.e., new transmissions are required. Also, we see that the suboptimal policy plotted in Fig. 7(d) is a good approximation of the optimal one within the truncated state space. However, the delay-optimal policy plotted in Fig. 7(c) is very different from the previous ones, with more states taking the action of new transmission. Therefore, HARQ-based retransmissions are more important for reducing the estimation MSE than for reducing the delay. Fig. 7(b) presents the optimal policy of the IR-HARQ protocol, which is identical to that of CC-HARQ in Fig. 7(a). This is because when the channel power gain is high, e.g., , both IR- and CC-HARQ can provide sufficiently high retransmission reliability, and the transmission control policies are the same. However, we can show that the optimal policies for CC- and IR-HARQ differ when the channel power gain is low and IR-HARQ provides much better retransmission reliability. The policy diagram is not included due to the space limitation.
Performance Comparison. Based on the above numerically obtained policies and the policy with the standard ARQ, i.e., the one without retransmission (see Sec. III), we further evaluate their performance in terms of the long-term average MSE using (10). We run the remote estimation process for time slots and set the initial value of as . Also, we set as the performance baseline, as , .
Fig. 7 plots the average MSE versus the simulation time , using different transmission control policies. Our simulation shows that the conventional non-retransmission policy has an unbounded average MSE, which is not shown in the figure due to its ultra-fast growth rate. In contrast, the average MSEs of the different HARQ-based policies quickly converge to bounded steady-state values. Therefore, the proposed HARQ-based policy can significantly improve the estimation quality compared with the conventional policy. Also, we see that the performance of the suboptimal policy is very close to that of the optimal one. Given the performance baseline, the optimal policy gives an MSE reduction relative to the delay-optimal policy, which demonstrates the superior performance of the proposed optimal policy.
VI-B2 Markov Channel
We consider a two-state Markov channel with channel power gains and . The matrix of channel state transition probabilities is . Using Theorem 3, we can verify that the CC/IR-HARQ-based optimal policy exists. To solve the MDP problem in the Markov channel scenario with an infinite state space, the unbounded state space is truncated to to enable the evaluation, where is the state vector of the historical channel states. Figs. 9 and 9 show the optimal transmission control policy under channel 1 () and channel 2 (), respectively. We can see the switching structure of the optimal policy. Also, we see that new transmissions occur more often in the good channel than in the bad channel.
We can also compute the suboptimal policy and the delay-optimal policy for the Markov channel; the computation complexity of these policies, together with those for the static channel, is listed in Table I. We see that the number of convergence steps for calculating these policies is less than , and the optimal policy has a larger number of convergence steps than the delay-optimal policy and hence a higher computation complexity.
Performance Comparison. We evaluate the non-retransmission policy, the CC/IR-HARQ-based optimal policy, the CC-HARQ-based suboptimal policy and the delay-optimal policy in terms of the long-term average MSE using (10). We run the remote estimation process for time slots and set the initial value of as .
Fig. 10 plots the average MSE versus the simulation time , using different transmission control policies. Our simulation shows that the non-retransmission policy has an unbounded average MSE, while the average MSEs of the different policies quickly converge to bounded steady-state values. Therefore, the proposed HARQ-based policy can also significantly improve the estimation quality over the conventional policy in the Markov channel scenario. We see that the performance of the suboptimal policy is very close to that of the optimal one. Given the performance baseline, the optimal policy reduces the MSE relative to the delay-optimal policy. Also, unlike the static channel scenario, we see that the IR-HARQ-based optimal policy significantly reduces the average MSE compared to the CC-HARQ-based optimal policy. This is because IR-HARQ can provide much better retransmission reliability than CC-HARQ, especially when the channel quality is bad.
Table I. Computation complexity [20] and the number of convergence steps, .

                  Computation complexity [20]           Number of convergence steps,
                Optimal   Suboptimal   Delay-optimal   Optimal   Suboptimal   Delay-optimal
Static channel               1
Markov channel               1
VII Conclusions
We have proposed and optimized a HARQ-based remote estimation protocol for real-time applications. Our results have shown that the optimal policy can significantly reduce the estimation MSE under practical settings. As the recent communication standards for real-time wireless control, such as WirelessHART, ISA100 and IEEE 802.15.4e, have not adopted any HARQ techniques, this work also suggests that HARQ could be adopted by future real-time communication standards to enhance the performance of mission-critical remote estimation/control systems.
Appendix A: Proof of Theorem 1
To prove the existence of a stationary and deterministic optimal policy given condition (28), we need to verify the following conditions [21, Corollary 7.5.10]: (CAV*1) there exists a standard policy such that the recurrent class induced by is equal to the whole state space ; (CAV*2) given , the set is finite.
Condition (CAV*2) can be easily verified based on (27). In what follows, we verify (CAV*1) by first constructing a policy and then proving that it is a standard policy.
The action of the policy is given as
(43) 
It is easy to prove that any state in induced by is recurrent. We then prove that is a standard policy by verifying that both the expected first passage cost and time from state to are bounded [21]. Due to the space limitation, we only prove that any state with has a bounded first passage cost and time. The other states can be handled similarly.
For notational simplicity, the expected first passage cost of the state is denoted as , and the one-stage cost (27) is rewritten as
(44) 
Based on (20), (43) and the law of total expectation applied to the first passage costs of all possible first passage paths (as illustrated in Fig. 11), the expected first passage cost can be obtained as
(45)  
where ,