Real-Time Remote Estimation with Hybrid ARQ in Wireless Networked Control
Real-time remote estimation is critical for mission-critical applications including industrial automation, smart grid and tactile Internet. In this paper, we propose a hybrid automatic repeat request (HARQ)-based real-time remote estimation framework for linear time-invariant (LTI) dynamic systems. Considering the estimation quality of such a system, there is a fundamental tradeoff between the reliability and freshness of the sensor’s measurement transmission. We formulate a new problem to optimize the sensor’s online transmission control policy for static and Markov fading channels, which depends on both the current estimation quality of the remote estimator and the current number of retransmissions of the sensor, so as to minimize the long-term remote estimation mean squared error (MSE). This problem is non-trivial. In particular, it is challenging to derive the condition in terms of the communication channel quality and the LTI system parameters, to ensure a bounded long-term estimation MSE. We derive a sufficient condition of the existence of a stationary and deterministic optimal policy that stabilizes the remote estimation system and minimizes the MSE. Also, we prove that the optimal policy has a switching structure, and accordingly derive a low-complexity suboptimal policy. Numerical results show that the proposed optimal policy significantly improves the performance of the remote estimation system compared to the conventional non-HARQ policy.
Real-time remote estimation is critical for networked control applications such as industrial automation, smart grid, vehicle platooning, drone swarming, immersive virtual reality (VR) and the tactile Internet. For such real-time applications, high-quality remote estimation of the states of dynamic processes over unreliable links is a major challenge. The sensor’s sampling policy, the estimation scheme at a remote receiver, and the communication protocol for state-information delivery between the sensor and the receiver should be designed jointly.
To enable the optimal design of wireless remote estimation, the performance metric for the remote estimation system needs to be selected properly. For some applications, the model of the dynamic process under monitoring is unknown and the receiver is not able to estimate the current state of the process based on the previously received states, i.e., a state-monitoring-only scenario. In this scenario, the performance metric is the age-of-information (AoI), which reflects how old the freshest received sensor measurement is, since the moment that measurement was generated at the sensor. However, in practice, most dynamic processes are time-correlated, and the state-changing rules can be known by the receiver to some extent. Therefore, the receiver can estimate the current state of the process based on the previously received measurements and the model of the dynamic process (see e.g., [4, 5]), especially when the transmission of the packet that carries the current sensor measurement fails or is delayed. In this sense, the estimation mean squared error (MSE) is the appropriate performance metric.
From a communication protocol design perspective, we naturally ask: does a sensor need retransmission or not for mission-critical real-time remote estimation? Retransmission is required by conventional communication systems in which non-real-time backlogged data must be perfectly delivered to the receivers. Energy-constrained remote estimation systems and those with a low sampling rate can also benefit from retransmissions, see e.g.,  and . It was shown in  that retransmissions cannot improve the performance of a mission-critical real-time remote estimation system that is constrained mainly by neither energy nor sampling rate, as it is a waste of a transmission opportunity to transmit an out-of-date measurement instead of the current one. However, this is true only when a retransmission has the same success probability as a new transmission, e.g., with the standard automatic repeat request (ARQ) protocol. Note that a hybrid ARQ (HARQ) protocol, e.g., with a chase combining (CC) or incremental redundancy (IR) scheme, is able to effectively increase the successful detection probability of a retransmission by combining multiple copies from previously failed transmissions. Therefore, a HARQ protocol has the potential to improve the performance of real-time remote estimation.
In this paper, we introduce HARQ into real-time remote estimation systems and optimally design the sensor’s transmission policy to minimize the estimation MSE. Note that there is a fundamental tradeoff between the reliability and freshness of the sensor’s measurement transmission. When a failed transmission occurs, the sensor can either retransmit the previous old measurement such that the receiver can obtain a more reliable old measurement, or transmit a new but less reliable measurement. The main contributions of the paper are summarized as follows:
We propose a novel HARQ-based real-time remote estimation system of time-correlated random processes, where the sensor makes an online decision to send a new measurement or retransmit the previously failed one depending on both the current estimation quality of the receiver and the current number of retransmissions of the sensor.
We formulate the problem to optimize the sensor’s transmission control policy so as to maximize the long-term performance of the receiver in terms of the average MSE for both the static and Markov fading channels. Since it is not clear whether the long-term average MSE can be bounded or not, we derive an elegant sufficient condition in terms of the transmission reliability provided by the HARQ protocol and parameters of the process of interest to ensure that an optimal policy exists and stabilizes the remote estimation system.
We derive a structural property of the optimal policy, i.e., the optimal policy is a switching-type policy, and give an easy-to-compute suboptimal policy. Our numerical results show that the proposed HARQ-based optimal and suboptimal transmission control policies significantly improve the system performance compared to the conventional non-HARQ policy, under the setting of practical system parameters.
The remainder of this paper is organized as follows. Section II presents the proposed HARQ-based remote estimation system. Section III analyzes the HARQ-based transmission-control policy, and formulates the optimal transmission control problem. Sections IV and V investigate the optimal transmission control policies of the static and Markov channels, respectively. Section VI numerically presents the optimal, suboptimal and benchmark policies for both static and Markov channels, and their average MSE performance. Finally, Section VII concludes the paper.
Notations: is the probability of the event . is the expectation of the random variable . is the matrix transpose operator. is the sum of the vector ’s elements. is the trace operator. denotes the diagonal matrix with the diagonal elements . and denote the sets of positive and non-negative integers, respectively. denotes the matrix with identical element .
II System Model
We consider a basic system setting in which a smart sensor periodically samples, pre-estimates and sends its local estimate of a dynamic process to a remote receiver through a wireless link with packet dropouts, as illustrated in Fig. 1.
II-A Dynamic Process Modeling
where the discrete time steps are determined by the sensor’s sampling period , is the process state vector, is the state transition matrix, is the measurement vector of the smart sensor attached to the process, is the measurement matrix, and are the process and measurement noise vectors, respectively. (Footnote 1: Note that need not be full rank, as illustrated in Fig. 1, i.e., is a two-dimensional (2D) signal, while the measurement is one-dimensional. After Kalman filtering, we have a 2D .) We assume and are independent and identically distributed (i.i.d.) zero-mean Gaussian processes with corresponding covariance matrices and , respectively. The initial state is zero-mean Gaussian with covariance matrix . To avoid trivial problems, we assume that , where is the maximum squared eigenvalue of .
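For reference, a discrete-time LTI model consistent with the description above can be written as follows. The symbols used here ($x_t$, $y_t$, $A$, $C$, $w_t$, $v_t$, $W$, $V$) are assumed notation for illustration, since the original symbols are not legible in this copy of the text.

```latex
\begin{align}
  x_{t+1} &= A\,x_t + w_t, \qquad w_t \sim \mathcal{N}(0, W), \\
  y_t     &= C\,x_t + v_t, \qquad v_t \sim \mathcal{N}(0, V).
\end{align}
```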
II-B State Estimation at the Smart Sensor
Since the sensor’s measurements are noisy, the smart sensor with sufficient computation and storage capacity is required to estimate the state of the process, , using a Kalman filter [10, 11], which gives the minimum estimation MSE, based on the current and previous raw measurements:
where is the identity matrix, is the a priori state estimate, is the a posteriori state estimate at time , is the Kalman gain, and represent the a priori and a posteriori error covariance at time , respectively. The first two equations present the prediction steps while the last three equations present the updating steps. Note that is the output of the Kalman filter at time , i.e., the pre-filtered measurement of , with the estimation error covariance .
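The five recursions described above (two prediction steps followed by three updating steps) follow the textbook Kalman filter. A minimal sketch is given below, assuming the hypothetical names `A`, `C`, `W`, `V` for the state transition matrix, the measurement matrix, and the process and measurement noise covariances; it is an illustration and not the authors' implementation.

```python
import numpy as np

def kalman_step(x_post, P_post, y, A, C, W, V):
    """One iteration of the standard Kalman filter (assumed notation)."""
    # Prediction steps: a priori state estimate and error covariance.
    x_prior = A @ x_post
    P_prior = A @ P_post @ A.T + W
    # Updating steps: Kalman gain, a posteriori estimate and covariance.
    S = C @ P_prior @ C.T + V                 # innovation covariance
    K = P_prior @ C.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_prior + K @ (y - C @ x_prior)
    P_new = (np.eye(A.shape[0]) - K @ C) @ P_prior
    return x_new, P_new
```

Iterating this step with an observable pair `(A, C)` drives the a posteriori covariance to a steady-state value, which plays the role of the error covariance of the pre-filtered measurement sent by the sensor.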
II-C Wireless Channel
We consider both a static channel and a finite-state time-homogeneous Markov fading channel. For the static channel, the channel power gain does not change with time, i.e., , . For the Markov channel, the channel power gain remains constant during the th time slot and changes slot by slot, where . We assume that the Markov channel has states, i.e., , and . The probability of transition from state to state is , and the matrix of channel state transition probability is given as
We assume that the channel state information is available at both the sensor and the receiver, see e.g.  and the references therein.
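A finite-state time-homogeneous Markov channel of this kind can be simulated by sampling the next state from the row of the transition matrix indexed by the current state. The sketch below assumes a row-stochastic matrix `M` (entry `M[i, j]` is the probability of moving from state `i` to state `j`); the function name and the row convention are illustrative assumptions.

```python
import numpy as np

def simulate_channel(M, T, s0=0, seed=0):
    """Sample T consecutive channel-state indices from a Markov chain."""
    M = np.asarray(M, dtype=float)
    rng = np.random.default_rng(seed)
    states = np.empty(T, dtype=int)
    s = s0
    for t in range(T):
        states[t] = s                       # state held during slot t
        s = rng.choice(M.shape[0], p=M[s])  # draw the state of the next slot
    return states
```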
II-D HARQ-Based Communication
The sensor’s estimation is quantized into bits and then coded into a packet with symbols, where the symbol duration is and is the coding rate. We assume that the packet length is equal to the sampling period, i.e., . In other words, the sensor performs the next sampling once the current measurement-carrying packet has been delivered to the receiver. Thus, there exists a unit packet-transmission delay between the sensor and the receiver. For example, the sensor’s measurement at the beginning of time slot , , is filtered and sent to the receiver before time slot . It is assumed that the sensor and the receiver are perfectly synchronized.
The acknowledgment/negative-acknowledgment (ACK/NACK) message is fed back from the receiver to the sensor perfectly without any delay, when the packet detection succeeds/fails. If an ACK is received by the sensor, it will send a new (pre-filtered) measurement in the next time slot. If a NACK is received, the sensor can decide whether to retransmit the unsuccessfully transmitted measurement using its ARQ protocol or to send a new measurement. We introduce the binary variable to indicate the successful and failed packet detection in time slot , respectively.
For the standard ARQ protocol, the receiver discards the failed packets, and the sensor simply resends the previously failed packet if a retransmission is required. Thus, the successful packet detection probability at each time is independent of the current number of retransmissions.
For a HARQ protocol, the receiver buffers the incorrectly received packets, and the detection of the retransmitted packet will utilize all the buffered related packets. In the CC-HARQ case, the sensor resends the previously failed packet if a retransmission is required, and the receiver optimally combines (i.e., using the maximal ratio combining method) all the previously received replicas of the packet of the same message and makes a detection. In the IR-HARQ case, each retransmitted packet is an incremental redundancy of the same message, and the receiver treats the sequence of all the buffered replicas as a long codeword to detect the transmitted message.
where is the event of failed detection within transmissions in time slot , is the signal-to-noise ratio at the receiver with unit channel power gain, and the approximation (5) is based on the results of the finite-blocklength information theory for AWGN channel, see e.g.,  and .
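To make the role of combining concrete, the sketch below evaluates one common form of the finite-blocklength normal approximation for the AWGN channel. It is an assumption that this matches approximation (5) only in spirit: the exact expression of the paper is not reproduced in this copy. Here `snrs` lists the received SNRs of all attempts of the same message, `l` is the number of symbols per packet and `b` the number of information bits; CC is modeled as SNR accumulation and IR as accumulation of blocklength, and the result does not depend on the order of the SNRs.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def block_error(snrs, l, b, scheme="CC"):
    """Approximate failure probability after combining all attempts."""
    log2e = math.log2(math.e)
    if scheme == "CC":
        g = sum(snrs)  # maximal ratio combining: the SNRs add up
        info = l * math.log2(1.0 + g)
        disp = l * (1.0 - (1.0 + g) ** -2) * log2e ** 2
    elif scheme == "IR":
        # The buffered packets form one long codeword.
        info = sum(l * math.log2(1.0 + g) for g in snrs)
        disp = sum(l * (1.0 - (1.0 + g) ** -2) * log2e ** 2 for g in snrs)
    else:
        raise ValueError("scheme must be 'CC' or 'IR'")
    return q_func((info - b) / math.sqrt(disp))
```

With these assumed models, adding an attempt lowers the failure probability for both schemes, which is the property the transmission-control problem relies on.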
II-E State Estimation at the Receiver
We assume that the latest sensor’s estimation that is available at the receiver at the beginning of time slot , i.e., , was generated at the beginning of time slot . Therefore, the receiver-side AoI at the beginning of time slot can be defined as 
As the latest available sensor’s estimation was generated -step earlier, the receiver needs to estimate the current state based on the dynamic process model (1). The receiver’s MSE optimal estimator at the beginning of time slot is given as 
and the corresponding estimation error covariance is
The receiver’s estimation MSE at the beginning of time slot is . Note that the operator is monotonically increasing with respect to (w.r.t.) , i.e., if and (see Lemma 3.1 in ).
From (9), the estimation MSE is a non-linear function of the AoI, and thus, can also be treated as the estimation quality indicator.
II-F Performance Metric
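The dependence of the MSE on the AoI can be illustrated numerically. The sketch below assumes that the receiver-side error covariance is obtained by applying the one-step prediction map `X -> A X A^T + W` to the sensor-side covariance `P_bar` once per slot of age; the names and the exact index convention (whether the exponent is the AoI or the AoI minus one) are assumptions.

```python
import numpy as np

def mse_vs_age(A, W, P_bar, q):
    """Trace of the error covariance after q open-loop prediction steps."""
    P = np.array(P_bar, dtype=float)
    for _ in range(q):
        P = A @ P @ A.T + W  # one-step prediction of the error covariance
    return float(np.trace(P))
```

For an unstable `A`, the returned value grows roughly geometrically in `q`, which is why a bounded long-term MSE is not guaranteed.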
The long-term average MSE of the dynamic process is defined as
where is the limit superior operator.
III Optimal Transmission Control: Analysis and Problem Formulation
For the standard ARQ, as the chance of the successful detection of a new transmission and that of a retransmission are the same, the optimal policy is to always transmit the current sensor’s estimation, i.e., a non-retransmission policy . For a HARQ protocol, the probability of successful packet detection in time slot , depends on the number of consecutive transmission attempts of the original message and the experienced channel conditions. Since a new transmission is less reliable than a retransmission, there exists an inherent trade-off between retransmitting previously failed local state estimation with a higher success probability, and sending the current state estimation with a lower success probability. Therefore, when a packet detection error occurs, the sensor needs to optimally make a decision on whether to retransmit it or to start a new transmission.
III-A Transmission-Control Policy
Let be the sensor’s decision variable at time as illustrated in Fig. 1. If , the sensor sends the new measurement to the receiver in time slot ; otherwise, it retransmits the unsuccessfully transmitted measurement. It is clear that , if the packet transmitted in time slot was successful.
Let denote the number of consecutive transmission attempts before time slot . As only depends on the sensor’s transmission-control policy, it has the updating rule as
where . From the definition of the estimation quality indicator (6), the updating rule of is given as
III-B Packet Error Probability with Online Transmission Control
If the sensor decides to transmit a new measurement in time slot , i.e., , the packet error probability in time slot is obtained directly from (4) as
If a retransmission decision has been made, i.e., , the packet error probability based on (4) can be obtained as
In the Markov channel scenario, we assume that the packet error probability in (17) is a function of the current channel power gain , and the state indicator of the previously experienced channel states, which does not rely on the order of the channel states333This assumption is in line with the approximation in (5).. To be specific, we define , where is the occurrence number of the channel state with channel power gain , during time slots to . In other words, is a sorted counter of the relevant historical channel states, and . By introducing the channel state index , i.e., , is updated as
In the static channel scenario, i.e., a special case of the Markov channel scenario, as the channel power gains are identical to each other, the packet error probability in (19) can be rewritten as a function of the current number of transmission attempts as
As the packet error probability of a retransmission is smaller than a new transmission under the same channel condition, we have the following inequalities for the Markov and static channel scenarios, respectively, as
where is the packet error probability of a new transmission with the channel power gain in the Markov channel scenario, and is the packet error probability of a new transmission in the static channel scenario.
For the Markov channel, the largest packet error rate of a retransmission with channel power gain is defined as
For the static channel, the largest packet error rate of a retransmission is defined as
III-C Problem Formulation
The sensor’s transmission control policy is defined as the sequence , where is the control action in time slot . In what follows, we optimize the sensor’s policy such that the long-term estimation error is minimized, i.e.,
It is possible that the long-term estimation error may never be bounded no matter how we choose the policy, if the channel quality is always bad or the dynamic process (1) changes rapidly. Therefore, it is also important to investigate the condition in terms of the transmission reliability and the dynamic process parameters, under which the remote estimation system can be stabilized, i.e., the long-term estimation MSE can be bounded.
IV Optimal Policy: Static Channel
In this section, we investigate the optimal transmission control policy in the static channel.
IV-A MDP Formulation
From (9), (11) and (13), the estimation MSE and states and only depend on the previous action and states, i.e., , and . Therefore, the online decision problem (25) can be formulated as a discrete time Markov decision process (MDP) as follows.
1) The state space is defined as , where the number of transmission attempts, , should be no larger than the estimation quality indicator (i.e., the AoI), , from the definition. The state of the MDP at time is .
2) The action space is defined as . A policy is a mapping from states to actions, i.e., . Recall that the action at time , , indicates a new transmission or a retransmission .
3) The state transition function characterizes the probability that the state transits from state at time to at time with action at time . As the transition is time-homogeneous, we can drop the time index here. Let and denote the current and next state, respectively. Based on the packet error probability (20) and the state updating rules (11) and (13), we have the state transition function as
4) The one-stage (instantaneous) cost, i.e., the estimation MSE based on (9), is a function of the current state of :
which is independent of the action.
Therefore, the problem (25) is equivalent to solving the classical average cost optimization problem of the MDP. Assuming the existence of a stationary and deterministic optimal policy, we can effectively solve the MDP problem using standard methods such as the relative value iteration algorithm [17, Chapter 8].
IV-B Optimal Policy: Condition of Existence
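The MDP above can be solved numerically on a truncated state space. The following is a minimal relative value iteration sketch under an assumed state convention, which may be offset from the paper's indices: the state is `(r, q)` with `r` the number of failed attempts of the buffered measurement (`r = 0` if nothing is buffered) and `q` the AoI, with `0 <= r < q <= Q`. A success yields the state `(0, r_eff + 1)` and a failure yields `(r_eff + 1, q + 1)`, where `r_eff` is `r` for a retransmission and `0` for a new transmission. The inputs `eps` (failure probability after `k` failed attempts) and `cost` (MSE at each AoI) are hypothetical.

```python
import numpy as np

def solve_harq_mdp(eps, cost, Q, tol=1e-9, max_iter=100_000):
    """Relative value iteration; returns (average cost, policy[r, q])."""
    eps = np.asarray(eps, dtype=float)
    h = np.zeros((Q, Q + 1))                  # relative values h[r, q]
    policy = np.zeros((Q, Q + 1), dtype=int)  # 0: new, 1: retransmit
    valid = np.arange(Q)[:, None] < np.arange(Q + 1)[None, :]

    def expected_next(r_eff, q):
        e = eps[min(r_eff, len(eps) - 1)]
        q_fail = min(q + 1, Q)                # truncate the AoI at Q
        r_fail = min(r_eff + 1, q_fail - 1)
        return (1.0 - e) * h[0, r_eff + 1] + e * h[r_fail, q_fail]

    gain = 0.0
    for _ in range(max_iter):
        h_new = np.zeros_like(h)
        for q in range(1, Q + 1):
            for r in range(q):
                v_new = expected_next(0, q)
                # Retransmission needs a buffered failed packet (r > 0).
                v_re = expected_next(r, q) if r > 0 else np.inf
                policy[r, q] = int(v_re < v_new)
                h_new[r, q] = cost[q] + min(v_new, v_re)
        gain = h_new[0, 1]                    # reference state (0, 1)
        h_new[valid] -= gain
        converged = np.max(np.abs(h_new - h)) < tol
        h = h_new
        if converged:
            break
    return gain, policy
```

When `eps` is constant (standard ARQ), this sketch returns the never-retransmit policy, in line with the discussion in Sec. III. The truncation level `Q` bounds the cost artificially, so the result is only meaningful when the untruncated system is stable.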
Since the cost function grows exponentially with the state , it is possible that the long-term average cost with a HARQ-based transmission control policy, , in the state space cannot be bounded, i.e., the remote estimation system is unstable. We give the following sufficient condition of the existence of an optimal policy that has a bounded long-term estimation MSE.
See Appendix A. ∎
From Theorem 1, it is clear that the optimal policy exists if we have a good channel condition and a good HARQ scheme, which guarantees high retransmission reliability (i.e., a small and hence a small ), or the dynamic process does not change quickly, which is easy to estimate (i.e., a small ).
IV-C Optimal Policy: The Structure
We show that the optimal policy has a switching structure as follows.
The optimal policy of problem (25) is a switching-type policy, i.e., (i) if , then ; (ii) if , then , where is any positive integer.
See Appendix B. ∎
In other words, for the optimal policy, the two-dimensional state space is divided into two regions by a curve, and the decision actions of the states within each region are the same, as illustrated in Fig. 3.
Note that the switching structure can help save storage space for on-line implementation, since the smart sensor only needs to store switching-boundary states rather than the actions on the entire state space. At each time, the sensor simply needs to compare the current state with the boundary states to give the optimal decision.
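The storage saving can be sketched as follows. The example assumes, for illustration only, that for a fixed number of attempts `r` the policy switches from new transmission to retransmission once the AoI `q` reaches a threshold; the exact monotonicity directions are those of Theorem 2, which are not legible in this copy. The dictionary layout and the function names are hypothetical.

```python
def boundary_from_policy(policy):
    """Map each r to the smallest q whose action is 1 (retransmit).

    `policy` is a dict {(r, q): action} with action 0 (new) or 1.
    """
    boundary = {}
    for (r, q), action in policy.items():
        if action == 1:
            boundary[r] = min(q, boundary.get(r, q))
    return boundary

def decide(boundary, r, q):
    """Online decision using only the stored switching boundary."""
    threshold = boundary.get(r)
    return int(threshold is not None and q >= threshold)
```

Only one threshold per value of `r` is stored, instead of one action per state.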
IV-D Optimal Policy: A Special Case
We consider the high SNR scenario, where retransmissions can have ultra-low packet error probabilities. Therefore, we assume that a retransmission is always successful in the high SNR scenario, and the optimal policy always exists from Theorem 1.
Due to the successful retransmissions, it can be noted from (26) that the states in with and are transient states. Also, since a successful retransmission must be followed by a new transmission, the states in with and are transient states, and the state has the action of new transmission. Furthermore, due to the switching structure of the optimal policy in Theorem 2, we set a policy-switching threshold for the states with , where the states choose the action of retransmission, while the states with choose the action of new transmission. Then, it is easy to see that the states with and are transient states. Finally, the countably infinite state space is reduced to a finite state space as illustrated in Fig. 3. Only the state has the action , and the other states have the action .
Therefore, is the key design parameter to be optimally designed. The policy optimization problem in the state space is transformed to the one-dimensional problem. By calculating the stationary distribution of the states in with a given , the average cost can be obtained, and we have the following result.
In the high SNR scenario, the minimum long-term average MSE of the static channel is given as
where is the optimal policy-switching threshold.
In Proposition 1, the optimal can be numerically obtained by linear search methods, yielding the minimum estimation MSE.
IV-E Suboptimal Policy
The optimal policy of the MDP problem in Sec. IV-A does not have a closed-form expression for low-complexity computation. Besides, since the MDP problem has infinitely many states, it has to be approximated by a truncated MDP problem with finite states for numerical evaluation and solved offline. Therefore, we propose an easy-to-compute suboptimal policy, which is the myopic policy that makes decisions simply to minimize the expected instantaneous cost of the next step.
Then, we have
Using the inequality (22), if and only if satisfies
Thus, we have the following result.
A suboptimal policy of problem (25) is
It can be proved that the suboptimal policy in Proposition 2 is also a switching-type policy. Due to the simplicity of the suboptimal policy, which, unlike the optimal policy, does not need any iteration for policy calculation, it can be applied as an on-line decision algorithm. In Sec. VI, we will show that the performance of the suboptimal policy is close to the optimal one for practical system parameters. The detailed computation-complexity analysis will be given in Sec. VI as well.
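A myopic rule of this kind compares the expected next-step cost of the two actions and needs no iteration. The sketch below uses an assumed state convention that may be offset from the paper's indices: `r` failed attempts are buffered, the AoI is `q`, a successful retransmission yields AoI `r + 1`, a successful new transmission yields AoI `1`, and any failure yields AoI `q + 1`. The inputs `eps` and `cost` are hypothetical.

```python
def myopic_action(r, q, eps, cost):
    """Return 1 (retransmit) if it lowers the expected next-step cost.

    eps[k]  : failure probability of an attempt after k failed attempts
    cost(q) : estimation MSE at AoI q
    """
    if r == 0:
        return 0  # nothing buffered, so only a new transmission is possible
    e_new = eps[0]
    e_re = eps[min(r, len(eps) - 1)]
    exp_new = (1.0 - e_new) * cost(1) + e_new * cost(q + 1)
    exp_re = (1.0 - e_re) * cost(r + 1) + e_re * cost(q + 1)
    return int(exp_re < exp_new)
```

For example, with `cost = lambda q: 2.0 ** q` and `eps = [0.5, 0.05]`, the state `(r, q) = (1, 2)` gives expected costs of 5.0 for a new transmission and 4.2 for a retransmission, so the rule retransmits.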
V Optimal Policy: Markov Channel
In this section, we investigate the sensor’s optimal transmission control policy in the Markov channel.
V-A MDP Formulation
We also formulate the problem as an MDP.
1) The state space is defined as .
2) The action space is defined as .
3) Let and denote the current and next state, respectively. The transition probability can be written as
4) The one-stage cost is given in (27).
V-B Optimal Policy: Condition of Existence
Inspired by the static channel scenario, we derive the following condition under which the long term average MSE can be bounded.
See Appendix C. ∎
It is interesting to see that when retransmissions have very high reliability, i.e., , the eigenvalues of the matrix approach zero, and thus the left-hand side of (35) is much less than one and the remote estimation system can be stabilized.
The stability regions of a two-state Markov channel in terms of and with different are illustrated in Fig. 5, where . We see that a larger results in a smaller stability region.
V-C Optimal Policy: The Structure
The optimal policy in the Markov channel also has a switching structure in the state space.
(i) if , then ;
(ii) if , then , where is any positive integer.
The proof is similar to that of Theorem 2 and is omitted due to the space limitation. ∎
V-D Optimal Policy: A Special Case
For the high SNR scenario, we assume that a retransmission is always successful. Thus, the state transition probability (34) does not depend on the individual elements of the historical channel-state vector , and we can simply combine the states in by as the state to reduce the state space.
Similar to the static channel scenario, the state space of the optimal policy can be further reduced as , and the optimal policy for states is , where , and the other states have the action , as illustrated in Fig. 5. Different from the static channel scenario, the optimal policy for the -state Markov channel has a set of parameters, i.e., , to be optimally designed.
We can reorder the three dimensional states as a state (column) vector, , and the states and are the th and th elements of , respectively. Using the state transition probability (34) and the transition rule of the special case as illustrated in Fig. 5, the matrix of the state transition probability can be written as
where $\otimes$ is the Kronecker product operator, is the th column of defined in (3), and is the
Based on the stochastic matrix (37), we can calculate the steady state distribution with a given set of policy-switching parameters. By numerically optimizing , we have the following result.
In the high SNR scenario, the minimum long-term average MSE of the Markov channel is given as
where and is a null-space vector of with non-negative values, and here is the by identity matrix.
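The null-space computation mentioned above can be sketched numerically. The example assumes a row-stochastic transition matrix `P` of an irreducible finite chain (the paper's matrix may use the column convention, in which case no transpose is needed) and a hypothetical per-state cost vector `c`.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution pi satisfying pi P = pi and sum(pi) = 1."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # The right singular vector of (P^T - I) for the smallest singular
    # value spans its null space.
    _, _, vh = np.linalg.svd(P.T - np.eye(n))
    v = vh[-1]
    return v / v.sum()  # normalization also fixes the sign

def average_cost(P, c):
    """Long-term average cost of the chain with per-state cost vector c."""
    return float(stationary_distribution(P) @ np.asarray(c, dtype=float))
```

Evaluating `average_cost` for each candidate set of policy-switching parameters and keeping the smallest value mirrors the numerical optimization described in the text.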
VI Numerical Results
VI-A Delay-Optimal Policy: A Benchmark
We also consider a delay-optimal policy based on the HARQ protocol in , as the benchmark of the proposed optimal policy. We use the average AoI to measure the delay of the system. Therefore, similar to the MSE-optimization problem (25), the delay optimization problem is formulated as
This problem can also be converted to an MDP problem with the same state space, action space and state transition function as presented in Sec. IV-A. The one-stage cost in terms of delay is
VI-B Simulation and Policy Comparison
In the remainder of the section, we present numerical results of the optimal policies for static and Markov channels in Sec. IV and Sec. V, respectively, and their performance. Also, we numerically compare these optimal policies with the benchmark policy in Sec. VI-A. Unless otherwise stated, we consider CC-HARQ, and we set dB, , , , , , , and thus , .
The packet error probabilities for a new transmission and a retransmission of the CC/IR-HARQ protocol are obtained by substituting the approximation (5) into (14) and (17), respectively. We use the relative value iteration algorithm in an MDP toolbox in  to solve the MDP problems in Sections IV, V and VI-A.
VI-B1 Static Channel
Policy Comparison. To solve the MDP problem with an infinite state space, the unbounded state space is truncated as to enable the evaluation. We set the channel power gain . Using Theorem 1, we can verify that the CC/IR-HARQ based optimal policy exists. Fig. 7 shows different policies within the truncated state space. In Fig. 7(a), we see that in line with Theorem 2, the optimal policy of the CC-HARQ protocol is a switching-type one, where the actions of the states that are close to the states with , are equal to zero, i.e., new transmissions are required. Also, we see that the suboptimal policy plotted in Fig. 7(d) is a good approximation of the optimal one within the truncated state space. However, the delay-optimal policy plotted in Fig. 7(c) is very different from the previous ones, where more states have the action of new transmission. Therefore, HARQ-based retransmissions are more important for reducing the estimation MSE than for reducing the delay. Fig. 7(b) presents the optimal policy of the IR-HARQ protocol, which is identical to that of CC-HARQ in Fig. 7(a). This is because when the channel power gain is high, e.g., , both IR- and CC-HARQ can provide sufficiently high retransmission reliability and the transmission control policies are the same. However, we can show that the optimal policies for CC- and IR-HARQ are different when the channel power gain is low and IR-HARQ can provide much better retransmission reliability. The policy diagram is not included due to the space limitation.
Performance Comparison. Based on the above numerically obtained policies and the policy with the standard ARQ, i.e., the one without retransmission (see Sec. III), we further evaluate their performances in terms of the long-term average MSE using (10). We run the remote estimation process with time slots and set the initial value of as . Also, we set as the performance baseline, as , .
Fig. 7 plots the average MSE versus the simulation time , using different transmission control policies. Our simulation shows that the conventional non-retransmission policy has an unbounded average MSE, which is not shown in the figure due to the ultra fast growth rate. However, we show that the average MSEs of different HARQ-based policies quickly converge to bounded steady state values. Therefore, the proposed HARQ-based policy can significantly improve the estimation quality against the conventional policy. Also, we see that the performance of the suboptimal policy is very close to the optimal one. Given the performance baseline, the optimal policy gives a MSE reduction of the delay-optimal policy, which demonstrates the superior performance of the proposed optimal policy.
VI-B2 Markov Channel
We consider a two state Markov channel with the channel power gains and . The matrix of channel state transition probability Using Theorem 3, we can verify that the CC/IR-HARQ based optimal policy exists. To solve the MDP problem in the Markov channel scenario with an infinite state space, the unbounded state space is truncated as to enable the evaluation, where is the state vector of the historical channel states. Figs. 9 and 9 show the optimal transmission control policy under channel 1 () and 2 (), respectively. We can see the switching structure of the optimal policy. Also, we see that new transmission occurs more often in the good channel than in the bad channel.
We can also calculate the suboptimal policy and delay-optimal policy of the Markov channel, and the computation complexity of these policies together with the ones of the static channel are listed in Table I. We see that the numbers of convergence steps for calculating these policies are less than , and the optimal policy has a larger number of convergence steps than the delay-optimal policy and hence a higher computation complexity.
Performance Comparison. We evaluate the non-retransmission policy, CC/IR-HARQ based optimal policy, CC-HARQ based suboptimal policy and delay-optimal policy in terms of the long-term average MSE using (10). We run the remote estimation process with time slots and set the initial value of as .
Fig. 10 plots the average MSE versus the simulation time , using different transmission control policies. Our simulation shows that the non-retransmission policy has an unbounded average MSE. We show that the average MSEs of different policies quickly converge to bounded steady state values. Therefore, the proposed HARQ-based policy can also significantly improve the estimation quality against the conventional policy in the Markov channel scenario. We see that the performance of the suboptimal policy is very close to the optimal one. Given the performance baseline, the optimal policy reduces MSE by for the delay-optimal policy. Also, unlike the static channel scenario, we see that the IR-HARQ based optimal policy significantly reduces the average MSE by compared to the CC-HARQ based optimal policy. This is because IR-HARQ can provide much better retransmission reliability than CC-HARQ especially when the channel quality is bad.
Table I: Computation complexity (the number of convergence steps).
We have proposed and optimized a HARQ-based remote estimation protocol for real-time applications. Our results have shown that the optimal policy can significantly reduce the estimation MSE for some practical settings. As the recent communication standards for real-time wireless control, such as WirelessHART, ISA-100 and IEEE 802.15.4e, have not adopted any HARQ techniques, this work also suggests that HARQ can be adopted by the future real-time communication standards to enhance the performance of mission-critical remote estimation/control systems.
Appendix A: Proof of Theorem 1
To prove the existence of a stationary and deterministic optimal policy given condition (28), we need to verify the following conditions [21, Corollary 7.5.10]: (CAV*1) there exists a standard policy such that the recurrent class induced by is equal to the whole state space ; (CAV*2) given , the set is finite.
Condition (CAV*2) can be easily verified based on (27). In what follows, we verify (CAV*1) by first constructing a policy and then proving that it is a standard policy.
The action of the policy is given as
It is easy to prove that any state in induced by is a recurrent state. We then prove that is a standard policy by verifying both the expected first passage cost and time from state to are bounded . Due to the space limitation, we only prove that any state with has bounded first passage cost and time. The other states can be proved similarly.
For notational simplicity, the expected first passage cost of the state is denoted as , and the one-stage cost (27) is rewritten as