Networked Control Systems over Correlated Wireless Fading Channels
In this paper, we consider a networked control system (NCS) in which a dynamic plant system is connected to a controller via a temporally correlated wireless fading channel. We focus on communication power design at the sensor to minimize a weighted average state estimation error at the remote controller subject to an average transmit power constraint of the sensor. The power control optimization problem is formulated as an infinite horizon average cost Markov decision process (MDP). We propose a novel continuous-time perturbation approach and derive an asymptotically optimal closed-form value function for the MDP. Under this approximation, we propose a low complexity dynamic power control solution which has an event-driven control structure. We also establish technical conditions for asymptotic optimality, and sufficient conditions for NCS stability under the proposed scheme.
Networked control systems (NCSs) have drawn great attention in recent years owing to their growing applications in industrial automation, remote robotic control, etc. A typical NCS consists of a plant, a sensor, and a controller, which are connected over a communication network as illustrated in Fig. 1. The presence of communication channels in an NCS complicates the design and analysis because communication and control interact. Conventional closed-loop control theories (e.g., stabilization, optimal control) must be reevaluated when communication constraints are taken into account.
There are various works on the analysis and optimization of NCSs. In , , a necessary minimum rate requirement for NCS stability is computed for noiseless and memoryless Gaussian channels between the sensor and the controller. Many other works – consider encoder/decoder design and give sufficient conditions for NCS stability under scenario-specific communication channels. The authors in  design an encoder and decoder structure to achieve asymptotic stability for an NCS with a packet dropout channel. In , the authors study multi-input networked stabilization with a fading channel between the controller and plant. In , the authors give stability conditions for an NCS with fading packet dropout channels, where the evolution of the fading channel follows a Markov process with a finite discrete state space. In all these works –, the key focus is on achieving NCS stability, which is only a weak form of control performance. There are also many works considering optimal control of NCSs. In , the authors consider a static joint communication and control optimization and ignore the stochastic evolutions of the plant dynamics and the fading channel states. In , a joint scheduling and control policy is proposed to minimize the linear quadratic Gaussian (LQG) cost and the communication cost under a simplified static rate-limited error-free channel. In , , the authors consider either plant LQG control or sensor scheduling over a packet-dropping network with a constant symbol error rate (SER). In all these works –, the channel between the sensor and the controller is assumed to be either error-free or with a constant SER and they ignore the effect of how the power control scheme affects the SER, which further affects the state estimation at the remote controller.
To optimize the performance of NCSs over wireless fading channels, the control policy should adapt to both the plant state information and the fading channel state. The plant state realization reveals the relative importance of the individual state feedback, while the channel fading state reveals the transmission opportunities over the communication channels. The associated optimization problem is a Markov decision process (MDP), which is well known to be quite challenging , . In , the authors study dynamic control of the transmission probability by minimizing the mean square error (MSE) of the plant state estimation and the average transmission probability for an NCS with an on-off switch channel. In –, the authors study the dynamic power control for an NCS with a wireless fading channel. Specifically,  solves the minimization of an average power cost subject to the stability requirement,  solves the minimization of the MSE of the plant state estimation and average power cost, and  solves the minimization of the LQG cost and average power cost. The MDP problems in – are solved using the conventional value iteration algorithm (VIA), which incurs huge complexity, converges slowly, and offers little design insight , . The approaches therein cannot be used in our problem, where we aim to obtain a low complexity dynamic control solution.
In this paper, we consider an NCS where a sensor delivers the plant state information to a controller over a temporally correlated wireless fading channel as illustrated in Fig. 1. Furthermore, we consider error-adaptive power control, where the instantaneous SER depends on the transmit power of the sensor. (Footnote 1: In error-adaptive control, the data rate at the sensor is fixed, and the sensor dynamically adjusts the transmit power to adjust the SER  so as to achieve certain objectives of the NCS.) Using the separation principle between control and communication , , we focus on minimizing the state estimation cost of the plant subject to an average communication power constraint. The communication power optimization problem is formulated as an infinite horizon average cost MDP and there are several first-order technical challenges:
Challenges due to the Temporal Correlations of the Fading Channel: When the wireless fading channel is temporally i.i.d., the dimension of the optimality equation can be reduced , which simplifies the numerical computation of the value function. However, such a dimension reduction technique cannot be applied when the wireless channel fading is temporally correlated. This poses great challenges even for obtaining the numerical solution of the associated optimality equation.
To address the above challenges, we propose a novel continuous-time perturbation approach and obtain an asymptotically optimal closed-form value function for solving the associated optimality condition of the MDP. Based on the structural properties of the communication power control, we show that the solution has an event-driven control structure. Specifically, the sensor either transmits with maximum power or shuts down depending on a dynamic threshold rule. Furthermore, we analyze the asymptotic optimality of the proposed scheme and also give sufficient conditions for ensuring the NCS stability while using the proposed scheme. Finally, we compare the proposed scheme with various state-of-the-art baselines and show that significant performance gains can be achieved with low complexity.
Notations: Bold font is used to denote matrices and vectors. and denote the transpose and conjugate transpose of matrix respectively. represents the identity matrix with appropriate dimension. represents the largest eigenvalue of a symmetric matrix . represents the Euclidean norm of a vector . represents the real part of . represents the absolute value of a scalar . denotes the column gradient vector with the -th element being . denotes the Hessian matrix of w.r.t. vector . as means .
II System Model
Fig. 1 shows a typical networked control system (NCS), which consists of a plant, a sensor, and a controller that together form a closed control loop. We consider a slotted system, where the time dimension is partitioned into decision slots indexed by with duration . The sensor has perfect observation of the plant state . The controller is geographically separated from the sensor, and there is a temporally correlated wireless fading channel connecting them. At each time slot , the sensor observes the plant state and the pre-processor generates , which is passed to the mapper. The mapper maps to one of the quantization levels , which is encoded into -digit binary information bits at the encoder. The information bits are communicated to the remote controller over the wireless fading channel using quadrature amplitude modulation (MQAM). Specifically, the -digit binary information bits are mapped to one of the available QAM symbols for a given constellation type. After the demodulation process, the demodulator outputs binary information bits to the demapper, which maps the information bits back to one of the quantization levels . Then, is passed to the state estimator to obtain a state estimate . After that, is passed to the actuator to generate a control action . The actuator, which is co-located with the plant, uses control action for plant actuation.
Such an NCS with a wireless fading channel covers many practical applications. (Footnote 2: Please refer to  and the references therein for a broader range of application scenarios.) For example, in a heating, ventilation and air conditioning (HVAC) system , the boiler and chiller components (i.e., the controller) of the HVAC system are mounted on the rooftop. The temperature and humidity sensors are located inside a room; they measure certain indoor environmental parameters and transmit the data to the controller over a wireless channel. The controller then decides whether to pump cool or hot air into the room through the ducts based on the received data.
II-A Linear Stochastic Plant Model
We consider a continuous-time stochastic plant system with dynamics , , , where is the plant state, is the plant control action, is the initial plant state, , , and is an additive plant disturbance with zero mean and covariance . Furthermore, we assume that the plant disturbance is bounded, i.e., for some , . Without loss of generality, we assume that is diagonal. (Footnote 3: For non-diagonal , we can pre-process the plant using the whitening transformation procedure . Specifically, let the eigenvalue decomposition of be , where is a unitary matrix and is diagonal. We have , where , , , and . Therefore, the optimization based on can be transformed to an equivalent problem based on with diagonal plant noise covariance.) Since the sensor in the NCS samples the plant state once per time slot (with duration ), the state dynamics of the sampled discrete-time stochastic plant system is given by
for , where , , and is an i.i.d. random noise with zero mean and covariance . We have the following assumptions on the plant model:
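The sampling step can be illustrated with a short sketch. It assumes the usual linear drift form dx = (Ax + Bu)dt plus noise with the control held constant over a slot (zero-order hold); the names `A`, `B`, `tau`, `expm_taylor`, and `discretize` are illustrative choices, not notation from this paper, and the covariance of the sampled noise is not computed.

```python
import numpy as np

def expm_taylor(M, terms=20):
    """Matrix exponential via scaling-and-squaring with a truncated Taylor series."""
    norm = np.linalg.norm(M, ord=np.inf)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = M / (2 ** squarings)
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k          # accumulates scaled^k / k!
        result = result + term
    for _ in range(squarings):
        result = result @ result          # undo the scaling
    return result

def discretize(A, B, tau):
    """Zero-order-hold discretization: returns (F, G) with
    F = exp(A*tau) and G = (integral_0^tau exp(A*s) ds) B,
    both read off from the exponential of an augmented matrix."""
    n, m = A.shape[0], B.shape[1]
    aug = np.zeros((n + m, n + m))
    aug[:n, :n] = A
    aug[:n, n:] = B
    E = expm_taylor(aug * tau)
    return E[:n, :n], E[:n, n:]

# Hypothetical example: a double integrator sampled every 0.1 time units.
A_example = np.array([[0.0, 1.0], [0.0, 0.0]])
B_example = np.array([[0.0], [1.0]])
F_example, G_example = discretize(A_example, B_example, 0.1)
```

For a small slot duration, F is close to I + A·tau, which is the regime the continuous-time perturbation analysis later relies on.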
II-B Wireless Fading Channel Model
We consider a continuous-time temporally correlated wireless fading channel with dynamics , , with , where is the channel state information (CSI) and is the initial channel state. The coefficient determines the temporal correlation of the fading process (footnote 4: the autocorrelation function is ), and is an additive circularly-symmetric Gaussian noise with zero mean and unit variance (footnote 5: specifically, and , where is the complex conjugate of ). Similarly, the state dynamics of the sampled discrete-time channel is given by
for , where and is an i.i.d. noise with zero mean and covariance . The received signal at the demodulator of the controller is given by
where is the transmit SNR and is an i.i.d. Gaussian noise. Let denote the symbol error event (where means symbol error). In this paper, we consider rectangular MQAM constellation (e.g., , , ), and the associated symbol error rate (SER) is given by
where is the channel bandwidth and is a constant. (Footnote 6: The SER model in (4) covers other types of constellation geometry for MQAM (e.g., circular constellation) with appropriate adjustment of . In this paper, our derived results are based on the rectangular MQAM, which can be easily extended to other constellation types.) The received signal is processed in the demodulator, which outputs information bits to the reconstructor. The reconstructor maps the information bits back to one of the quantization levels and the associated output is given by
Note that is the output of the mapper at the sensor. Furthermore, (5) means that for successful transmission and otherwise.
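To make the temporally correlated fading model concrete, the sketch below simulates a sampled complex Gauss-Markov channel. It assumes the continuous-time model is an Ornstein-Uhlenbeck process with unit stationary power and autocorrelation exp(-a|Δ|), so that sampling every `tau` gives a first-order autoregression with coefficient exp(-a·tau); the function names and parameter values are hypothetical.

```python
import math
import random

def ar1_coefficients(a, tau):
    """Exact sampling of an assumed unit-power OU process:
    h[k+1] = rho * h[k] + z[k], with Var(z) = 1 - rho**2."""
    rho = math.exp(-a * tau)
    return rho, 1.0 - rho ** 2

def simulate_channel(a, tau, n_slots, seed=0):
    """Return a list of complex channel states, one per slot."""
    rng = random.Random(seed)
    rho, noise_var = ar1_coefficients(a, tau)
    sigma = math.sqrt(noise_var / 2.0)   # std per real dimension
    # Start from the stationary distribution CN(0, 1).
    h = complex(rng.gauss(0.0, math.sqrt(0.5)), rng.gauss(0.0, math.sqrt(0.5)))
    path = [h]
    for _ in range(n_slots - 1):
        z = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        h = rho * h + z
        path.append(h)
    return path

channel_path = simulate_channel(a=1.0, tau=0.1, n_slots=20000)
average_power = sum(abs(h) ** 2 for h in channel_path) / len(channel_path)
```

A small product a·tau gives a coefficient close to one, i.e., strongly correlated fading across slots, which is the case where the i.i.d. dimension-reduction trick mentioned in the introduction no longer applies.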
II-C Information Structures at the Sensor and the Controller
Let be the history of realizations of variable up to time . The available knowledge at the sensor and the controller at time are represented by the information structures and . Specifically, is given by
for , and , where we denote (). Moreover, is given by
for , and , where is the slot of the latest successful transmission by time . Note that at time slot , the sensor discards the events of the form . (Footnote 7: The reason is that the events of the form contain information about the plant state through the dependence of the SER in (4). To avoid this complication, we discard the events of the form as in .) We have the following observations on and :
Remark 1 (Observations on and ).
For , is the plant state, is the pre-processor output, is the plant control action, is the mapper output, and all can be locally obtained at the sensor. The information can be obtained by the feedback signals from the controller as shown in Fig. 1.
For , is the locally generated plant control action, can be locally measured using the pilots from the sensor , and are the symbol error indicators and the received signals, which are the output of the wireless fading channel and can be locally obtained at the controller.
There is an intersection of and , which is denoted as . Specifically, for , and . ∎
II-D Communication Power and Plant Control Policies
Based on and , we define the communication power and plant control policies. Let be the minimal -algebra containing the set and be the associated filtration at the sensor. Similarly, define and let be the associated filtration at the controller. At the beginning of time slot , the sensor determines the power control action and the controller determines the plant control action according to the following control policies:
Definition 1 (Plant Control Policy).
A plant control policy at the controller is -adapted, meaning that is adaptive to all the information up to time slot (i.e., ). ∎
Definition 2 (Communication Power Control Policy).
A communication power control policy at the sensor is -adapted, meaning that is adaptive to all the information up to time slot (i.e., ). Furthermore, satisfies the peak power constraint, i.e., for all , where is the maximum power the sensor can use at each time slot. ∎
Remark 2 (Interpretation of the Power Control Policy).
The power control policy in Definition 2 is an error-adaptive power control. In such a strategy, the rate of the channel is fixed, which means that the constellation size of the MQAM (where ) is unchanged during the communication session. At each time slot , the sensor controls the communication power to dynamically adjust the SER in (4), so that the state estimation error at the controller is adjusted. Hence, there is an inherent tradeoff between the plant performance (in the stability or optimal control sense) and the communication cost (in terms of the average transmit power). We shall quantify this tradeoff in the following sections. ∎
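The power-versus-SER tradeoff can be illustrated numerically. The sketch below uses the textbook symbol error expression for square M-QAM over an AWGN channel at a given received SNR; it is not necessarily the SER model in (4), and the helper names and the unit noise power are assumptions made for illustration.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def square_qam_ser(snr, bits_per_symbol):
    """Symbol error rate of square M-QAM (M = 2**bits_per_symbol, bits even)
    at average received SNR per symbol `snr` (linear scale)."""
    m = 2 ** bits_per_symbol
    per_axis = 2.0 * (1.0 - 1.0 / math.sqrt(m)) * q_function(
        math.sqrt(3.0 * snr / (m - 1)))
    return 1.0 - (1.0 - per_axis) ** 2

def received_snr(power, channel_state, noise_power=1.0):
    """Received SNR for transmit power `power` and complex channel state."""
    return power * abs(channel_state) ** 2 / noise_power

# With the rate fixed (16-QAM), more transmit power lowers the SER.
fixed_channel = 0.8 + 0.3j
ser_by_power = [square_qam_ser(received_snr(p, fixed_channel), 4)
                for p in (1.0, 10.0, 100.0)]
```

Because the constellation size stays fixed, the only lever in this sketch is the transmit power, which mirrors the error-adaptive structure described in Remark 2.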
III Communication Power Problem Formulation
In this section, we first introduce a primitive quantizer at the sensor. We then establish the no dual effect property under such a quantizer and give the optimal plant control policy w.r.t. the joint communication power and plant control problem. Based on the primitive quantizer and the CE controller, we formally formulate the communication power problem.
III-A Primitive Quantizer
Following  and , we adopt a primitive quantizer at the sensor to track the dynamic range of the plant state. Specifically, the primitive quantizer is characterized by four parameters , where is the shifting vector, is the coordinate transformation matrix, is the dynamic range, and is the rate vector with (where determines the quantization level of the -th element of the plant state ). (Footnote 8: The shifting vector also measures the common information at both the sensor and the controller. Note that is calculated based on , and hence, can be locally maintained at both the sensor and the controller according to the discussions in Remark 1.)
The primitive quantizer consists of three components, i.e., a pre-processor, a mapper and an encoder. Specifically, the pre-processor takes as input and generates the innovation , which is passed to the mapper. The mapper maps to one of the quantization levels within the region and outputs . Then is encoded into -digit binary information bits. Please refer to Appendix A for details on how the primitive quantizer works. We summarize the properties of the primitive quantizer as follows:
Lemma 1 (Properties of the Primitive Quantizer).
The primitive quantizer tracks the dynamic range of the plant state , i.e.,
The equivalent model between input and mapper output can be expressed as
where is the quantization noise, and for each , is uniformly distributed within the region . ∎
Therefore, according to (8), we can use the primitive quantizer to track the dynamic range of the plant state .
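As a rough illustration of the mapper stage only, the sketch below quantizes each component of a vector uniformly over a symmetric range, with the number of levels set by a per-component rate. The dynamic-range update and the coordinate transformation of the primitive quantizer (Appendix A) are not modeled, and all names are hypothetical.

```python
def quantize(value, half_range, bits):
    """Uniform quantizer on [-half_range, half_range] with 2**bits cells.
    Returns (cell index, cell midpoint). Inputs outside the range are clipped."""
    levels = 2 ** bits
    cell = 2.0 * half_range / levels
    clipped = min(max(value, -half_range), half_range)
    index = min(int((clipped + half_range) / cell), levels - 1)
    midpoint = -half_range + (index + 0.5) * cell
    return index, midpoint

def quantize_vector(values, half_range, rate_vector):
    """Quantize each component with its own number of bits."""
    return [quantize(v, half_range, r) for v, r in zip(values, rate_vector)]

# In-range inputs incur an error of at most half_range / 2**bits per component.
sample = quantize_vector([0.3, -1.7], half_range=2.0, rate_vector=[3, 2])
```

The total number of bits sent per slot in this sketch is the sum of the entries of `rate_vector`, and the per-component error bound shrinks by half for every extra bit.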
III-B Certainty Equivalent Controller
where and are positive definite symmetric weighting matrices for the plant state deviation cost and the plant control cost, and is the communication power price. In general, the design of the communication power policy and the plant control policy are coupled together. This is because the communication power will affect the state estimation accuracy at the controller, which will in turn affect the plant state evolution. However, by establishing the no dual effect property (e.g., , ), we can obtain the optimal plant control policy for the above joint optimization problem. Specifically, let be the plant state estimate at the controller and be the state estimation error. The no dual effect property is established as follows:
Lemma 2 (No Dual Effect Property).
Under the primitive quantizer in Section III-A, we have the following no dual effect property in our NCS:
Proof: Please refer to Appendix B. ∎
where is the feedback gain matrix, and satisfies the following discrete-time algebraic Riccati equation (DARE): . (Footnote 9: We assume that is observable, as in classical LQG control theory. This assumption together with Assumption 1 ensures that the DARE has a unique symmetric positive semidefinite solution .)
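A minimal sketch of how such a gain can be computed numerically follows. It assumes the textbook form of the DARE, P = FᵀPF − FᵀPG(GᵀPG + R)⁻¹GᵀPF + Q, and solves it by fixed-point iteration; the symbols `F`, `G`, `Q`, `R`, the example matrices, and the sign convention u = −K x̂ are assumptions for illustration rather than the paper's notation.

```python
import numpy as np

def solve_dare(F, G, Q, R, iterations=5000, tol=1e-13):
    """Iterate the Riccati recursion until it stops changing.
    Returns the fixed point P and the feedback gain K = (G'PG + R)^{-1} G'PF."""
    P = Q.copy()
    for _ in range(iterations):
        gain = np.linalg.solve(G.T @ P @ G + R, G.T @ P @ F)
        P_next = F.T @ P @ F - F.T @ P @ G @ gain + Q
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    gain = np.linalg.solve(G.T @ P @ G + R, G.T @ P @ F)
    return P, gain

# Hypothetical unstable plant with a single input.
F_plant = np.array([[1.1, 0.2], [0.0, 0.9]])
G_plant = np.array([[0.0], [1.0]])
Q_cost = np.eye(2)
R_cost = np.eye(1)
P_sol, K_gain = solve_dare(F_plant, G_plant, Q_cost, R_cost)

def ce_control(x_hat):
    """Certainty-equivalent action: apply the gain to the state estimate."""
    return -K_gain @ x_hat
```

The certainty-equivalent structure means the same gain would be used with perfect state knowledge; only the argument changes from the true state to the estimate available at the controller.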
III-C Communication Power Problem Formulation
Under the primitive quantizer and the CE controller, the per-stage state estimation error and communication power cost is given by
where is a positive definite symmetric weighting matrix. We consider the following communication power optimization problem:
Problem 1 (Communication Power Optimization Problem):
where has the following dynamics:
where is the quantization noise in (9). ∎
We need to design a communication power control policy such that the plant system and the primitive quantizer are stable and the state estimation error is bounded. Specifically, we have the following definition of an admissible communication power control policy of the NCS under the CE controller in (10):
Definition 3 (Admissible Communication Power Control Policy): A communication power control policy of the NCS is admissible if,
The plant state process is stable in the sense that , where means taking expectation w.r.t. the probability measure induced by .
The dynamic range of the quantizer is stable in the sense that .
The process at the sensor is stable in the sense that . ∎
Problem 1 is an MDP and we show in Appendix C that it is without loss of optimality that we restrict the system state to be with transition kernel . Using dynamic programming theories , the optimality conditions of Problem 1 are given as follows:
Theorem 1 (Sufficient Conditions for Optimality).
If there exists that satisfies the following optimality equation (Bellman equation):
and for all admissible communication power control policies , satisfies the following transversality condition:
Then, we have the following results:
is the optimal average cost of Problem 1.
Proof: Please refer to Appendix C. ∎
Unfortunately, the Bellman equation in (14) is very difficult to solve because it involves a huge number of fixed point equations w.r.t. . Numerical solutions such as numerical VIA ,  have exponential complexity w.r.t. (where is the dimension of the plant state ). (Footnote 10: Since the state space of is continuous, the numerical VIA refers to the finite difference method for solving an equivalent discretized Bellman equation ,  using the conventional VIA. Suppose the state space of each element in is discretized into intervals. Then the cardinality of is .)
In Section IV, we shall derive a closed-form approximation for using continuous-time perturbation techniques.
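For contrast with the closed-form approach, the following toy sketch runs relative value iteration on a deliberately simplified average-cost MDP. It is not Problem 1: the state is a single discretized error level, the channel is ignored, a transmission succeeds with a fixed probability and resets the error, and every parameter is hypothetical. It only illustrates the mechanics of a conventional VIA baseline.

```python
def relative_value_iteration(n_levels=20, power_price=5.0,
                             success_prob=0.8, iterations=5000):
    """Toy average-cost MDP: error level e grows by one per slot unless a
    transmission succeeds, which resets it to 0. Stage cost is e**2 plus
    power_price when transmitting. Returns (average cost, relative values,
    policy) with policy[e] = 1 meaning 'transmit'."""
    values = [0.0] * n_levels
    policy = [0] * n_levels
    avg_cost = 0.0
    for _ in range(iterations):
        updated = []
        for e in range(n_levels):
            grow = values[min(e + 1, n_levels - 1)]
            # Transmit only if the expected saving exceeds the power price.
            benefit = success_prob * (grow - values[0])
            transmit = benefit > power_price
            policy[e] = 1 if transmit else 0
            if transmit:
                cost = float(e ** 2) + power_price
                future = success_prob * values[0] + (1.0 - success_prob) * grow
            else:
                cost = float(e ** 2)
                future = grow
            updated.append(cost + future)
        avg_cost = updated[0] - values[0]
        values = [v - updated[0] for v in updated]   # renormalise at state 0
    return avg_cost, values, policy

toy_cost, toy_values, toy_policy = relative_value_iteration()
```

Even in this one-dimensional toy, the table has one entry per grid point; with a grid per state component the table size multiplies across components, which is the exponential growth described above.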
IV Low Complexity Power Control Solution
In this section, we first adopt a continuous-time perturbation approach to analyze the difference between a closed-form approximate value function and the optimal value function. We analyze the performance gap between the policy obtained from the continuous-time perturbation approach and the optimal control policy. Then, we focus on deriving the closed-form approximate value function and proposing a low complexity power control scheme. We also give sufficient conditions for the NCS stability.
IV-A Continuous-Time Approximation
Lemma 3 (Perturbation Analysis for Solving the Optimality Equation): If there exists where (footnote 11: means that is second-order differentiable w.r.t. each variable in ) that satisfies
the following multi-dimensional PDE:
Then, for any ,
as , where , and and are the error terms due to the continuous time approximation and the quantization, respectively. Furthermore, satisfies the transversality condition in (15).
Proof: Please refer to Appendix D. ∎
Let be the control policy, under which the generated control action achieves the minimization in the PDE in (16) for any . Let be the associated performance. The gap between and the optimal average cost in (14) is established as follows:
Theorem 2 (Performance Gap between and ).
If and is admissible, we have
Proof: Please refer to Appendix E. ∎
Theorem 2 suggests that as . In other words, the power control policy is asymptotically optimal for sufficiently small .
IV-B Closed-Form Approximate Value Function
In this subsection, we solve the PDE in (16) to obtain the closed-form approximate value function . The PDE in (16) is multi-dimensional and coupled in the variables , so obtaining a closed-form solution is quite challenging. In the following lemma, we derive an asymptotic solution of the PDE using the asymptotic expansion technique .
Lemma 4 (Asymptotic Solution of the PDE).
Proof: Please refer to Appendix F. ∎
IV-C Structural Properties of the Low Complexity Power Control
Theorem 3 (Structural Properties of Power Control Policy).
The optimizing power control policy that minimizes the R.H.S. of the PDE in (16) is given by
It can be observed that the power control policy has an event-driven control structure with a dynamically changing threshold . Specifically, the sensor either transmits using the maximum power or shuts down, depending on whether the dynamic threshold is larger than or not. Fig. 2(a) illustrates a sample path of the state estimation error and it can be observed that the sensor only activates transmission when the accumulated error is large enough. Furthermore, the dynamic threshold is adaptive to the plant state estimation error and the CSI . Using the approximate value function in (21), we have the following discussions on the dynamic threshold. (Footnote 12: Please refer to Appendix G for the order growth results in Remark 3.)
Remark 3 (Properties of the Dynamic Threshold).
The dynamic threshold is affected by the following factors:
Dynamic Threshold w.r.t. : For given CSI and data rate , the dynamic threshold increases w.r.t. the state estimation error at the order of . This means that the sensor tends to use full power when the state estimation error is large. This is reasonable because a large state estimation error makes it urgent to deliver information to the controller, which calls for large power.
Dynamic Threshold w.r.t. : For given state estimation error and data rate , the dynamic threshold increases w.r.t. the CSI at the order of . This means that the sensor tends to use full power when the CSI is large. Note that large means good transmission opportunities. Hence, it is reasonable to use more power to reduce the SER.
Dynamic Threshold w.r.t. : For given large CSI or large state estimation error , the dynamic threshold increases w.r.t. at the order of . This means that the sensor tends to use full power when the data rate is large. This is reasonable because a large data rate leads to a high SER, which also calls for large power to increase the chance of using . (Footnote 13: Large leads to the decrease of the minimal distance between the transmitted symbols in the constellation diagram, which results in an increase in the SER .) ∎
Fig. 2(b) illustrates the decision region for and for given system parameter configurations.
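The on-off structure described above can be sketched as a simple rule. The closed-form threshold of Theorem 3 is not reproduced in this text, so `toy_threshold` below is a hypothetical stand-in that merely grows with the estimation error and the channel gain, as in Remark 3; `trigger_level` stands for the quantity the threshold is compared against.

```python
def event_driven_power(threshold_value, trigger_level, max_power):
    """Bang-bang rule: full power when the threshold exceeds the trigger
    level, otherwise the sensor stays silent."""
    return max_power if threshold_value > trigger_level else 0.0

def toy_threshold(error, channel_state, weight=1.0):
    """Hypothetical stand-in for the dynamic threshold (NOT the paper's
    expression): squared error norm scaled by the channel power gain."""
    return weight * abs(channel_state) ** 2 * sum(x * x for x in error)

# Small accumulated error: stay silent. Large accumulated error: transmit.
quiet = event_driven_power(toy_threshold([0.1, 0.0], 1 + 0j), 1.0, 2.0)
active = event_driven_power(toy_threshold([3.0, 4.0], 1 + 0j), 1.0, 2.0)
```

Evaluating such a rule costs a handful of arithmetic operations per slot, which is the sense in which the structure is low complexity compared with a table lookup produced by value iteration.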
IV-D Stability Conditions and Performance Gaps
Theorem 4 (Sufficient Conditions for NCS Stability).
Proof: Please refer to Appendix H. ∎
Remark 4 (Discussions of the Stability Conditions).
Theorem 4 gives the conditions to ensure NCS stability. Under the conditions, we have , where is the instability measure ,  of the plant system. Note that the term is equivalent to , where is the average successful transmission probability under . Hence, the condition in (23) is equivalent to , where is the average number of bits that are successfully delivered to the controller per channel use. Therefore, our sufficient condition is consistent with the classical results for error-free channels ,  after accounting for the SER. ∎
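The flavor of this comparison can be illustrated with the classical data-rate quantity. The sketch assumes the instability measure is the sum of log2 magnitudes of the unstable eigenvalues of the sampled plant matrix, and compares it with the average number of successfully delivered bits; the exact inequality in (23) is not shown in this text, so the check below is only in its spirit and all names are hypothetical.

```python
import math

def instability_measure(eigenvalues):
    """Sum of log2|lambda| over eigenvalues with magnitude above one."""
    return sum(math.log2(abs(lam)) for lam in eigenvalues if abs(lam) > 1.0)

def rate_condition_holds(eigenvalues, rate_bits, success_prob):
    """True if the average delivered bits per channel use exceed the
    instability measure of the plant."""
    return success_prob * rate_bits > instability_measure(eigenvalues)

# Hypothetical plant with two unstable modes.
plant_modes = [2.0, 0.5, -4.0]
needed_bits = instability_measure(plant_modes)
reliable = rate_condition_holds(plant_modes, rate_bits=4, success_prob=0.9)
lossy = rate_condition_holds(plant_modes, rate_bits=4, success_prob=0.7)
```

In this example the same 4-bit rate passes the check at a 90% delivery probability and fails at 70%, which shows how symbol errors eat into the rate margin.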
Finally, the approximate value function in (21) satisfies . Based on Theorem 2, the performance of the NCS under in (22) (i.e., ) is order-optimal, i.e., as . We give the necessary conditions for NCS stability as follows.
Theorem 5 (Necessary Conditions for NCS Stability).
Proof: Please refer to Appendix I. ∎
From Theorem 4 and Theorem 5, the tightness of the sufficient condition compared with the necessary condition depends on the difference between and . For NCSs with , the sufficient condition is tight compared with the necessary condition.
Remark 5 (Extension to Plant Output Feedback).
The solution framework can be easily extended to the case of plant output feedback, i.e., the input of the sensor is , where and . Using Prop. 5.1 of , the encoder has access to a Luenberger-like observer , where is chosen such that is stable. Let and it can be shown that for some constant . (Footnote 14: Please refer to Appendix J for the proofs related to this extension.) Therefore, we can obtain an upper bound of the per-stage state estimation error cost in (11) as follows:
where . Since the sensor cannot observe perfect plant state, the state estimation error is not available at the sensor. Therefore, instead of optimizing the average state estimation error cost , we optimize the following upper bound based on (25):
Furthermore, we write the dynamics of as follows:
where can be treated as the disturbance for , and it is bounded since both and are bounded. Therefore, the optimization problem for plant output feedback fits into the proposed framework, where the average state estimation cost in (26) corresponds to (12), and the dynamics in (27) with bounded disturbance correspond to (1). ∎
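The boundedness argument for the observer error can be checked on a scalar toy example. The sketch assumes a standard Luenberger update x̂⁺ = f·x̂ + u + gain·(y − c·x̂) with a noiseless output y = c·x and a bounded plant disturbance; the numbers, the deadbeat control on the estimate, and the names are illustrative assumptions only.

```python
import random

def simulate_observer(f=1.2, c=1.0, gain=1.0, noise_bound=0.1,
                      steps=200, seed=1):
    """Scalar plant x+ = f*x + u + w with |w| <= noise_bound, observed via
    y = c*x. The error x - x_hat then obeys e+ = (f - gain*c)*e + w.
    Returns the list of absolute observer errors."""
    rng = random.Random(seed)
    x, x_hat = 5.0, 0.0
    errors = []
    for _ in range(steps):
        y = c * x
        u = -f * x_hat                      # simple stabilising control
        x_hat_next = f * x_hat + u + gain * (y - c * x_hat)
        w = rng.uniform(-noise_bound, noise_bound)
        x = f * x + u + w
        x_hat = x_hat_next
        errors.append(abs(x - x_hat))
    return errors

observer_errors = simulate_observer()
# With |f - gain*c| = 0.2 the error settles below noise_bound / (1 - 0.2).
steady_state_bound = 0.1 / (1.0 - 0.2)
```

Because the error recursion is a stable linear system driven by a bounded input, the error stays within a fixed ball after the transient, which is what allows it to be treated as a bounded disturbance in the extended framework.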
In this section, we compare the performance of the proposed power control scheme in Theorem 3 with four baselines. Baseline 1 refers to a fixed power control (FPC), where for a fixed power . Baseline 2 refers to a CSI-only power control (COPC), where where is the CSI estimate based on the previous-slot CSI and is a tradeoff parameter. Baseline 3 refers to a power control for error-free channel (PCEFC) , where the sensor minimizes an average weighted state estimation error and an average number of channel uses under an error-free channel. Baseline 4 refers to a power control for i.i.d. channel with special information structure (PCICSIS) , where