Co-design of Safe and Efficient Networked Control Systems in Factory Automation with State-dependent Wireless Fading Channels: A Constrained Cooperative Game Approach


Bin Hu, Yebin Wang, Philip Orlik, Toshiaki Koike-Akino and Jianlin Guo. Yebin Wang, Philip Orlik, Toshiaki Koike-Akino and Jianlin Guo are with Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA 02139, USA ({yebinwang, porlik, koike, guo}@merl.com). This work was performed during Bin Hu's internship at MERL (bhu@odu.edu).
Abstract

In factory automation, heterogeneous manufacturing processes need to be coordinated over wireless networks to achieve safety and efficiency. These wireless networks, however, are inherently unreliable due to shadow fading induced by the physical motion of the machinery. To assure both safety and efficiency, this paper proposes a state-dependent channel model that captures the interaction between the physical and communication systems. By adopting this channel model, sufficient conditions on the maximum allowable transmission interval are derived to ensure stochastic safety for a nonlinear physical system controlled over a state-dependent wireless fading channel. Under these sufficient conditions, the safety and efficiency co-design problem is formulated as a constrained cooperative game, whose equilibria represent optimal control and transmission power policies that minimize a discounted joint cost over an infinite horizon. This paper shows that the equilibria of the constrained game are solutions to a non-convex generalized geometric program, which are approximated by solving two convex programs. The optimality gap is quantified as a function of the size of the approximation region in the convex programs and asymptotically converges to zero under a branch-and-bound algorithm. Simulation results of a networked robotic arm and a forklift truck are presented to verify the proposed co-design method.

Co-design method, shadow fading, stochastic safety, factory automation, networked control system.

I Introduction

I-a Background and Motivation

Factory Automation Networks (FANs) are Cyber-Physical Systems (CPS) consisting of numerous heterogeneous manufacturing processes that coordinate with each other by exchanging information over wireless networks [1, 2, 3]. FANs have received considerable attention due to the rapid development of wireless communication technologies, which provide efficient and cost-effective services such as increased mobility, easy scalability and maintenance for applications like automated assembly systems in manufacturing factories [4]. In safety-critical applications, safety is of primary concern in FANs. However, building safe and efficient FANs is challenging in two aspects. First, from a system modeling standpoint, the heterogeneous nature of FANs requires a hybrid framework that can capture system dynamics at different levels as well as their mutual interactions. Assessing the performance and safety of this “hybrid” system as a whole demands different modeling and analysis tools. Second, the wireless network in FANs is inherently unreliable due to channel fading [5, 3] or interference [6] caused by internal system states or external environments, such as obstacles or physical motions of machinery. The fading channel inevitably results in a severe drop in the network’s quality of service (QoS) and thereby introduces a great deal of stochastic uncertainty into FANs, which may cause serious safety issues. The objective of this paper is to develop a co-design paradigm for communication and control systems under which a certain level of safety and efficiency can be achieved for FANs in the presence of shadow fading.

Assuring safety for FANs often requires joint coordination among heterogeneous systems that may have different objectives. Such coordination is necessary due to the interactions among the heterogeneous systems. Such interactions exist in many industrial applications, for example, manufacturing systems with heavy facilities such as mills and cranes [7], sensor networks with moving robots [8], and indoor wireless networks with moving human bodies [9]. One typical example in FANs is an assembly process where an autonomous assembly arm and a forklift truck collaborate to assemble products. On the one hand, the control objective of the autonomous assembly arm is to track a specified trajectory by exchanging information between a physical plant and a remote controller via wireless networks. On the other hand, the objective of the forklift system is often related to accomplishing high-level tasks, such as transporting assembled products from one workstation to another. These physically separated systems, however, may have strong cyber-physical couplings. In the networked assembly arm and forklift truck system, the cyber-physical coupling comes from the fact that the physical motion of the forklift vehicle may lead to serious shadow fading in the wireless network used by the assembly arm, thereby significantly affecting system stability and performance. Thus, to ensure system safety for FANs, one must explicitly examine such cyber-physical couplings in communication channels.

The channel model used to characterize the shadow fading in FANs must be carefully examined. As a type of channel fading, shadow fading is often characterized in terms of the channel gain. Traditionally, the channel gains are modeled either as independent and identically distributed (i.i.d.) random processes [6, 10, 11, 12] with assumed distributions such as Rayleigh, Rician and Weibull, or as Markov chains [13, 14]. These channel models are inadequate to characterize the cyber-physical couplings in FANs because the network state is assumed to be independent of the physical states in both the i.i.d. and Markov chain models. With such independence, control and communication could be considered separately through the application of a separation principle [10]. This separation principle may be valid for networked systems whose network states are independent of the physical dynamics, but it is clearly inappropriate for FANs, where the channel state is functionally dependent on the physical states. This dependency of channel states on physical states motivates the development of a new co-design paradigm under which the communication and control policies are coordinated to achieve both system safety and efficiency.

I-B Related Work

The example of an assembly process as well as the research work in [7, 8, 9, 15, 16] have demonstrated the importance of considering the cyber-physical couplings between communication and control systems in assuring system safety and efficiency for FANs. Similar conclusions have also been made in prior work [17, 18], where the dependency of channel states on physical states is used in the design of distributed switching control strategies to assure vehicle safety in vehicular networked systems. This paper expands the results in [18] to show that both system safety and efficiency can be achieved via a novel co-design framework. Other than these papers, we are aware of no other work formally analyzing both system safety and efficiency in the presence of such cyber-physical couplings. There is, however, a great deal of related work on the co-design of communication and control systems assuming the channel states are independent of physical states. We will review these results and discuss their relationships to the work in this paper.

From a communication perspective, the impact of channel fading on system performance can be mitigated by increasing the transmission power. This observation motivates much research on the design of optimal power strategies to achieve various objectives in both the communication [19, 20] and control communities [10, 8, 21]. The objective of power control in the communication community mainly focuses on improving communication reliability and performance in an average or asymptotic sense. In [19, 20] and relevant references therein, an adaptive power strategy combined with adaptive data-rate strategies was developed to achieve the Shannon limit for fading channels. The optimal power strategy was shown to be a function of the channel gain.

The objective of power control in the control community, however, is more concerned with how the communication quality affects the system stability and performance. As shown in [22, 23], such impact is often related to the unstable modes of the dynamics in physical systems and the QoS that could be delivered by a given wireless network. The power control strategy in networked control systems is often designed to ensure a certain level of QoS under which the closed-loop system is stable. In [8, 21], sufficient conditions on the transmission power were established to ensure exponentially bounded performance for state estimation of discrete linear time-varying systems.

When considering a joint objective for the communication and control systems, recent work in [24, 10, 25, 26] showed that the certainty equivalence property holds for the optimal control policy while the optimal communication policy is adapted to the channel states and physical states. In particular, [24] showed that the joint optimization of scheduling and control can be separated into the subproblems of an optimal regulator, estimator and scheduler. Similar ideas were applied to a joint design of controller and routing redundancy over a wireless network [25]. The work in [10] considered a co-design problem for optimal control and transmission power policies for a stochastic discrete linear system controlled over a fading channel. Their results showed that the optimal control policy was a standard LQR controller while the optimal power policy was adapted to both channel and plant states. A similar structure was also discovered in a joint design problem for an optimal encoder and controller over noisy channels [26].

All of the above studies, however, were developed by assuming a state-independent channel model. From a safety standpoint, such a state-independent channel model is often obtained by assuming the worst impact that the physical state can have on the network. As a result, the selected communication policy (transmission power, data rate, or scheduling) may be more conservative than necessary to assure the same level of performance that can be obtained by using a state-dependent channel model. In other words, the conservativeness of the state-independent channel model may prevent the system as a whole from achieving efficiency.

I-C Contribution

Motivated by the cyber-physical couplings in heterogeneous industrial systems, this paper develops a co-design paradigm to achieve both system safety and efficiency in the presence of shadow fading. The heterogeneous industrial systems are characterized by a nonlinear networked control system and a Markov decision process, which can represent a variety of realistic situations in industrial applications [7, 8, 9, 15, 16]. Under this heterogeneous system framework, the first contribution of this paper is the proposal of a novel state-dependent fading channel model that captures the impact of the physical states on the channel state. Furthermore, this paper shows that the state-dependent channel model is a Markov-modulated Bernoulli process [27] that generalizes the traditional i.i.d. Bernoulli channel model in two important aspects: (1) the model parameters are not constants but stochastic processes, due to their dependence on a randomly changing environment; (2) the channel parameters can be controlled by taking advantage of the cyber-physical couplings between the communication and control systems.

Under the state-dependent channel model, the safety issue is examined in a stochastic setting by investigating the likelihood of the system states entering a forbidden or unsafe region. Thus, the second contribution of this paper is a sufficient condition on the maximum allowable transmission interval (MATI) under which the wireless networked system with state-dependent fading channels is stochastically safe. We also show that the MATI derived in this paper generalizes the well-known results in [28], where the impact of channel fading was not considered. To the best of our knowledge, the sufficient conditions presented in this paper are the first results on the MATI that guarantee stochastic safety under state-dependent fading channels.

Under these safety conditions, the third contribution of this paper is the proposal of a new co-design paradigm to assure both safety and efficiency for FANs. In particular, we show that this safety-efficiency co-design can be formulated as a constrained two-player cooperative game. The equilibrium points of the constrained cooperative game represent optimal control and transmission power policies that minimize a discounted joint cost, induced by power consumption and control efforts, over an infinite horizon. The equilibrium of this constrained cooperative game can be obtained by solving a non-convex generalized geometric program (GGP) [29, 30]. To address the non-convexity of the GGP, this paper approximates the non-convex GGP with two relaxed convex GGPs that provide upper and lower bounds on the optimal solution. These bounds are shown to asymptotically approach the global optimum by using a branch-and-bound algorithm.

This paper is organized as follows. Section II describes the system model and problem formulation. Section III presents the sufficient conditions to ensure stochastic safety. Under the safety conditions, Section IV proposes a co-design paradigm to assure both safety and efficiency. The optimal solutions for the co-design problem are provided in Section IV-A. The main results are demonstrated via simulations of a mechanical robotic arm and a forklift truck in Section V. Section VI concludes the paper.

Notations. Throughout the paper the $n$-dimensional Euclidean vector space is denoted by $\mathbb{R}^n$, and the non-negative reals and integers are denoted by $\mathbb{R}_{\geq 0}$ and $\mathbb{Z}_{\geq 0}$, respectively. The infinity norms of a vector $x$ and a matrix $A$ are denoted by $\|x\|_{\infty}$ and $\|A\|_{\infty}$, respectively. The right limit value of a function $f$ at time $t$ is denoted by $f(t^{+})=\lim_{s\downarrow t}f(s)$. Given a time interval $[t_0,t]$ with $t>t_0$, the essential supremum of a function $w$ over the time interval is denoted by $\|w\|_{[t_0,t]}=\operatorname{ess\,sup}_{s\in[t_0,t]}\|w(s)\|$, where $\|\cdot\|$ is the Euclidean norm. A function $w$ is essentially ultimately bounded if there exists $\bar{w}\geq 0$ such that $\limsup_{t\rightarrow\infty}\|w(t)\|\leq\bar{w}$ almost everywhere. A function $\alpha:\mathbb{R}_{\geq 0}\rightarrow\mathbb{R}_{\geq 0}$ is a class $\mathcal{K}$ function if it is continuous and strictly increasing, and $\alpha(0)=0$. A function $\alpha$ is a class $\mathcal{K}_{\infty}$ function if it is in class $\mathcal{K}$ and radially unbounded. A function $\beta:\mathbb{R}_{\geq 0}\times\mathbb{R}_{\geq 0}\rightarrow\mathbb{R}_{\geq 0}$ is a class $\mathcal{KL}$ function if $\beta(\cdot,t)$ is a class $\mathcal{K}$ function for each fixed $t$ and $\beta(s,t)\rightarrow 0$ for each fixed $s$ as $t\rightarrow\infty$. The function $\beta$ is said to be of class Exp-$\mathcal{KL}$ if there exist $c,\lambda>0$ such that $\beta(s,t)\leq c\,s\,e^{-\lambda t}$. A function is said to be of class  (), if for each , and .

II System Model: A Heterogeneous System Framework

Fig. 1 shows a heterogeneous system framework with two subsystems. One is a networked control system () that characterizes a nonlinear physical system being controlled over a wireless network. The other is a Markov Decision Process (MDP) () that models the stochastic high-level dynamics of a moving object in industrial systems.

The cyber-physical coupling within this heterogeneous framework is due to the fact that the physical states (e.g., locations) of the moving object modeled by the MDP’s states may lead to shadow fading on the wireless channel that is used by the networked control system. Such a coupling has been shown to be critical for performance guarantees in a variety of realistic industrial applications, for example, robotic arms and forklift trucks, heavy facilities such as mills and cranes [7], sensor networks with moving robots [8, 16], and indoor wireless networks with moving human bodies [9]. Under such industrial settings, the radio channel characteristics are non-stationary and may experience abrupt changes due to the motion of the moving object. This state-dependent property of wireless communication in industrial systems clearly invalidates the use of traditional co-design frameworks, such as [24, 10, 31], that rely on the assumption that the channel states are decoupled from the physical states. The heterogeneous system framework depicted in Fig. 1 is thus motivated by the co-design challenge under state-dependent fading channels.


Fig. 1: Heterogeneous System Framework: Networked Control System and Markov Decision Process

II-A The Networked Control System Model

The dynamics of the system are modeled as follows,

where and are the physical states and measurements, respectively. and are the internal state and output for the remote controller, respectively. is the external disturbance that is assumed to be essentially ultimately bounded, i.e., , . , , and are Lipschitz functions for the physical plant and remote controller respectively. Without loss of generality, we assume the origin is the unique equilibrium for system , i.e. .

Let denote an increasing sequence of time instants where for all . Let be a transmission power set including power levels where is the power level. As shown in Figure 1, the measurement and controller output are sampled and transmitted over an unreliable communication channel with a selected power level at time instant . The wireless network is subject to fading and randomly drops the sampled information at each time instant. Let denote a binary random process taking value from . The value of the process at the th consecutive sampling instant indicates whether or not a packet dropout has occurred. In particular,

Let and denote the estimates of the corresponding variables at time instant . Note that we assume the time used for communication and for computing the control action is negligible compared to the sampling interval, and that the network condition is unchanged during this small time interval. The estimation error induced by the communication during the sampling time interval is defined as and . Let denote the aggregated estimation error at time . After the information is successfully received, this aggregated estimation error is reset to zero. Let denote the real time immediately after the sampling instant, . The estimation error will be reset to zero immediately after each successful transmission, so we may formally express as . Let denote the aggregated state for the closed-loop system , and then one has the following equivalent system representation in terms of and ,

(1)

where

Note that we further assume that the functions and are continuously differentiable and thus the function in (1) is well defined. Since the (set) stability of the system implies the (set) stability of the system , we will only discuss the stability of the system in the remainder of this paper.
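To make the sampling, transmission, and error-reset mechanism above concrete, the following sketch simulates one possible realization of the loop: at every transmission instant the measurement is sent over the state-dependent channel, the controller-side estimate is refreshed on success (so the estimation error resets to zero) and held otherwise, and the plant evolves under the resulting control input. All names, the scalar plant, and the dropout model are hypothetical placeholders, not the system studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_prob(q, p):
    """Hypothetical SDDC-style dropout probability for MDP state q and
    transmission power p (monotonically decreasing in p)."""
    shadow = {0: 1.0, 1: 4.0}[q]          # assumed shadow-fading severity per region
    return 1.0 - np.exp(-shadow / p)      # illustrative form only

def plant_step(x, u, dt=0.01):
    """Hypothetical stable scalar plant, Euler-discretized with step dt."""
    return x + dt * (-x + u)

steps_per_tx = 10                         # dt * steps_per_tx = transmission interval
x, x_hat = 1.0, 0.0                       # plant state and controller-side estimate
q, p = 0, 2.0                             # current MDP (shadowing) state and power level

for k in range(50):                       # 50 transmission instants
    # Transmit the sampled measurement over the state-dependent channel.
    dropped = rng.random() < dropout_prob(q, p)
    if not dropped:
        x_hat = x                         # success: estimation error e = x - x_hat resets to 0
    for _ in range(steps_per_tx):         # hold x_hat between transmissions (zero-order hold)
        u = -2.0 * x_hat                  # remote controller acts on the estimate only
        x = plant_step(x, u)
    q = rng.choice([0, 1])                # the moving object changes region (illustrative)

print(f"final |x| = {abs(x):.4f}")
```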

II-B The MDP System Model

The system is modeled by an MDP. An MDP is defined by a five-tuple , where

  • is the state space for the MDP.

  • is the set of initial states.

  • is the action set.

  • is the transition probability , i.e. .

  • is the reward function.

Unlike the system , which models low-level physical dynamics, the MDP is used to model discrete-event decision-making processes managing high-level control objectives, such as transporting products from one location to another with minimum time or energy. The state space in the MDP corresponds to a finite number of partitioned regions in which the vehicle system, such as forklift trucks or cranes [7] or robots [8], can operate by taking actions from an action set . The transition probability matrix models the stochastic uncertainties caused by sensor or actuation noise when the actions are physically implemented. The costs in the MDP model characterize the high-level control objectives for the vehicle system. For instance, if the control objective is to transport products to a target region, then small costs are assigned, in the minimization problem, to transitions into the target region.
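As a concrete (and purely illustrative) instance of the five-tuple above, a small MDP for a vehicle moving among three partitioned regions can be encoded with plain arrays; all numbers and names below are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical MDP for a forklift moving among 3 partitioned regions.
S = [0, 1, 2]                 # state space: partitioned regions of the workspace
S0 = [0]                      # set of initial states
A = ["stay", "move"]          # action set

# Transition probabilities P[a][s, s']: row-stochastic for every action,
# with off-nominal entries modeling sensor/actuation noise.
P = {
    "stay": np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.1, 0.9]]),
    "move": np.array([[0.1, 0.8, 0.1],
                      [0.1, 0.1, 0.8],
                      [0.1, 0.1, 0.8]]),
}

# Cost c[s][a]: small cost assigned once the vehicle is in the target region.
target = 2
c = {s: {a: 0.0 if s == target else 1.0 for a in A} for s in S}

# Sanity check: every transition matrix is row-stochastic.
for a in A:
    assert np.allclose(P[a].sum(axis=1), 1.0)
```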

II-C State-Dependent Dropout Channel Model

As shown in Fig. 1, the wireless channel used by the networked control system is functionally dependent on the state of the MDP system. This relationship corresponds to the situation where the vehicle’s physical positions directly lead to shadow fading, thereby generating a great deal of stochastic uncertainty in system . Equation (1) shows that the stochastic uncertainty in system is governed by a binary random process , which characterizes the stochastic variations in channel conditions.

The state-dependency in the shadow fading channel is captured by a novel State-Dependent Dropout Channel (SDDC) model that is formally defined as follows.

Definition II.1.

Given a binary random process , an MDP system and a transmission power set , the wireless channel is SDDC if

(2)

where is the outage probability [6] that monotonically decreases with respect to the transmission power level .

Remark II.2.

The definition of the SDDC is closely related to the outage probability, which is a widely used performance metric for fading channels [6]. It characterizes the likelihood of the Signal-to-Noise Ratio (SNR) being below a specified threshold , i.e. . The difference between the SDDC model and the traditional outage probability lies in the state-dependent feature of (2), where the probability is defined for each MDP state (partitioned region). The probability defined in (2) can be obtained by measuring the SNR for each MDP state; see [9, 7] and references therein for details about the statistical methods. In practice, the transmitter can estimate the probability either by directly using visual sensors to observe the positions of the controlled moving object or by using the estimation techniques discussed in [7, 15]. See Example II.4 for more details about how to construct the SDDC from the outage probability.

Remark II.3.

The SDDC model in (2) relates the channel state (packet dropout probability) to the MDP states and transmission power levels. From a control standpoint, this correlation means that the channel conditions can be controlled by designing different control and transmission power strategies. By exploiting this freedom in the channel model, this paper develops a co-design framework that coordinates control and communication strategies to achieve both safety and efficiency for the entire heterogeneous system. The co-design idea of using the state-dependent channel model distinguishes our work from other results, such as [8, 9, 7, 16], where the channel state is assumed to be a fixed and uncontrollable random process.

Example II.4 (SDDC model with Rayleigh fading).

Channel fading is often the result of the superposition of signal attenuation at both large (shadowing) and small scales [6]. Let denote the small-scale fading gain induced by multi-path propagation at time instant . Suppose is an i.i.d. process that follows a Rayleigh distribution with scale parameter , i.e. . Let denote a shadow-level function that characterizes the level of the shadowing effect on the channel gain for each MDP state, i.e. . Thus, the state-dependent channel gain is , and for a given transmission power level and noise power , the SNR is . With the assumption that the small-scale fading gain is conditionally independent of the shadowing state , for a given SNR threshold , one has

Then, we have the explicit functional form of the SDDC model.
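For illustration, the closed-form outage probability from Example II.4 can be evaluated numerically. The sketch below assumes a particular convention, namely that the Rayleigh amplitude gain enters the SNR through its square (so the power gain is exponentially distributed); the function name, parameter values, and shadow levels are hypothetical.

```python
import numpy as np

def sddc_dropout_prob(q, p, h, gamma0=5.0, noise_power=1.0, sigma=1.0):
    """Sketch of the SDDC outage probability under Rayleigh small-scale fading.

    Assumptions (illustrative, not necessarily the paper's exact convention):
      - the small-scale amplitude gain g is Rayleigh(sigma), so g**2 is
        exponential with mean 2*sigma**2;
      - the received SNR is p * h[q] * g**2 / noise_power, where h[q] is the
        shadow level of MDP state q and p the transmission power.
    Then P(SNR < gamma0) = 1 - exp(-gamma0*noise_power / (2*sigma**2 * p * h[q])).
    """
    return 1.0 - np.exp(-gamma0 * noise_power / (2.0 * sigma**2 * p * h[q]))

h = {0: 1.0, 1: 0.2}   # hypothetical shadow levels: state 1 is heavily shadowed
for q in h:
    for p in (1.0, 5.0, 10.0):
        print(f"state q={q}, power p={p:4.1f} -> dropout prob "
              f"{sddc_dropout_prob(q, p, h):.3f}")

# Monte-Carlo check of the closed form for one (q, p) pair.
rng = np.random.default_rng(0)
g = rng.rayleigh(scale=1.0, size=200_000)
snr = 5.0 * h[1] * g**2 / 1.0
print("empirical:", np.mean(snr < 5.0), " closed form:", sddc_dropout_prob(1, 5.0, h))
```

The Monte-Carlo check at the end confirms the closed form under these assumptions; in the paper the same kind of probability is assigned to each MDP state through the shadow-level function.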

The SDDC in (2) characterizes a cyber-physical coupling between the networked control system and the MDP system . In the presence of such coupling, the first objective of this paper is to find conditions under which system achieves stochastic safety, which is formally defined below.

Definition II.5 (Stochastic Safety).

Consider the networked control system in (1) and the SDDC model in (2). Let with denote a safe set for the system, and denote the initial state of the networked control system,

  • (E1) The system with is asymptotically safe in expectation with respect to , if , there exists a class function such that

    (3)

    and thereby .

  • (E2) The system with is asymptotically bounded in expectation with respect to , if , there exist a class function and a class function such that

    (4)

    and .

  • (P1) The system with is almost surely asymptotically safe with respect to , if and , there exists a class function such that

    (5)

    and .

  • (P2) The system with is stochastically safe in probability with respect to , if , there exists a class function such that

    (6)
Remark II.6.

The safety notions E1 and E2 are concerned with the system behavior on average (in the first moment), while the safety notions P1 and P2 concern specifications on the sample paths of the system. Note that both types of safety definitions specify the system’s transient as well as steady-state behavior. For systems without external disturbance, the safety definition E1 requires that the first moment of the norm of the system trajectories asymptotically converge to the origin if the initial states start within the safe set. The almost sure asymptotic safety definition P1 is stronger than E1 in the sense that it requires almost all sample paths starting from the safe set to stay in the safe region with probability asymptotically going to one. For systems with non-vanishing but ultimately bounded disturbance, the definition E2 requires that the first moment of the system trajectories be asymptotically bounded, with the bound depending on the magnitude of the external disturbance. The safety notion P2 means that the probability of the sample paths leaving the safe region is asymptotically bounded, with the bound being a function of the size of the external disturbance and the safety region. These safety notions are closely related to the concepts of stochastic stability defined in [32, 33].

Under the safety conditions for system , the second objective of this paper is to seek optimal control and communication policies to achieve system efficiency for both system and . A control policy for the MDP system is an infinite sequence where is the decision made at time instant . The decision making is defined as a probability distribution over the action set given the history information, i.e., . Similarly, a power policy for system can be defined as with . The policy is stationary if  () with  (), and . This paper focuses on the stationary policy space.

With the definitions of control and communication policies, the system efficiency is defined as a constrained infinite horizon optimization problem as follows,

(7)
s.t.

where is the power cost and is the cost defined in the MDP system. is the discount factor, which weights short-term rewards against rewards obtained in the more distant future, and is a parameter used to adjust the weight between communication and control costs.
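Since the display in (7) did not survive extraction, one plausible reconstruction of the discounted joint cost, written here purely for illustration with hypothetical symbols ($\mu$ for the MDP control policy, $\pi$ for the power policy, $c_M$ and $c_P$ for the MDP and power costs, $\lambda$ the weight, and $\gamma \in (0,1)$ the discount factor), is

\[
\min_{\mu,\pi}\;\mathbb{E}\!\left[\sum_{k=0}^{\infty}\gamma^{k}\Big(c_M(q_k,a_k)+\lambda\,c_P(q_k,p_k)\Big)\right]
\quad\text{s.t. the safety condition of Section III holds,}
\]

which matches the verbal description of a weighted, discounted, infinite-horizon communication-and-control cost.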

III Stochastic Safety

This section presents sufficient conditions to ensure the stochastic safety defined in Definition II.5 for the system. The following two assumptions are needed for the main results.

Assumption III.1.

The system is input-to-state stable (ISS) w.r.t. and , i.e. there exist a class function , a class function and a positive real such that and is a concave function for any fixed . The system is exponentially input-to-state stable (Exp-ISS) w.r.t. and , if is a class Exp- function and is a linear function with .

Assumption III.2.

There exists a Lyapunov function and for the estimation error dynamics in system (1) such that

(8)
(9)

Assumption III.2 basically requires that the estimation error is exponentially bounded and the couplings of in the error dynamics are linear. The following proposition shows that for a given transmission time sequence , the estimation error forms a stochastic jump process whose jump size is and depends on the MDP’s state and the transmission power level .

Proposition III.3.

Consider a random dropout process associated with the channel’s SDDC model in (2) and let denote the transmission time sequence. Let be a Lyapunov function for the error dynamic system in (1), then one has

(10)

where the conditional expectation operator is taken with respect to the random process .

Proof:

The proof is easily completed by combining and the SDDC model in (2). ∎
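To make the jump relation in (10) concrete, here is the key step in sketch form, with symbols introduced only for this illustration: let $e(t_k)$ denote the aggregated estimation error just before the $k$-th transmission, let $P(q_k,p_k)=\Pr\{\gamma_k=0\mid q_k,p_k\}$ be the SDDC dropout probability, and let $V$ be the Lyapunov function of Assumption III.2 with $V(0)=0$. Since the error resets to zero on a successful transmission and is left unchanged on a dropout,

\[
\mathbb{E}\big[V(e(t_k^{+}))\,\big|\,q_k,p_k,e(t_k)\big]
=\big(1-P(q_k,p_k)\big)V(0)+P(q_k,p_k)\,V(e(t_k))
=P(q_k,p_k)\,V(e(t_k)),
\]

so the expected jump size indeed depends on the MDP state $q_k$ and the transmission power level $p_k$.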

Under a state-dependent shadow fading channel, the following theorem presents a sufficient condition on the Maximum Allowable Transmission Interval (MATI) under which the system achieves asymptotic safety in expectation. In particular, we show that the MATI is a function of the control () and transmission power () policies.

Theorem III.4.

Let denote the transmission time interval, denote the transition matrix defined in (12) and denote the transmission power level. Suppose the ISS assumption in Assumption III.1 and Assumption III.2 hold. For a given stationary control policy and a given stationary transmission power policy , the system with is asymptotically safe in expectation (E1 in Definition II.5) with respect to the origin, if where

(11)

is the MATI. The system parameters , and come from (8) and (9) respectively and

(12)

with .

Proof.

The proof is provided in Appendix VI. ∎

Remark III.5.

The MATI in (11) generalizes the result in [28]. In particular, one can see that the MATI in [28] is recovered if the shadow fading is absent, i.e., .

Theorem III.6.

Let the hypothesis in Theorem III.4 and the Exp-ISS assumption in Assumption III.1 hold. Then the system is almost surely asymptotically safe (P1 in Definition II.5) with respect to the origin.

Proof.

The proof is provided in Appendix VI. ∎

Theorem III.7.

Suppose the MATI condition in (11) holds and consider the system in (1) with external disturbance ; then the system is asymptotically bounded in expectation (E2 in Definition II.5) with respect to a bounded safe set , i.e., , there exist a class function and a class function such that

and .

Proof.

The proof is provided in Appendix VI. ∎

Theorem III.8.

Suppose the hypothesis in Theorem III.7 holds, then the system is stochastically safe in probability (P2 in Definition II.5) with respect to a bounded safe set .

Proof:

The result follows straightforwardly from Markov’s inequality. ∎
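For completeness, the step behind this proof is the following standard bound (with $\varepsilon>0$ denoting the size of the safe region and $x(t)$ the aggregated state; the symbols are chosen here only for illustration): by Markov's inequality,

\[
\Pr\{\|x(t)\|\ge\varepsilon\}\;\le\;\frac{\mathbb{E}\big[\|x(t)\|\big]}{\varepsilon},
\]

so the expectation bound of Theorem III.7 immediately yields a bound on the probability that the trajectories leave the safe set, which is the content of P2.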

IV Safety and Efficiency: A Two-player Constrained Cooperative Game

The system efficiency in this paper is defined as an optimization problem where optimal transmission power and control policies are sought to minimize a joint communication and control cost over an infinite horizon. To assure both system efficiency and safety, the control () and communication () policies must be carefully coordinated due to their tight coupling, as suggested by the safety condition in (11). This collaboration between the communication and control systems can be naturally formulated as a two-player constrained cooperative game where the players’ strategy spaces are constrained and coupled. The equilibrium of the game represents the optimal transmission power and control policies that achieve both system safety and efficiency.

Problem IV.1 (Two-player Constrained Cooperative Game).

Let denote the power cost and denote the control cost for the MDP system. The safety and efficiency problem is to find the optimal control and transmission power policies for the following two-player constrained cooperative game,

(13)
s.t.

where and is the transmission time interval and is a monotonically decreasing function with respect to .

Remark IV.2.

The inequality (13) is a safety constraint reformulated by the sufficient condition (11). In order to see how this safety constraint is derived from (11), let denote the feasible policies such that . Thus

By arranging the inequality, one has

Since , one always has . Thus, for any given control and power policies that satisfy the above inequality, the sufficient condition in (11) assures system safety.

Under the stationary policy space, we show that the two-player constrained cooperative game in Problem IV.1 can be solved via the following constrained nonlinear optimization problem.

Problem IV.3.

Constrained Nonlinear Optimization Problem: Suppose the state  and action  spaces in the MDP system are finite sets, and the transmission power set is finite. Let and where and , denote the decision variables of the following nonlinear constrained optimization problem.

(14a)
subject to
(14b)
(14c)
(14d)

The following Lemma shows that Problems IV.3 and IV.1 are equivalent in the sense that they have the same optimal solutions and objectives.

Lemma IV.4.

Let and denote the optimal solutions to Problem IV.3, then the policies and are the optimal solutions to Problem IV.1.

Proof.

The proof can be obtained by examining the equivalence between Problem IV.3 and Problem IV.1 in terms of objective function, decision variables and feasible set imposed by the constraints. We have already shown that the objective function in Problem IV.1 can be rewritten as a function of the new decision variables and in Problem IV.3. According to the definition of , one has . Thus, the decision variable uniquely defines the control strategy . The constraints in (14d) are introduced to enforce the probability law (i.e. non-negativity and total probability being ). The constraint in (14c) is a reformulation of the Markovian dynamics for the MDP in terms of new decision variables and (see [34] for more details). Therefore, one has established the equivalence and the proof is complete. ∎
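The constraint (14c) mentioned in the proof is the standard occupation-measure (dual linear programming) reformulation of a discounted MDP [34]. The sketch below shows that reformulation in isolation, without the non-convex safety constraint (14b); the toy data and variable names are invented for illustration and do not come from the paper.

```python
import numpy as np
from scipy.optimize import linprog

# Toy discounted MDP (illustrative data): 3 states, 2 actions.
nS, nA, gamma = 3, 2, 0.9
P = np.zeros((nS, nA, nS))               # P[s, a, s'] transition probabilities
P[:, 0] = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]])
P[:, 1] = np.array([[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]])
c = np.array([[1.0, 1.0], [1.0, 1.0], [0.0, 0.0]])   # cost c[s, a]
nu0 = np.array([1.0, 0.0, 0.0])          # initial state distribution

# Decision variables rho[s, a] >= 0 (discounted occupation measure), flattened.
# Flow constraints: sum_a rho(s',a) - gamma * sum_{s,a} P(s'|s,a) rho(s,a) = (1-gamma) nu0(s').
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (1.0 if s == sp else 0.0) - gamma * P[s, a, sp]
b_eq = (1.0 - gamma) * nu0

res = linprog(c.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (nS * nA))
assert res.success
rho = res.x.reshape(nS, nA)
policy = rho / np.maximum(rho.sum(axis=1, keepdims=True), 1e-12)  # recover pi(a|s)
print("occupation measure:\n", rho.round(3))
print("stationary policy:\n", policy.round(3))
```

The recovered policy $\pi(a\mid s)=\rho(s,a)/\sum_{a'}\rho(s,a')$ illustrates how decision variables of this kind uniquely define a stationary control strategy, as used in the proof above.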

Remark IV.5.

Problem IV.3 is a polynomial optimization problem where the objective function and the safety constraints in (14b) are polynomial functions. The main challenge in solving this polynomial optimization problem is that the safety constraints are non-convex. The presence of the non-convex constraint (14b) in the optimization problem is due to the couplings between communication and control policies in industrial settings with state-dependent fading wireless channels.

IV-A Relaxed Generalized Geometric Programming

Problem IV.3 falls into a class of non-convex optimization problems called Generalized Geometric Programs (GGP) [30], where the objective function and constraints are differences of two posynomials. A posynomial is a function such that where and .

Let denote the decision vector and denote the feasible region for . The constrained optimization Problem IV.3 can be formulated as a GGP as follows,

(15)
subject to

where are posynomials and are linear functions. To see how the safety constraints in (14b) can be written as the difference of two posynomials, multiply both sides of (14b) by , which leads to

The above GGP can be further reformulated by introducing an exponential transformation, ,

(16)
subject to

where , and

Since is a convex function in terms of , and are convex functions as well. However, the function in the safety constraint is generally not convex [30]. To address the non-convexity issue, this paper approximates the second terms in the non-convex safety constraints using a linear function. The basic idea is illustrated in Figure 1(a) using a simple exponential function. In Figure 1(a), the linear function shown by the solid line approximates the exponential function from above, while the linear function shown by the dashed line approximates it from below. These two functions can be viewed as upper and lower bounds on the exponential function. The following two subsections demonstrate how to construct the upper and lower linear bounds for a general multivariate exponential function on a given domain.
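To see the approximation idea on a single scalar exponential term (the multivariate construction of the next subsections follows the same logic), recall that $e^{y}$ is convex, so its secant over a bounded interval lies above it and any tangent line lies below it. The snippet below verifies these bounds numerically; it illustrates the relaxation idea only and is not the paper's exact construction.

```python
import numpy as np

l, u = -1.0, 2.0                       # assumed bounded domain for the transformed variable y
y = np.linspace(l, u, 201)

# Upper bound: secant (chord) of e^y over [l, u]  -- convexity gives e^y <= secant(y).
slope_up = (np.exp(u) - np.exp(l)) / (u - l)
upper = np.exp(l) + slope_up * (y - l)

# Lower bound: tangent of e^y at the midpoint     -- convexity gives e^y >= tangent(y).
m = 0.5 * (l + u)
lower = np.exp(m) + np.exp(m) * (y - m)

assert np.all(upper >= np.exp(y) - 1e-12)
assert np.all(lower <= np.exp(y) + 1e-12)
print("max gap of upper bound:", np.max(upper - np.exp(y)))
print("max gap of lower bound:", np.max(np.exp(y) - lower))
```

Shrinking the interval tightens both linear bounds, which is consistent with the branch-and-bound refinement used to drive the optimality gap of the relaxed convex programs to zero.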

IV-A1 Relaxed GGP with Linear Upper Bound

For a given bounded domain with and , one can construct a linear function such that,