Rationally Inattentive Control of Markov Processes†

†This work was supported in part by the NSF under award nos. CCF-1254041, CCF-1302438, ECCS-1135598, by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCF-0939370, and in part by the UIUC College of Engineering under the Strategic Research Initiative on “Cognitive and Algorithmic Decision Making.” The material in this paper was presented in part at the 2013 American Control Conference and at the 2013 IEEE Conference on Decision and Control.
Abstract
The article poses a general model for optimal control subject to information constraints, motivated in part by recent work of Sims and others on information-constrained decision-making by economic agents. In the average-cost optimal control framework, the general model introduced in this paper reduces to a variant of the linear-programming representation of the average-cost optimal control problem, subject to an additional mutual information constraint on the randomized stationary policy. The resulting optimization problem is convex and admits a decomposition based on the Bellman error, which is the object of study in approximate dynamic programming. The theory is illustrated through the example of the information-constrained linear-quadratic-Gaussian (LQG) control problem. Some results on the infinite-horizon discounted-cost criterion are also presented.

Keywords: stochastic control, information theory, observation channels, optimization, Markov decision processes

AMS subject classifications: 94A34, 90C40, 90C47
1 Introduction
The problem of optimization with imperfect information [5] deals with situations where a decision maker (DM) does not have direct access to the exact value of a payoff-relevant variable. Instead, the DM receives a noisy signal pertaining to this variable and makes decisions conditionally on that signal.
It is usually assumed that the observation channel that delivers the signal is fixed a priori. In this paper, we do away with this assumption and investigate a class of dynamic optimization problems in which the DM is free to choose the observation channel from a certain convex set. This formulation is inspired by the framework of Rational Inattention, proposed by the well-known economist Christopher Sims¹ to model decision-making by agents who minimize expected cost given available information (hence “rational”), but are capable of handling only a limited amount of information (hence “inattention”) [28, 29]. Quantitatively, this limitation is stated as an upper bound on the mutual information in the sense of Shannon [25] between the state of the system and the signal available to the DM.

¹Christopher Sims shared the 2011 Nobel Memorial Prize in Economics with Thomas Sargent.
Our goal in this paper is to initiate the development of a general theory of optimal control subject to mutual information constraints. We focus on the average-cost optimal control problem for Markov processes and show that the construction of an optimal information-constrained control law reduces to a variant of the linear-programming representation of the average-cost optimal control problem, subject to an additional mutual information constraint on the randomized stationary policy. The resulting optimization problem is convex and admits a decomposition in terms of the Bellman error, which is the object of study in approximate dynamic programming [22, 5]. This decomposition reveals a fundamental connection between information-constrained controller design and rate-distortion theory [4], a branch of information theory that deals with optimal compression of data subject to information constraints.
Let us give a brief informal sketch of the problem formulation; precise definitions and regularity/measurability assumptions are spelled out in the sequel. Let , , and denote the state, the control (or action), and the observation spaces. The objective of the DM is to control a discrete-time state process with values in by means of a randomized control law (or policy) , , which generates a random action conditionally on the observation . The observation , in turn, depends stochastically on the current state according to an observation model (or information structure) . Given the current action and the current state , the next state is determined by the state transition law . Given a one-step state-action cost function and the initial state distribution , the pathwise long-term average cost of any pair consisting of a policy and an observation model is given by
where the law of the process is induced by the pair and by the law of ; for notational convenience, we will suppress the dependence on the fixed state transition dynamics .
If the information structure is fixed, then we have a Partially Observable Markov Decision Process, where the objective of the DM is to pick a policy to minimize . In the framework of rational inattention, however, the DM is also allowed to optimize the choice of the information structure subject to a mutual information constraint. Thus, the DM faces the following optimization problem:²

²Since is a random variable that depends on the entire path , the definition of a minimizing pair requires some care. The details are spelled out in Section 3.
minimize  (1a)  
subject to  (1b) 
where denotes the Shannon mutual information between the state and the observation at time , and is a given constraint value. The mutual information quantifies the amount of statistical dependence between and ; in particular, it is equal to zero if and only if and are independent, so the limit corresponds to open-loop policies. If , then the act of generating the observation will in general involve loss of information about the state (the case of perfect information corresponds to taking ). However, for a given value of , the DM is allowed to optimize the observation model and the control law jointly to make the best use of all available information. In light of this, it is also reasonable to grant the DM the freedom to optimize the choice of the observation space , i.e., to choose the optimal representation for the data supplied to the controller. In fact, it is precisely this additional freedom that enables the reduction of the rationally inattentive optimal control problem to an infinite-dimensional convex program.
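As a concrete numerical illustration of the information constraint (1b) (not part of the original formulation; all distributions below are invented for the example), the following Python sketch computes the Shannon mutual information of a finite joint distribution. It is zero exactly for an independent pair, which is the open-loop regime, and maximal for a perfectly informative observation:

```python
import numpy as np

def mutual_information(p_xz):
    """Shannon mutual information I(X; Z) in nats for a finite joint pmf."""
    p_x = p_xz.sum(axis=1, keepdims=True)   # marginal of X
    p_z = p_xz.sum(axis=0, keepdims=True)   # marginal of Z
    mask = p_xz > 0
    return float(np.sum(p_xz[mask] * np.log(p_xz[mask] / (p_x @ p_z)[mask])))

# An independent pair has I = 0: the open-loop regime R = 0.
p_indep = np.outer([0.5, 0.5], [0.3, 0.7])
# A perfectly informative observation of a fair binary state has I = log 2 nats.
p_corr = np.array([[0.5, 0.0], [0.0, 0.5]])
print(mutual_information(p_indep))  # 0.0
print(mutual_information(p_corr))   # log 2 = 0.6931...
```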
This paper addresses the following problems: (a) give existence results for optimal information-constrained control policies; (b) describe the structure of such policies; and (c) derive an information-constrained analogue of the Average-Cost Optimality Equation (ACOE). Items (a) and (b) are covered by Theorem 5.2, whereas Item (c) is covered by Theorem 5.2 and subsequent discussion in Section 5.3. We will illustrate the general theory through the specific example of an information-constrained Linear-Quadratic-Gaussian (LQG) control problem. Finally, we will outline an extension of our approach to the more difficult infinite-horizon discounted-cost case.
1.1 Relevant literature
In the economics literature, the rational inattention model has been used to explain certain memory effects in different economic equilibria [30], to model various situations such as portfolio selection [16] or Bayesian learning [24], and to address some puzzles in macroeconomics and finance [35, 36, 19]. However, most of these results rely on heuristic considerations or on simplifying assumptions pertaining to the structure of observation channels.
On the other hand, dynamic optimization problems where the DM observes the system state through an information-limited channel have been long studied by control theorists (a very partial list of references is [37, 1, 3, 33, 34, 6, 42]). Most of this literature focuses on the case when the channel is fixed, and the controller must be supplemented by a suitable encoder/decoder pair respecting the information constraint and any considerations of causality and delay. Notable exceptions include classic results of Bansal and Başar [1, 3] and recent work of Yüksel and Linder [42]. The former is concerned with a linear-quadratic-Gaussian (LQG) control problem, where the DM must jointly optimize a linear observation channel and a control law to minimize expected state-action cost, while satisfying an average power constraint; information-theoretic ideas are used to simplify the problem by introducing a certain sufficient statistic. The latter considers a general problem of selecting optimal observation channels in static and dynamic stochastic control problems, but focuses mainly on abstract structural results pertaining to existence of optimal channels and to continuity of the optimal cost in various topologies on the space of observation channels.
The paper is organized as follows: The next section introduces the notation and the necessary information-theoretic preliminaries. The problem formulation is given in Section 3, followed by a brief exposition of rate-distortion theory in Section 4. In Section 5, we present our analysis of the problem via a synthesis of rate-distortion theory and the convex-analytic approach to Markov decision processes (see, e.g., [8]). We apply the theory to an information-constrained variant of the LQG control problem in Section 6. All of these results pertain to the average-cost criterion; the more difficult infinite-horizon discounted-cost criterion is considered in Section 7. Certain technical and auxiliary results are relegated to Appendices.
2 Preliminaries and notation
All spaces are assumed to be standard Borel (i.e., isomorphic to a Borel subset of a complete separable metric space); any such space will be equipped with its Borel σ-field. We will repeatedly use standard notions and results from probability theory, as briefly listed below; we refer the reader to the text by Kallenberg [17] for details. The space of all probability measures on will be denoted by ; the sets of all measurable functions and all bounded continuous functions will be denoted by and by , respectively. We use the standard linear-functional notation for expectations: given an -valued random object with and ,
A Markov (or stochastic) kernel with input space and output space is a mapping , such that for all and for every . We denote the space of all such kernels by . Any acts on from the left and on from the right:
Note that for any , and for any . Given a probability measure and a Markov kernel , we denote by a probability measure defined on the product space via its action on the rectangles , :
If we let in the above definition, then we end up with with . Note that product measures , where , arise as a special case of this construction, since any can be realized as a Markov kernel .
We also need some notions from information theory. The relative entropy (or information divergence) [25] between any two probability measures is
where denotes absolute continuity of measures, and is the Radon–Nikodym derivative. It is always nonnegative, and is equal to zero if and only if . The Shannon mutual information [25] in is
(2) 
The functional is concave in , convex in , and weakly lower semicontinuous in the joint law : for any two sequences and such that weakly, we have
(3) 
(indeed, if converges to weakly, then, by considering test functions in and , we see that and weakly as well; Eq. (3) then follows from the fact that the relative entropy is weakly lower semicontinuous in both of its arguments [25]). If is a pair of random objects with , then we will also write or for . In this paper, we use natural logarithms, so mutual information is measured in nats. The mutual information admits the following variational representation [32]:
(4) 
where the infimum is achieved by . It also satisfies an important relation known as the data processing inequality: Let be a triple of jointly distributed random objects, such that and are conditionally independent given . Then
(5) 
In words, no additional processing can increase information.
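The data processing inequality (5) is easy to check numerically on finite alphabets. The sketch below (all alphabet sizes and the random seed are invented for the example) builds a Markov triple by composing two random channels and verifies that the second stage cannot increase information:

```python
import numpy as np

def mi(p_ab):
    """Mutual information (nats) of a finite joint distribution."""
    pa = p_ab.sum(axis=1, keepdims=True)
    pb = p_ab.sum(axis=0, keepdims=True)
    m = p_ab > 0
    return float(np.sum(p_ab[m] * np.log(p_ab[m] / (pa @ pb)[m])))

rng = np.random.default_rng(0)
p_x = rng.dirichlet(np.ones(4))              # law of X
K = rng.dirichlet(np.ones(5), size=4)        # channel X -> Y (rows sum to 1)
L = rng.dirichlet(np.ones(3), size=5)        # channel Y -> Z
p_xy = p_x[:, None] * K                      # joint law of (X, Y)
p_xz = p_xy @ L                              # joint law of (X, Z); Z sees X only through Y
assert mi(p_xz) <= mi(p_xy) + 1e-12          # data processing inequality (5)
```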
3 Problem formulation and simplification
We now give a more precise formulation for the problem (1) and take several simplifying steps towards its solution. We consider a model with the block diagram shown in Figure 1, where the DM is constrained to observe the state of the controlled system through an information-limited channel. The model is fully specified by the following ingredients:

the state, observation and control spaces denoted by , and respectively;

the (time-invariant) controlled system, specified by a stochastic kernel that describes the dynamics of the system state, initially distributed according to ;

the observation channel, specified by a stochastic kernel ;

the feedback controller, specified by a stochastic kernel .
The -valued state process , the -valued observation process , and the -valued control process are realized on the canonical path space , where , is the Borel σ-field of , and for every
with . The process distribution satisfies , and
Here and elsewhere, denotes the tuple ; the same applies to , , etc. This specification ensures that, for each , the next state is conditionally independent of given (which is the usual case of a controlled Markov process), that the control is conditionally independent of given , and that the observation is conditionally independent of given the most recent state . In other words, at each time the controller takes as input only the most recent observation , which amounts to the assumption that there is a separation structure between the observation channel and the controller. This assumption is common in the literature [37, 33, 34]. We also assume that the observation depends only on the current state ; this assumption appears to be rather restrictive, but, as we show in Appendix A, it entails no loss of generality under the above separation structure assumption.
We now return to the information-constrained control problem stated in Eq. (1). If we fix the observation space , then the problem of finding an optimal pair is difficult even in the single-stage case. Indeed, if we fix , then the Bayes-optimal choice of the control law is to minimize the expected posterior cost:
Thus, the problem of finding the optimal reduces to minimizing the functional
over the convex set . However, this functional is concave, since it is given by a pointwise infimum of affine functionals. Hence, the problem of jointly optimizing for a fixed observation space is non-convex even in the simplest single-stage setting. This lack of convexity is common in control problems with “nonclassical” information structures [18].
Now, from the viewpoint of rational inattention, the objective of the DM is to make the best possible use of all available information subject only to the mutual information constraint. From this perspective, fixing the observation space could be interpreted as suboptimal. Indeed, we now show that if we allow the DM an additional freedom to choose , and not just the information structure , then we may simplify the problem by collapsing the three decisions of choosing into one of choosing a Markov randomized stationary (MRS) control law satisfying the information constraint , where is the distribution of the state at time , and denotes the process distribution of , under which , , and . Indeed, fix an arbitrary triple , such that the information constraint (1b) is satisfied w.r.t. :
(6) 
Now consider a new triple with , , and , where is the Dirac measure centered at . Then obviously is the same in both cases, so that . On the other hand, from (6) and from the data processing inequality (5) we get
so the information constraint is still satisfied. Conceptually, this reduction describes a DM who receives perfect information about the state , but must discard some of this information “along the way” to satisfy the information constraint.
In light of the foregoing observations, from now on we let and focus on the following information-constrained optimal control problem:
minimize  (7a)  
subject to  (7b) 
Here, the limit supremum in (7a) is a random variable that depends on the entire path , and the precise meaning of the minimization problem in (7a) is as follows: We say that an MRS control law satisfying the information constraint (7b) is optimal for (7a) if
(8) 
where
(9) 
is the long-term expected average cost of MRS with initial state distribution , and where the infimum on the right-hand side of Eq. (8) is over all MRS control laws satisfying the information constraint (7b) (see, e.g., [14, p. 116] for the definition of pathwise average-cost optimality in the information-unconstrained setting). However, we will see that, under general conditions, is deterministic and independent of the initial condition.
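To make the pathwise criterion in (8)–(9) concrete, here is a small self-contained simulation (the finite model, the policy, and all constants are invented for illustration). It compares the pathwise average cost along one trajectory under a randomized stationary control law with the stationary expected cost computed from the invariant distribution of the induced chain:

```python
import numpy as np

rng = np.random.default_rng(1)
nX, nU = 3, 2
P = rng.dirichlet(np.ones(nX), size=(nX, nU))   # transition law Q(. | x, u)
phi = rng.dirichlet(np.ones(nU), size=nX)       # MRS control law Phi(u | x)
c = rng.random((nX, nU))                        # one-step cost c(x, u)

# Induced state transition kernel and its invariant distribution
Q_phi = np.einsum('xu,xuy->xy', phi, P)
w, V = np.linalg.eig(Q_phi.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()
J_stationary = float(np.einsum('x,xu,xu->', mu, phi, c))

# Pathwise long-term average cost along a single trajectory
x, total, T = 0, 0.0, 200_000
for t in range(T):
    u = rng.choice(nU, p=phi[x])
    total += c[x, u]
    x = rng.choice(nX, p=P[x, u])
print(total / T, J_stationary)  # nearly equal for this ergodic chain
```

Since the strictly positive random kernel makes the induced chain ergodic, the pathwise average is deterministic in the limit and independent of the initial state, matching the statement after Eq. (9).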
4 One-stage problem: solution via rate-distortion theory
Before we analyze the average-cost problem (7), we show that the one-stage case can be solved completely using rate-distortion theory [4] (a branch of information theory that deals with optimal compression of data subject to information constraints). Then, in the following section, we will tackle (7) by reducing it to a suitable one-stage problem.
With this in mind, we consider the following problem:
minimize  (10a)  
subject to  (10b) 
for a given probability measure and a given , where
(11) 
The set is nonempty for every . To see this, note that any kernel for which the function is constant (a.e. for any ) satisfies . Moreover, this set is convex since the functional is convex for any fixed . Thus, the optimization problem (10) is convex, and its value is called the Shannon distortion-rate function (DRF) of :
(12) 
In order to study the existence and the structure of a control law that achieves the infimum in (12), it is convenient to introduce the Lagrangian relaxation
From the variational formula (4) and the definition (12) of the DRF it follows that
Then we have the following key result [10]:
The DRF is convex and nonincreasing in . Moreover, assume the following:

The cost function is lower semicontinuous, satisfies
and is also coercive: there exist two sequences of compact sets and such that

There exists some such that .
Define the critical rate
(it may take the value ). Then, for any there exists a Markov kernel satisfying and . Moreover, the Radon–Nikodym derivative of the joint law w.r.t. the product of its marginals satisfies
(13) 
where and are such that
(14) 
and is the slope of a line tangent to the graph of at :
(15) 
For any , there exists a Markov kernel satisfying
and . This Markov kernel is deterministic, and is implemented by , where is any minimizer of over .
Upon substituting (13) back into (12) and using (14) and (15), we get the following variational representation of the DRF:
Under the conditions of Prop. 4, the DRF can be expressed as
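Proposition 4 is not constructive about how to find the optimizing kernel, but for finite alphabets the exponential form (13) suggests the classical Blahut–Arimoto iteration for tracing out points on the DRF. A minimal sketch follows (the binary example and all constants are invented; the parameter β plays the role of −1/s, the negated inverse slope of the DRF):

```python
import numpy as np

def blahut_arimoto(p_x, d, beta, iters=500):
    """Compute a point on the distortion-rate curve of (p_x, d).
    beta > 0 plays the role of -1/s, the negated inverse slope of the DRF."""
    nX, nU = d.shape
    q = np.full(nU, 1.0 / nU)                    # output marginal
    for _ in range(iters):
        K = q[None, :] * np.exp(-beta * d)       # tilted kernel, cf. Eq. (13)
        K /= K.sum(axis=1, keepdims=True)
        q = p_x @ K                              # marginal update
    D = float(np.einsum('x,xu,xu->', p_x, K, d))                 # expected distortion
    R = float(np.sum(p_x[:, None] * K * np.log(K / (q[None, :] + 1e-300) + 1e-300)))
    return R, D, K

# Binary uniform source with Hamming distortion: the DRF is log 2 - H_b(D).
p_x = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
R, D, K = blahut_arimoto(p_x, d, beta=2.0)
H_b = -(D * np.log(D) + (1 - D) * np.log(1 - D))
print(R, D, R + H_b)  # R + H_b(D) = log 2 = 0.6931...
```

The iteration alternates the two coordinate minimizations behind the variational formula (4): given the output marginal, the optimal kernel has the exponential-tilting form, and given the kernel, the optimal marginal is the induced output distribution.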
5 Convex-analytic approach for average-cost optimal control with rational inattention
We now turn to the analysis of the average-cost control problem (7a) with the information constraint (7b). In multi-stage control problems, such as this one, the control law has a dual effect [2]: it affects both the cost at the current stage and the uncertainty about the state at future stages. The presence of the mutual information constraint (7b) enhances this dual effect, since it prevents the DM from ever learning “too much” about the state. This, in turn, limits the DM’s future ability to keep the average cost low. These considerations suggest that, in order to bring rate-distortion theory to bear on the problem (7a), we cannot use the one-stage cost as the distortion function. Instead, we must modify it to account for the effect of the control action on future costs. As we will see, this modification leads to a certain stochastic generalization of the Bellman Equation.
5.1 Reduction to singlestage optimization
We begin by reducing the dynamic optimization problem (7) to a particular static (single-stage) problem. Once this has been carried out, we will be able to take advantage of the results of Section 4. The reduction is based on the so-called convex-analytic approach to controlled Markov processes [8] (see also [20, 7, 13, 22]), which we briefly summarize here.
Suppose that we have a Markov control problem with initial state distribution and controlled transition kernel . Any MRS control law induces a transition kernel on the state space :
We wish to find an MRS control law that would minimize the long-term average cost simultaneously for all . With that in mind, let
where is the long-term expected average cost defined in Eq. (9). Under certain regularity conditions, we can guarantee the existence of an MRS control law , such that a.s. for all . Moreover, this optimizing control law is stable in the following sense:
An MRS control law is called stable if:

There exists at least one probability measure , which is invariant w.r.t. : .

The average cost is finite, and moreover
The subset of consisting of all such stable control laws will be denoted by .
Then we have the following [14, Thm. 5.7.9]:
Suppose that the following assumptions are satisfied:

The cost function is nonnegative, lower semicontinuous, and coercive.

The cost function is inf-compact, i.e., for every and every , the set is compact.

The kernel is weakly continuous, i.e., for any .

There exist an MRS control law and an initial state , such that .
Then there exists a control law , such that
(16) 
where . Moreover, if is such that the induced kernel is Harris-recurrent, then a.s. for all .
One important consequence of the above theorem is that, if achieves the infimum on the rightmost side of (16) and if is the unique invariant distribution of the Harris-recurrent Markov kernel , then the state distributions induced by converge weakly to regardless of the initial condition . Moreover, the theorem allows us to focus on the static optimization problem given by the right-hand side of Eq. (16).
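For finite state and action spaces, the static problem on the right-hand side of (16) is the classical linear program over occupation measures. The following sketch (finite model with invented constants, assuming SciPy's `linprog` is available) minimizes the average cost over occupation measures subject to the invariance (balance) constraint, and then disintegrates the optimal measure into an invariant state distribution and a stationary control law:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
nX, nU = 3, 2
P = rng.dirichlet(np.ones(nX), size=(nX, nU))   # P(y | x, u), all positive
c = rng.random((nX, nU))                        # one-step cost

# Variables: occupation measure nu(x, u), flattened to length nX * nU.
# minimize <nu, c>  s.t.  nu >= 0, total mass 1, and marginal balance
# sum_u nu(y, u) = sum_{x, u} nu(x, u) P(y | x, u) for every y (invariance).
A_eq = np.zeros((nX + 1, nX * nU))
for y in range(nX):
    for x in range(nX):
        for u in range(nU):
            A_eq[y, x * nU + u] = float(x == y) - P[x, u, y]
A_eq[nX, :] = 1.0
b_eq = np.zeros(nX + 1)
b_eq[nX] = 1.0
res = linprog(c.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
nu = res.x.reshape(nX, nU)
mu = nu.sum(axis=1)                 # invariant state distribution
phi = nu / mu[:, None]              # disintegration nu = mu (x) Phi
print(res.fun)                      # optimal long-run average cost
```

In this unconstrained LP the optimum is attained at an extreme point, i.e., at a deterministic policy; it is the extra mutual information constraint, introduced next, that makes genuinely randomized laws optimal.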
Our next step is to introduce a steady-state form of the information constraint (7b) and then to use ideas from rate-distortion theory to attack the resulting optimization problem. The main obstacle to direct application of the results from Section 4 is that the state distribution and the control policy in (16) are coupled through the invariance condition . However, as we show next, it is possible to decouple the information and the invariance constraints by introducing a function-valued Lagrange multiplier to take care of the latter.
5.2 Bellman error minimization via marginal decomposition
We begin by decomposing the infimum over in (16) by first fixing the marginal state distribution . To that end, for a given , we consider the set of all stable control laws that leave it invariant (this set might very well be empty): . In addition, for a given value of the information constraint, we consider the set (recall Eq. (11)).
Assuming that the conditions of Theorem 5.1 are satisfied, we can rewrite the expected ergodic cost (16) (in the absence of information constraints) as
(17) 
In the same spirit, we can now introduce the following steady-state form of the information-constrained control problem (7):
(18) 
where the feasible set accounts for both the invariance constraint and the information constraint.
As a first step to understanding solutions to (18), we consider each candidate invariant distribution separately and define
(19) 
(we set the infimum to if ). Now we follow the usual route in the theory of average-cost optimal control [22, Ch. 9] and eliminate the invariance condition by introducing a function-valued Lagrange multiplier:
For any ,
(20) 
Remark 1
Remark 2
Upon setting , we can recognize the function as the Bellman error associated with ; this object plays a central role in approximate dynamic programming.
Let take the value if and otherwise. Then
(21) 
Moreover,
(22) 
Indeed, if , then the right-hand side of (22) is zero. On the other hand, suppose that . Since is standard Borel, any two probability measures are equal if and only if for all . Consequently, for some . There is no loss of generality if we assume that . Then, by considering functions for all and taking the limit as , we can make the right-hand side of (22) grow without bound. This proves (22). Substituting it into (21), we get (20).
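Before proceeding, it may help to see the Bellman error and the unconstrained ACOE numerically. The sketch below (finite model with invented constants) solves the ACOE by relative value iteration and checks that the Bellman error of the resulting pair is nonnegative and vanishes at the minimizing action in every state:

```python
import numpy as np

rng = np.random.default_rng(3)
nX, nU = 3, 2
P = rng.dirichlet(np.ones(nX), size=(nX, nU))   # transition law
c = rng.random((nX, nU))                        # one-step cost

# Relative value iteration for the ACOE: lam + h(x) = min_u [c(x,u) + (Ph)(x,u)]
h = np.zeros(nX)
for _ in range(5000):
    Th = np.min(c + np.einsum('xuy,y->xu', P, h), axis=1)
    lam = Th[0]                                  # anchor at a reference state
    h = Th - lam

# Bellman error of the pair (lam, h): nonnegative, and zero at the
# minimizing action in every state once the ACOE holds.
bellman_error = c + np.einsum('xuy,y->xu', P, h) - h[:, None] - lam
print(bellman_error.min(axis=1))                 # ~0 in every state
```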
Armed with this proposition, we can express (18) in the form of an appropriate rate-distortion problem by fixing and considering the dual value for (20):
(23) 
Suppose that assumption (A.1) above is satisfied, and that . Then the primal value and the dual value are equal.
Let be the closure, in the weak topology, of the set of all , such that , , and . Since by hypothesis, we can write
(24) 
and
(25) 
Because is coercive and nonnegative, and , the set is tight [15, Proposition 1.4.15], so its closure is weakly sequentially compact by Prohorov’s theorem. Moreover, because the function is weakly lower semicontinuous [25], the set is closed. Therefore, the set is closed and tight, hence weakly sequentially compact. Moreover, the sets and are both convex, and the objective function on the right-hand side of (24) is affine in and linear in . Therefore, by Sion’s minimax theorem [31] we may interchange the supremum and the infimum to conclude that .
We are now in a position to relate the optimal value to a suitable rate-distortion problem. Recalling the definition in Eq. (12), for any we consider the DRF of w.r.t. the distortion function :
(26) 
We can now give the following structural result:
Suppose that Assumptions (A.1)–(A.3) of Theorem 5.1 are in force. Consider a probability measure such that , and the supremum over in (23) is attained by some . Define the critical rate
If , then there exists an MRS control law such that , and the Radon–Nikodym derivative of w.r.t. takes the form
(27) 
where , and satisfies
(28) 
If , then the deterministic Markov policy , where is any minimizer of over , satisfies . In both cases, we have
(29) 
Moreover, the optimal value admits the following variational representation:
(30) 
Using Proposition 5.2 and the definition (23) of the dual value , we can express as a pointwise supremum of a family of DRFs:
(31) 
Since , we can apply Proposition 4 separately for each . Since is weakly continuous by hypothesis, for any . In light of these observations, and owing to our hypotheses, we can ensure that Assumptions (D.1) and (D.2) of Proposition 4 are satisfied. In particular, we can take that achieves the supremum in (31) (such an exists by hypothesis) to deduce the existence of an MRS control law that satisfies the information constraint with equality and achieves (29). Using (13) with
we obtain (27). In the same way, (28) follows from (15) in Proposition 4. Finally, the variational formula (5.2) for the optimal value can be obtained immediately from (31) and Proposition 4.
Note that the control law characterized by Theorem 5.2 is not guaranteed to be feasible (let alone optimal) for the optimization problem in Eq. (19). However, if we add the invariance condition , then (29) provides a sufficient condition for optimality:

Fix a candidate invariant distribution . Suppose there exist , , and a stochastic kernel such that
(32) 
Then achieves the infimum in (19), and .