Dynamic Team Theory of Stochastic Differential Decision Systems with Decentralized Noiseless Feedback Information Structures via Girsanov’s Measure Transformation

Charalambos D. Charalambous

C. D. Charalambous is with the Department of Electrical and Computer Engineering, University of Cyprus, Nicosia 1678, Cyprus (e-mail: chadcha@ucy.ac.cy).
Abstract

In this paper we generalize static team theory to dynamic team theory, in the context of stochastic differential decision systems with decentralized noiseless feedback information structures.

We apply Girsanov's theorem to transform the initial stochastic dynamic team problem to an equivalent team problem, under a reference probability space, with state process and information structures independent of any of the team decisions. Subsequently, we show, under certain conditions, that continuous-time and discrete-time stochastic dynamic team problems can be transformed to equivalent static team problems, although computing the optimal team strategies using this method might be computationally intensive. Therefore, we propose an alternative method, by deriving team and Person-by-Person (PbP) optimality conditions via the stochastic Pontryagin maximum principle, consisting of forward and backward stochastic differential equations and a set of conditional variational Hamiltonians with respect to the information structures of the team members.

Finally, we relate the backward stochastic differential equation to the value process of the stochastic team problem.

I Introduction

In classical stochastic control or decision theory the control actions or decisions applied by the multiple controllers or Decision Makers (DMs) are based on the same information. The underlying assumption is that the acquisition of information is centralized, or that the information collected at different observation posts is communicated to each controller or DM. Classical stochastic control problems are often classified, based on the information available for control actions, into fully observable [1, 2, 3, 4, 5, 6, 7] and partially observable [2, 5, 8]. Fully observable refers to the case when the information structure or pattern available for control actions is generated by the state process (also called feedback information) or the exogenous state noise process (also called nonanticipative information), and partially observable refers to the case when the information structure available for control actions is a nonlinear function of the state process corrupted by an exogenous observation noise process.

In this paper, we deviate from the classical stochastic control or decision formulation by considering a system operating over a finite time horizon $[0,T]$, with the following features.

  1. There are $N$ observation posts or stations collecting information;

  2. There are $N$ control stations, each having direct access to the information collected by at most one observation post, without delay;

  3. The observation posts may not communicate their information to the other control stations, or they may signal part or all of their information to some of the other control stations with delay;

  4. The control stations may not have perfect recall, that is, information which is available at any of the control stations at time $t$ may not be available at any future time $s > t$;

  5. The control strategies applied at the control stations have to be coordinated to optimize a common pay-off or reward.

In the above formulation we have assumed that each observation post serves one control station without delay, and we have allowed the possibility that a subset of the other observation posts signal their information, subject to delay, to any of the control stations they are not serving. Such signaling among the observation posts and control stations is called information sharing [9, 10, 11, 12].
The elements of the proposed system of study are the following.

As usual, we call the information available as arguments of the control laws, which generate the control actions applied at the control stations, the "Information Structure or Pattern".

Suppose, for now, there is no signaling of information from the observation posts to any of the control stations they are not serving, and let $y^i \triangleq \{y^i(t) : t \in [0,T]\}$ denote the observations available at the $i$th control station to generate the control actions $\{u^i(t) : t \in [0,T]\}$, for $i = 1, \ldots, N$. Denote the corresponding control strategies by $\gamma^i$, for $i = 1, \ldots, N$. Given the control strategies, the performance of the collective decisions or control actions applied at the control stations is measured by a common pay-off, and the stochastic differential decision system is formulated using dynamic team theory, as follows.

$$ J(u^{1,*}, \ldots, u^{N,*}) \triangleq \inf_{(u^1, \ldots, u^N) \in \mathbb{U}^{(N)}[0,T]} J(u^1, \ldots, u^N), \qquad (1) $$
$$ J(u^1, \ldots, u^N) \triangleq \mathbb{E}^{P^u}\Big\{ \int_0^T \ell(t, x(t), u^1(t), \ldots, u^N(t))\,dt + \varphi(x(T)) \Big\}, \qquad (2) $$

subject to stochastic Itô differential dynamics and distributed noiseless observations

$$ dx(t) = f(t, x(t), u^1(t), \ldots, u^N(t))\,dt + \sigma(t, x(t))\,dW(t), \quad x(0) = x_0, \quad t \in (0,T], \qquad (3) $$
$$ y^i(t) = h^i(t, x(t)), \quad t \in [0,T], \quad i = 1, \ldots, N, \qquad (4) $$

where $\mathbb{E}^{P^u}$ denotes expectation with respect to an underlying probability space $(\Omega, \mathbb{F}, P^u)$. The stochastic system (3) may be a compact representation of many interconnected subsystems with states $x^i$, aggregated into a single state representation $x \triangleq (x^1, \ldots, x^N)$, where $x^i$ represents the state of the $i$th local subsystem, $y^i$ its local distributed observation process collected at the $i$th local observation post, and $u^i$ its local control process applied at the $i$th local control station, such that for each $i$, the control law is $u^i(t) = \gamma^i(t, \{y^i(s) : 0 \le s \le t\})$, a nonanticipative measurable function of the $i$th control station information structure $\{y^i(s) : 0 \le s \le t\}$, for $t \in [0,T]$, $i = 1, \ldots, N$.
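To make the closed-loop structure concrete, the following minimal sketch simulates a discretized (Euler-Maruyama) version of (3)-(4) with decentralized noiseless feedback laws. The drift `f`, diffusion `sigma`, observation maps `h`, feedback laws `gamma`, and the costs are all hypothetical placeholders chosen for illustration only; they are not specified by the paper.

```python
import numpy as np

# Euler-Maruyama sketch of the team system (3)-(4). All model functions
# below are hypothetical placeholders -- the paper does not specify them.
rng = np.random.default_rng(0)
T, steps, N = 1.0, 200, 2            # horizon [0,T], time grid, stations
dt = T / steps

f = lambda t, x, u: -x + u.sum()     # drift f(t,x,u) (placeholder)
sigma = lambda t, x: 1.0             # diffusion sigma(t,x): u-independent
h = [lambda t, x: x,                 # y^1(t) = h^1(t,x(t))
     lambda t, x: x ** 2]            # y^2(t) = h^2(t,x(t))
gamma = [lambda hist: -0.5 * hist[-1],   # u^i(t) = gamma^i(y^i(s), s <= t);
         lambda hist: -0.1 * hist[-1]]   # here each law reads only y^i(t)

x, cost = 0.5, 0.0
y_hist = [[] for _ in range(N)]      # per-station observation records
for k in range(steps):
    t = k * dt
    for i in range(N):
        y_hist[i].append(h[i](t, x))         # noiseless feedback observation
    u = np.array([gamma[i](y_hist[i]) for i in range(N)])
    cost += (x ** 2 + (u ** 2).sum()) * dt   # running pay-off (placeholder)
    x += f(t, x, u) * dt + sigma(t, x) * np.sqrt(dt) * rng.normal()
cost += x ** 2                               # terminal pay-off (placeholder)
print("sample pay-off:", cost)
```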
We often call the stochastic differential decision system (1)-(4) with decentralized noiseless feedback information structures $\{y^i : i = 1, \ldots, N\}$ a stochastic dynamic team problem, and a strategy $(u^{1,*}, \ldots, u^{N,*})$ which achieves the infimum in (1) a team optimal strategy.
Moreover, we call $(u^{1,*}, \ldots, u^{N,*})$ a PbP optimal strategy if

$$ J(u^{1,*}, \ldots, u^{i,*}, \ldots, u^{N,*}) \le J(u^{1,*}, \ldots, u^{i-1,*}, u^i, u^{i+1,*}, \ldots, u^{N,*}), \quad \forall u^i \in \mathbb{U}^i[0,T], \ \forall i = 1, \ldots, N, \qquad (5) $$

and the infimum subject to the constraints (3), (4) is achieved. In team theory terminology, $u^1, \ldots, u^N$ are called the DMs, agents, or members of the team game.
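Definition (5) underlies a natural computational heuristic: cyclically minimize the pay-off with respect to each member's decision while the others are held fixed. The sketch below runs this PbP (best-response) iteration on a hypothetical static quadratic team pay-off; for convex quadratic teams of this type the PbP fixed point coincides with the team optimum, which the last line checks, but this coincidence does not hold in general.

```python
import numpy as np

# PbP (best-response) iteration on a hypothetical static quadratic team:
# J(u) = u'Qu + c'u, Q positive definite. Member i minimizes J over u_i
# with the other members' decisions held fixed, and the sweeps repeat.
Q = np.array([[2.0, 0.8],
              [0.8, 1.5]])
c = np.array([1.0, -0.5])

u = np.zeros(2)
for sweep in range(50):
    for i in range(2):
        # Stationarity in u_i: 2 Q[i,i] u_i + 2 sum_{j != i} Q[i,j] u_j + c[i] = 0
        off = sum(Q[i, j] * u[j] for j in range(2) if j != i)
        u[i] = -(c[i] + 2.0 * off) / (2.0 * Q[i, i])

print("PbP fixed point:", u)
print("team optimum  :", -0.5 * np.linalg.solve(Q, c))
```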

In this paper, we investigate the stochastic dynamic team problem (1)-(4), and its generalization when there is information sharing from the observation posts to any of the control stations, and when there is no perfect recall of information at the control stations.

Recall that a stochastic team problem is called a "Static Team Problem" if the information structures available for decisions are not affected by any of the team decisions. Optimality conditions for static team problems were developed by Marschak and Radner [13, 14, 15], and subsequently generalized in [16]. Clearly, since the information structures generated by (4) are affected by the team decisions via the state process generated by (3), the static team theory optimality conditions given in [13, 14, 15, 16] do not apply.

On the other hand, stochastic optimal control theory with full information is developed under a centralized assumption on the information structures. Therefore, a natural question is whether any of the techniques developed over the last 60 years for centralized stochastic control problems and dynamic games, such as dynamic programming, the stochastic Pontryagin maximum principle, and martingale methods, are applicable to stochastic dynamic team problems, and if so, how.

In this paper we apply techniques from classical stochastic control theory to generalize Marschak's and Radner's static team theory [13, 14, 15] to continuous-time stochastic differential decision systems with decentralized noiseless feedback information structures, defined by (1)-(4). Moreover, we discuss generalizations of (1)-(4), when there is information sharing from the observation posts to any of the control stations, and when there is no perfect recall of information at the control stations. Our methodology is based on deriving team and PbP optimality conditions, using the stochastic Pontryagin maximum principle, by utilizing the semimartingale representation method due to Bismut [3], under a weak formulation of the probability space by invoking Girsanov's theorem [17]. First, we apply Girsanov's theorem to transform the original stochastic dynamic team problem to an equivalent team problem, under a reference probability space in which the state process and the information structures are not affected by any of the team decisions. Subsequently, we show the precise connection between Girsanov's measure transformation and Witsenhausen's notion of "Common Denominator Condition" and "Change of Variables" introduced in [18] to establish equivalence between static and dynamic team problems. We elaborate on this connection for both continuous-time and discrete-time stochastic systems, and we state certain results from static team theory which are directly applicable. However, since the computation of the optimal team strategies via static team theory might be computationally intensive, we proceed further to derive optimality conditions based on stochastic variational methods, by taking advantage of the fact that under the reference measure, the state process and the information structures do not react to any perturbations of the team decisions. The optimality conditions are given by a "Hamiltonian System" consisting of backward and forward stochastic differential equations, while the optimal team actions of the $i$th team member are determined by a conditional variational Hamiltonian, conditioned on the information structure of the $i$th team member, while the rest are fixed to their optimal values, for $i = 1, \ldots, N$. Finally, we show the connection between the backward stochastic differential equation and the value process of the stochastic dynamic team problem.
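For orientation, the optimality conditions just described take, schematically, the following generic Pontryagin-type form, stated here as a sketch only ($\psi$ denotes the adjoint process, $Q$ the martingale term, $\mathcal{H}$ the Hamiltonian, $\mathbb{A}^i$ the $i$th action space, and $\mathcal{G}^{y^i}_{0,t} \triangleq \sigma\{y^i(s) : 0 \le s \le t\}$ the $i$th information structure); the precise statements, spaces, and measurability requirements are developed in Section III and may differ in detail:

$$ \mathcal{H}(t, x, \psi, Q, u) \triangleq \langle f(t,x,u), \psi \rangle + \mathrm{tr}\big( Q^\top \sigma(t,x) \big) + \ell(t,x,u), $$
$$ dx^*(t) = f(t, x^*(t), u^*(t))\,dt + \sigma(t, x^*(t))\,dW^{u^*}(t), \quad x^*(0) = x_0 \quad \text{(forward)}, $$
$$ d\psi(t) = -\mathcal{H}_x(t, x^*(t), \psi(t), Q(t), u^*(t))\,dt + Q(t)\,dW^{u^*}(t), \quad \psi(T) = \varphi_x(x^*(T)) \quad \text{(backward)}, $$
$$ \mathbb{E}\Big\{ \mathcal{H}(t, x^*(t), \psi(t), Q(t), u^{-i,*}(t), u^i) \,\Big|\, \mathcal{G}^{y^i}_{0,t} \Big\} \ \ge\ \mathbb{E}\Big\{ \mathcal{H}(t, x^*(t), \psi(t), Q(t), u^*(t)) \,\Big|\, \mathcal{G}^{y^i}_{0,t} \Big\}, \quad \forall u^i \in \mathbb{A}^i, \ \forall i, $$

the last inequality holding a.e. in $t$, almost surely: the $i$th member's action is varied in the conditional variational Hamiltonian while the remaining members are fixed at their optimal values.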

We point out that the approach we pursue in this paper is different from the various approaches pursued over the years to address stochastic dynamic decentralized decision systems, formulated using team theory in [19, 20, 21, 22, 11, 10, 23, 12, 24, 25, 16, 26, 27, 28, 29, 30, 31, 18, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45], and our recent treatment in [46]. Compared to [46], in the current paper we apply Girsanov's measure transformation, which allows us to derive the stochastic Pontryagin maximum principle for decentralized noiseless feedback information structures, instead of nonanticipative (open loop) information structures adapted to a sub-filtration of the fixed filtration generated by the Brownian motion considered in [46]. Since feedback strategies are more desirable than nonanticipative strategies, this paper is an improvement over [46]. The only disadvantage is that, unlike [46], we cannot allow dependence of the diffusion coefficient on the team decisions. The current paper also generalizes some of the results on centralized partial information optimality conditions derived in [47] to centralized partial feedback information. We note that the case of decentralized noisy information structures is treated in [48], and therefore by combining the results of this paper with those in [48], we can handle any combination of decentralized noiseless and noisy information structures. However, when applying the optimality conditions to determine the optimal team strategies, the main challenge is the computation of conditional expectations with respect to the information structures. The procedure is similar to [49], where various examples from the communication and control areas are presented, using decentralized nonanticipative strategies.

The rest of the paper is organized as follows. In Section II we introduce the stochastic differential team problem and its equivalent re-formulations using the weak Girsanov measure transformation approach. Here, we also establish the connection between Girsanov's theorem and Witsenhausen's "Common Denominator Condition" and "Change of Variables" [18] for stochastic continuous-time and discrete-time dynamical systems. Further, we derive the variational equation, which is invoked in Section III to obtain the optimality conditions, both under the reference probability measure and under the initial probability measure. In Section IV we provide concluding remarks and comments on future work.

II Equivalent Stochastic Dynamic Team Problems

In this section, we consider the stochastic dynamic team problem (1)-(4), and we apply Girsanov's theorem to transform it to an equivalent team problem under a reference probability measure, in which the information structures are functionals of Brownian motion, and hence independent of any of the team decisions. We will also briefly discuss the discrete-time counterpart of Girsanov's theorem, and we will show its equivalence to Witsenhausen's so-called "Common Denominator Condition" and "Change of Variables" discussed in [18].
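To preview the discrete-time connection, here is a minimal sketch of the reweighting idea, under the illustrative assumption of additive noise with everywhere-positive density $p(\cdot)$ (the notation is ours, not Witsenhausen's): for the recursion $x_{k+1} = f_k(x_k, u_k) + w_k$, with $\{w_k\}$ independent noise and $u_k$ measurable functions of the information structures,

$$ P^u(dx_0, \ldots, dx_n) = \prod_{k=0}^{n-1} \frac{p\big( x_{k+1} - f_k(x_k, u_k) \big)}{p(x_{k+1})} \, P(dx_0, \ldots, dx_n), $$

where under the reference measure $P$ the coordinates $x_1, \ldots, x_n$ have density $p(\cdot)$ and are independent of the team decisions. The common denominator $\prod_k p(x_{k+1})$ plays the role of the reference measure, and replacing the noise variables by the state coordinates is the change of variables which renders the information structures fixed, and hence the problem static.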

Let $(\Omega, \mathbb{F}, \{\mathbb{F}_{0,t} : t \in [0,T]\}, \mathbb{P})$ denote a complete filtered probability space satisfying the usual conditions, that is, $(\Omega, \mathbb{F}, \mathbb{P})$ is complete and $\mathbb{F}_{0,0}$ contains all $\mathbb{P}$-null sets in $\mathbb{F}$. Note that filtrations are monotone in the sense that $\mathbb{F}_{0,s} \subseteq \mathbb{F}_{0,t}$, $\forall s \le t$. Moreover, we assume throughout that filtrations are right continuous, i.e., $\mathbb{F}_{0,t} = \bigcap_{s > t} \mathbb{F}_{0,s}$. We define $\mathbb{F}_T \triangleq \{\mathbb{F}_{0,t} : t \in [0,T]\}$. Consider a random process $\{z(t) : t \in [0,T]\}$ taking values in $(\mathbb{Z}, \mathcal{B}(\mathbb{Z}))$, where $\mathbb{Z}$ is a metric space, defined on the filtered probability space $(\Omega, \mathbb{F}, \mathbb{F}_T, \mathbb{P})$. The process is said to be (a) measurable, if the map $(t, \omega) \mapsto z(t, \omega)$ is $\mathcal{B}([0,T]) \otimes \mathbb{F} / \mathcal{B}(\mathbb{Z})$-measurable, (b) adapted, if for all $t \in [0,T]$, the map $\omega \mapsto z(t, \omega)$ is $\mathbb{F}_{0,t} / \mathcal{B}(\mathbb{Z})$-measurable, (c) progressively measurable, if for all $t \in [0,T]$, the map $(s, \omega) \mapsto z(s, \omega)$, $s \in [0,t]$, is $\mathcal{B}([0,t]) \otimes \mathbb{F}_{0,t} / \mathcal{B}(\mathbb{Z})$-measurable. It can be shown that any stochastic process on a filtered probability space which is measurable and adapted has a progressively measurable modification. Unless otherwise specified, we shall say a process is adapted if the process is progressively measurable [7].

We use the following notation.

$\mathbb{Z}_N \triangleq \{1, 2, \ldots, N\}$: Subset of natural numbers consisting of $N$ elements.
$\mathbb{Z}_N^{-i} \triangleq \mathbb{Z}_N \setminus \{i\}$: Set $\mathbb{Z}_N$ minus the element $\{i\}$.
$\mathcal{L}(\mathbb{X}, \mathbb{Y})$: Linear transformations mapping a vector space $\mathbb{X}$ into a vector space $\mathbb{Y}$.
$\sigma^{(i)}$: $i$th column of a map $\sigma$.
$\mathbb{A}^i$: Action spaces of controls applied at the $i$th control station, $i \in \mathbb{Z}_N$.
TABLE I: Table of Notation

Let $C([0,T], \mathbb{R}^n)$ denote the space of continuous $\mathbb{R}^n$-valued functions defined on the time interval $[0,T]$, and $\{\mathcal{B}_{0,t} : t \in [0,T]\}$ its canonical Borel filtration.

Let $L^2_{\mathbb{F}_T}([0,T], \mathbb{R}^n)$ denote the space of $\{\mathbb{F}_{0,t} : t \in [0,T]\}$-adapted $\mathbb{R}^n$-valued random processes $\{z(t) : t \in [0,T]\}$ such that

$$ \mathbb{E}\Big\{ \int_0^T |z(t)|^2_{\mathbb{R}^n}\, dt \Big\} < \infty, $$

which is a Hilbert subspace of $L^2([0,T] \times \Omega, \mathbb{R}^n)$.
Similarly, let $L^2_{\mathbb{F}_T}([0,T], \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n))$ denote the space of $\{\mathbb{F}_{0,t} : t \in [0,T]\}$-adapted $n \times m$ matrix-valued random processes $\{\Sigma(t) : t \in [0,T]\}$ such that

$$ \mathbb{E}\Big\{ \int_0^T |\Sigma(t)|^2_{\mathcal{L}(\mathbb{R}^m, \mathbb{R}^n)}\, dt \Big\} < \infty. $$

Let $B^2_{\mathbb{F}_T}([0,T], \mathbb{R}^n)$ denote the space of $\mathbb{F}_T$-adapted $\mathbb{R}^n$-valued second order random processes endowed with the norm topology defined by

$$ \|z\|^2_{B^2_{\mathbb{F}_T}([0,T], \mathbb{R}^n)} \triangleq \mathbb{E}\Big\{ \sup_{t \in [0,T]} |z(t)|^2_{\mathbb{R}^n} \Big\}. $$
Next, we introduce conditions on the coefficients $\{f, \sigma, h^i, \ell, \varphi\}$, which are partly used to derive the results of this section.

Assumptions 1.

(Main assumptions) The drift $f$, the diffusion coefficient $\sigma$, and the information functionals $h^i$ are Borel measurable maps:

$$ f : [0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)} \to \mathbb{R}^n, \quad \sigma : [0,T] \times \mathbb{R}^n \to \mathcal{L}(\mathbb{R}^m, \mathbb{R}^n), \quad h^i : [0,T] \times \mathbb{R}^n \to \mathbb{R}^{k_i}, \ i \in \mathbb{Z}_N. $$

Moreover,

(A0) $\mathbb{A}^i$ is nonempty, $\forall i \in \mathbb{Z}_N$, and $\mathbb{A}^{(N)} \triangleq \times_{i=1}^N \mathbb{A}^i$.

There exists a constant $K > 0$ such that

(A1) $|f(t,x,u) - f(t,z,u)| \le K |x - z|$, uniformly in $t \in [0,T]$, $u \in \mathbb{A}^{(N)}$;

(A2) $|f(t,x,u)| \le K (1 + |x| + |u|)$;

(A3) $|\sigma(t,x) - \sigma(t,z)| \le K |x - z|$, and $|\sigma(t,x)| \le K (1 + |x|)$;

(A4) $\sigma(t,x)$ is invertible, $\forall (t,x) \in [0,T] \times \mathbb{R}^n$;

(A5) $|\sigma^{-1}(t,x) f(t,x,u)| \le K$, uniformly in $(t,x,u) \in [0,T] \times \mathbb{R}^n \times \mathbb{A}^{(N)}$;

(A6) $|\sigma^{-1}(t,x)| \le K$, $\forall (t,x) \in [0,T] \times \mathbb{R}^n$;

(A7) $|\ell(t,x,u)| \le K (1 + |x|^2 + |u|^2)$;

(A8) $|\varphi(x)| \le K (1 + |x|^2)$.

II-A Equivalent Stochastic Team Problems via Girsanov's Theorem

Next, we define the dynamic team problem (1)-(4) using the weak Girsanov change of measure approach.

We start with a canonical space $(\Omega, \mathbb{F})$ on which the following are defined:

(WP1) $x(0)$: an $\mathbb{R}^n$-valued random variable with distribution $\Pi_0(dx)$;

(WP2) $\{W(t) : t \in [0,T]\}$: an $\mathbb{R}^m$-valued standard Brownian motion, independent of $x(0)$.

We introduce the Borel $\sigma$-algebra on $C([0,T], \mathbb{R}^m)$ generated by $\{W(t) : t \in [0,T]\}$, and let $\mu$ denote the Wiener measure on it. Further, we introduce the filtration $\{\mathbb{F}_{0,t} : t \in [0,T]\}$ generated by truncations of $(x(0), W)$. That is, $\mathbb{F}_{0,t}$ is the sub-$\sigma$-algebra generated by the family of sets

$$ \big\{ \omega \in \Omega : x(0) \in A, \ W(s) \in B \big\}, \quad A \in \mathcal{B}(\mathbb{R}^n), \ B \in \mathcal{B}(\mathbb{R}^m), \ 0 \le s \le t, \qquad (6) $$

which implies $\{\mathbb{F}_{0,t} : t \in [0,T]\}$ is the canonical Borel filtration, and $\{W(s) : 0 \le s \le t\}$ are the truncations of $W$, for $t \in [0,T]$. Next, we define the reference probability measure $\mathbb{P} \triangleq \Pi_0 \times \mu$ on $(\Omega, \mathbb{F})$.

On the probability space $(\Omega, \mathbb{F}, \{\mathbb{F}_{0,t} : t \in [0,T]\}, \mathbb{P})$ we define the stochastic differential equation

$$ dx(t) = \sigma(t, x(t))\, dW(t), \quad x(0) = x_0, \quad t \in (0,T]. \qquad (7) $$

Then by Assumptions 1, (A2), (A3), and for any initial condition satisfying $\mathbb{E}\{|x(0)|^2\} < \infty$, (7) has a unique strong solution [17], adapted to the filtration $\{\mathbb{F}_{0,t} : t \in [0,T]\}$, and $x \in B^2_{\mathbb{F}_T}([0,T], \mathbb{R}^n)$.
We also introduce the $\sigma$-algebra $\mathcal{G}^x_{0,t}$ defined by

$$ \mathcal{G}^x_{0,t} \triangleq \sigma\big( \{x(s) : 0 \le s \le t\} \big), \quad t \in [0,T]. \qquad (8) $$

Hence, $\{\mathcal{G}^x_{0,t} : t \in [0,T]\}$ is the canonical Borel filtration generated by $\{x(t) : t \in [0,T]\}$ satisfying (7). From (7), and the additional Assumptions 1, (A4) on $\sigma$, it can be shown that $\mathcal{G}^x_{0,t} = \mathbb{F}_{0,t}$, $\forall t \in [0,T]$, and this $\sigma$-algebra is independent of any of the team decisions $u \triangleq (u^1, \ldots, u^N)$. Unlike [46], where we utilize open loop or nonanticipative information structures, here we use feedback information structures. Note that for the feedback information structures to be independent of any of the team decisions $u$, it is necessary that under the reference probability measure the state process is independent of $u$, which is indeed the case, because we have restricted the class of diffusion coefficients to those which are independent of $u$.

Next, we prepare to define three sets of admissible team strategies. We define the Borel $\sigma$-algebras generated by projections of $x \triangleq (x^1, \ldots, x^N)$ on any of its subspaces, say $x^i$, and the distributed observation processes $y^i$, as follows.

$$ \mathcal{G}^{x^i}_{0,t} \triangleq \sigma\big( \{x^i(s) : 0 \le s \le t\} \big), \quad t \in [0,T], \ i \in \mathbb{Z}_N, \qquad (9) $$
$$ \mathcal{G}^{y^i}_{0,t} \triangleq \sigma\big( \{y^i(s) : 0 \le s \le t\} \big) = \sigma\big( \{h^i(s, x(s)) : 0 \le s \le t\} \big), \quad t \in [0,T], \ i \in \mathbb{Z}_N. \qquad (10) $$

Further, define $\{\mathcal{G}^{y^i}_{0,t} : t \in [0,T]\}$, the canonical Borel filtration generated by $\{y^i(t) : t \in [0,T]\}$, for $i \in \mathbb{Z}_N$. Define the delayed sharing information structure at the $i$th control station by $\mathcal{G}^{I^i}_{0,t} \triangleq \mathcal{G}^{y^i}_{0,t} \bigvee \big( \vee_{j \in \mathbb{Z}_N^{-i}} \mathcal{G}^{y^j}_{0,t-\epsilon_j} \big)$, which is the minimum filtration generated by the Borel $\sigma$-algebra at the $i$th observation post, $\mathcal{G}^{y^i}_{0,t}$, and the delayed sharing information signaling $\mathcal{G}^{y^j}_{0,t-\epsilon_j}$, $\epsilon_j \ge 0$, from the observation posts $j \in \mathbb{Z}_N^{-i}$ to the control station $i$, for $i \in \mathbb{Z}_N$.

Next, we define the three classes of information structures we consider in this paper.

Definition 1.

(Noiseless Feedback Admissible Strategies)
Without Signaling: If there is no signaling from the observation posts to any of the other control stations, the set of admissible strategies at the $i$th control station is defined by

$$ \mathbb{U}^i[0,T] \triangleq \Big\{ u^i \in L^2_{\mathbb{F}_T}([0,T], \mathbb{A}^i) : u^i(t) \ \text{is} \ \mathcal{G}^{y^i}_{0,t}\text{-measurable}, \ t \in [0,T] \Big\}, \quad i \in \mathbb{Z}_N. \qquad (11) $$

A team strategy is an $N$-tuple defined by $u \triangleq (u^1, \ldots, u^N) \in \mathbb{U}^{(N)}[0,T] \triangleq \times_{i=1}^N \mathbb{U}^i[0,T]$.

With Signaling: If there is delayed sharing information signaling from the other observation posts, the set of admissible strategies at the $i$th control station is defined by

$$ \mathbb{U}^{I^i}[0,T] \triangleq \Big\{ u^i \in L^2_{\mathbb{F}_T}([0,T], \mathbb{A}^i) : u^i(t) \ \text{is} \ \mathcal{G}^{I^i}_{0,t}\text{-measurable}, \ t \in [0,T] \Big\}, \quad i \in \mathbb{Z}_N. \qquad (12) $$

A team strategy is an $N$-tuple defined by $u \triangleq (u^1, \ldots, u^N) \in \mathbb{U}^{I,(N)}[0,T] \triangleq \times_{i=1}^N \mathbb{U}^{I^i}[0,T]$.

Without Perfect Recall (Markov): If the distributed observation process collected at the $i$th observation post is $\{y^i(t) : t \in [0,T]\}$, and there is no perfect recall, the set of admissible strategies at the $i$th control station is defined by

$$ \mathbb{U}^{M^i}[0,T] \triangleq \Big\{ u^i \in L^2_{\mathbb{F}_T}([0,T], \mathbb{A}^i) : u^i(t) \ \text{is} \ \sigma\{y^i(t)\}\text{-measurable}, \ t \in [0,T] \Big\}, \quad i \in \mathbb{Z}_N. \qquad (13) $$

A team strategy is an $N$-tuple defined by $u \triangleq (u^1, \ldots, u^N) \in \mathbb{U}^{M,(N)}[0,T] \triangleq \times_{i=1}^N \mathbb{U}^{M^i}[0,T]$.

The results derived in this paper hold for other variations of the information structures, such as control stations without perfect recall based on delayed information $\sigma\{y^i((t - \tau_i)^+)\}$, etc.
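In discrete time, the three classes of Definition 1 differ only in which slice of the observation record a control law is allowed to read, as the following schematic sketch indicates (the helper names and the delay `d` are ours, purely illustrative):

```python
# Discrete-time schematic of Definition 1: each strategy class restricts
# the slice of the observation record a control law may read.
# Y[j] is the observation sequence at post j.
def info_without_signaling(i, t, Y):
    return Y[i][: t + 1]                       # own full history, cf. (11)

def info_with_signaling(i, t, Y, d=2):
    others = {j: Y[j][: max(t + 1 - d, 0)]     # others' histories, delayed by d
              for j in range(len(Y)) if j != i}
    return (Y[i][: t + 1], others)             # cf. (12)

def info_without_perfect_recall(i, t, Y):
    return Y[i][t]                             # current observation only, cf. (13)
```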

The reason for imposing the square integrability condition in (11)-(13) will be clarified shortly. Thus, an admissible strategy, say $u^i \in \mathbb{U}^{I^i}[0,T]$, is a family of functions $\gamma^i(t, \cdot)$, $t \in [0,T]$, which are progressively measurable (nonanticipative) with respect to the delayed sharing noiseless feedback information structure $\{\mathcal{G}^{I^i}_{0,t} : t \in [0,T]\}$.

Next, for any $u \in \mathbb{U}^{(N)}[0,T]$ (we can also consider $u \in \mathbb{U}^{I,(N)}[0,T]$ or $u \in \mathbb{U}^{M,(N)}[0,T]$) we define on $(\Omega, \mathbb{F}, \{\mathbb{F}_{0,t} : t \in [0,T]\}, \mathbb{P})$ the exponential function

$$ \Lambda^u(t) \triangleq \exp\Big\{ \int_0^t \big( \sigma^{-1}(s, x(s)) f(s, x(s), u(s)) \big)^\top dW(s) - \frac{1}{2} \int_0^t \big| \sigma^{-1}(s, x(s)) f(s, x(s), u(s)) \big|^2 ds \Big\}, \quad t \in [0,T]. \qquad (14) $$

Under the additional Assumptions 1, (A5), by Itô's differential rule, $\{\Lambda^u(t) : t \in [0,T]\}$ is the unique $\{\mathbb{F}_{0,t} : t \in [0,T]\}$-adapted, a.s. continuous solution [17] of the stochastic differential equation

$$ d\Lambda^u(t) = \Lambda^u(t) \big( \sigma^{-1}(t, x(t)) f(t, x(t), u(t)) \big)^\top dW(t), \quad \Lambda^u(0) = 1. \qquad (15) $$

Given any $u \in \mathbb{U}^{(N)}[0,T]$ we define the reward of the team game under $\mathbb{P}$ by

$$ J(u^1, \ldots, u^N) \triangleq \mathbb{E}\Big\{ \Lambda^u(T) \Big( \int_0^T \ell(t, x(t), u(t))\,dt + \varphi(x(T)) \Big) \Big\}, \qquad (16) $$

where $\ell, \varphi$ will be such that (16) is finite.
For any admissible strategy $u \in \mathbb{U}^{(N)}[0,T]$, by Assumptions 1, (A5), the Novikov condition [17]

$$ \mathbb{E}\Big\{ \exp\Big( \frac{1}{2} \int_0^T \big| \sigma^{-1}(t, x(t)) f(t, x(t), u(t)) \big|^2 dt \Big) \Big\} < \infty \qquad (17) $$

holds, which is sufficient for $\{\Lambda^u(t) : t \in [0,T]\}$ defined by (14) to be an $(\{\mathbb{F}_{0,t} : t \in [0,T]\}, \mathbb{P})$-martingale. Thus, by the martingale property, $\Lambda^u$ has constant expectation, $\mathbb{E}\{\Lambda^u(t)\} = \mathbb{E}\{\Lambda^u(0)\} = 1$, $\forall t \in [0,T]$, and therefore we can utilize $\Lambda^u(T)$, which represents a version of the Radon-Nikodym derivative, to define a probability measure $P^u$ on $(\Omega, \mathbb{F})$ by setting

$$ \frac{dP^u}{d\mathbb{P}}\Big|_{\mathbb{F}_{0,T}} = \Lambda^u(T). \qquad (18) $$

Moreover, by Girsanov's theorem, under the probability space $(\Omega, \mathbb{F}, \{\mathbb{F}_{0,t} : t \in [0,T]\}, P^u)$, the process $\{W^u(t) : t \in [0,T]\}$ is a standard Brownian motion and it is defined by

$$ W^u(t) \triangleq W(t) - \int_0^t \sigma^{-1}(s, x(s)) f(s, x(s), u(s))\,ds, \quad t \in [0,T], \qquad (19) $$

and the distribution of $x(0)$ is unchanged.
Therefore, under Assumptions 1, (A1)-(A5), we have constructed the probability space $(\Omega, \mathbb{F}, \{\mathbb{F}_{0,t} : t \in [0,T]\}, P^u)$, the Brownian motion $\{W^u(t) : t \in [0,T]\}$ defined on it, and the state process $\{x(t) : t \in [0,T]\}$ which is a weak solution of

$$ dx(t) = f(t, x(t), u(t))\,dt + \sigma(t, x(t))\,dW^u(t), \quad x(0) = x_0, \quad t \in (0,T], \qquad (20) $$

unique in probability law defined via (18), having the properties that it is $\{\mathbb{F}_{0,t} : t \in [0,T]\}$-adapted and $\mathbb{E}^{P^u}\{ \sup_{t \in [0,T]} |x(t)|^2 \} < \infty$.
By substituting (18) into (16), under the probability measure $P^u$, the team game reward is given by

$$ J(u^1, \ldots, u^N) = \mathbb{E}^{P^u}\Big\{ \int_0^T \ell(t, x(t), u(t))\,dt + \varphi(x(T)) \Big\}. \qquad (21) $$

From the definition of the Radon-Nikodym derivative (18), for any admissible strategy, say $u \in \mathbb{U}^{(N)}[0,T]$, we also have $P^u \sim \mathbb{P}$, that is, the two measures are mutually absolutely continuous.
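To illustrate (16) and (18) numerically, the following sketch uses the discrete-time analogue of (14): it simulates the control-free reference dynamics (7), accumulates the likelihood-ratio weight $\Lambda^u$ along each path, and estimates the pay-off as a weighted average. The scalar model, feedback law, and costs are hypothetical placeholders; note that the strategy reads the reference state, which does not react to the choice of strategy, which is precisely the property the transformation is designed to deliver.

```python
import numpy as np

# Monte Carlo sketch of the reference-measure pay-off (16), for hypothetical
# scalar dynamics. Under P the state is control-free, dx = sigma dW as in (7);
# each strategy enters only through the likelihood-ratio weight Lambda^u.
rng = np.random.default_rng(1)
T, steps, paths = 1.0, 100, 20000
dt = T / steps

f = lambda t, x, u: -x + u            # drift (placeholder)
sigma = 1.0                           # constant diffusion (placeholder)
gamma = lambda y: -0.5 * y            # feedback law on y(t) = x(t) (placeholder)

J_hat, lam_bar = 0.0, 0.0
for _ in range(paths):
    x, lam, cost = 0.5, 1.0, 0.0
    for k in range(steps):
        t = k * dt
        u = gamma(x)                  # information computed from the
        a = f(t, x, u) / sigma        # reference state: unaffected by gamma
        dW = np.sqrt(dt) * rng.normal()
        lam *= np.exp(a * dW - 0.5 * a * a * dt)   # discrete analogue of (14)
        cost += (x * x + u * u) * dt
        x += sigma * dW               # reference dynamics (7): no drift
    cost += x * x
    J_hat += lam * cost
    lam_bar += lam
print("E[Lambda^u(T)] ~", lam_bar / paths)   # martingale property: ~ 1
print("J(u) estimate  ~", J_hat / paths)     # (16), equivalently (21)
```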

Note that if we start with the stochastic dynamic team problem (20), (21), the reverse change of measure is obtained as follows. Define

$$ \big( \Lambda^u(t) \big)^{-1} \triangleq \exp\Big\{ -\int_0^t \big( \sigma^{-1}(s, x(s)) f(s, x(s), u(s)) \big)^\top dW^u(s) - \frac{1}{2} \int_0^t \big| \sigma^{-1}(s, x(s)) f(s, x(s), u(s)) \big|^2 ds \Big\}, \quad t \in [0,T]. \qquad (22) $$

Then $\mathbb{E}^{P^u}\{ (\Lambda^u(T))^{-1} \} = 1$ and $\mathbb{P}(d\omega) = (\Lambda^u(T))^{-1} P^u(d\omega)$. Consequently, under the reference measure $\mathbb{P}$, the stochastic dynamic team problem is (7), (15), (16).

Remark 1.

We have shown that under Assumptions 1, (A1)-(A5), and $\mathbb{E}\{|x(0)|^2\} < \infty$, for any $u \in \mathbb{U}^{(N)}[0,T]$ we have $P^u \sim \mathbb{P}$, and that we have two equivalent formulations of the stochastic dynamic team problem.

(1) Under the original probability space $(\Omega, \mathbb{F}, \{\mathbb{F}_{0,t} : t \in [0,T]\}, P^u)$ the dynamic team problem is described by the pay-off (21), and the $\{\mathbb{F}_{0,t} : t \in [0,T]\}$-adapted continuous strong solution $\{x(t) : t \in [0,T]\}$ satisfying (20), where the distributed observations $\{y^i(t) : t \in [0,T]\}$ collected at the observation posts are affected by the team decisions via the state process $x$.

(2) Under the reference probability space $(\Omega, \mathbb{F}, \{\mathbb{F}_{0,t} : t \in [0,T]\}, \mathbb{P})$ the dynamic team problem is described by the pay-off (16), and the $\{\mathbb{F}_{0,t} : t \in [0,T]\}$-adapted continuous pathwise solution of $(x, \Lambda^u)$, satisfying (7), (15), where $x$, and hence the information structures, is not affected by any of the team decisions. Note that strong uniqueness holds for solutions of (7), (15) because both satisfy the Lipschitz conditions (i.e., Assumptions 1, (A3), (A5) hold).

Remark 2.

Assumptions 1, (A5) is satisfied if the following alternative conditions hold.

(A5)(a) (A4), (A6) hold and either (i) (A1) is replaced by