Input design for Bayesian identification of non-linear state-space models


Department of Chemical and Materials Engineering, University of Alberta, Edmonton AB T6G-2G6, Canada (e-mail: {tulsyan; khare; biao.huang; fraser.forbes}@ualberta.ca)
Department of Chemical and Biological Engineering, University of British Columbia, Vancouver BC V6T-1Z3, Canada (e-mail: gopaluni@chbe.ubc.ca)
Abstract

We propose an algorithm for designing optimal inputs for on-line Bayesian identification of stochastic non-linear state-space models. The proposed method relies on minimization of the posterior Cramér-Rao lower bound derived for the model parameters, with respect to the input sequence. To render the optimization problem computationally tractable, the inputs are parametrized as a multi-dimensional Markov chain in the input space. The proposed approach is illustrated through a simulation example.

Aditya Tulsyan, Swanand R. Khare, Biao Huang, R. Bhushan Gopaluni, J. Fraser Forbes

Footnote: This article has been published in: Tulsyan, A., S.R. Khare, B. Huang, R.B. Gopaluni and J.F. Forbes (2013). Bayesian identification of non-linear state-space models: Part I - Input design. In: Proceedings of the 10th IFAC International Symposium on Dynamics and Control of Process Systems. Mumbai, India.
Footnote: This work was supported by the Natural Sciences and Engineering Research Council (NSERC), Canada.

1 Introduction

Over the last decade, great progress has been made within the statistics community in overcoming the computational issues and making Bayesian identification tractable for a wide range of complicated models arising in demographic and population studies, image processing, and drug response modelling (Gilks et al. (1995)). A detailed exposition of Bayesian identification methods can be found in Kantas et al. (2009). This paper is directed towards the class of on-line methods for Bayesian identification of stochastic non-linear SSMs, the procedure for which is briefly introduced first. Let {Xt}t∈N and {Yt}t∈N be the state and output stochastic processes, respectively, and let u1:t be the sequence of inputs, such that the state process {Xt} is an unobserved or unmeasured process, with initial density pθ(x0) and transition density pθ(xt+1|xt,ut):

 X0∼pθ(x0);Xt+1|(xt,ut)∼pθ(xt+1|xt,ut)  (t∈N). (1)

{Xt} is an unobserved process, but is observed through {Yt}, such that the measurements are conditionally independent given the states and inputs, with marginal density pθ(yt|xt,ut):

 Yt|(xt,ut)∼pθ(yt|xt,ut)(t∈N). (2)

θ ∈ Θ in (1) and (2) is a vector of unknown model parameters, where Θ is an open set. All the densities are defined with respect to suitable dominating measures, such as the Lebesgue measure. Although (1) and (2) represent a wide class of non-linear time-series models, the model form and the assumptions considered in this paper are given below:

 Xt+1=ft(Xt,ut,θt,Vt);Yt=gt(Xt,ut,θt,Wt), (3)

where θt = θ for all t ∈ N is a vector of static parameters, and the following assumptions are made.

Assumption 1: {Vt} and {Wt} are mutually independent sequences of independent random variables, known a priori in their distribution classes (e.g., Gaussian) and parametrized by a known and finite number of moments.

Assumption 2: ft and gt are sufficiently smooth in their arguments, such that the first- and second-order partial derivatives appearing in the sequel exist on the relevant open sets.

Assumption 3: For any random samples of the noises and states satisfying (3), the Jacobians of ft and gt with respect to Vt and Wt have full rank, respectively, such that, by the implicit function theorem, the densities pθ(xt+1|xt,ut) and pθ(yt|xt,ut) do not involve any Dirac delta functions.

For a generic sequence {at}t∈N, we write a1:t ≜ {a1, …, at}. Let θ⋆ ∈ Θ be the true, but unknown parameter vector generating a measurement sequence y1:t given u1:t. In Bayesian identification of (3), the problem of estimating the parameter vector in real time, given a sequence of input-output data {u1:t, y1:t}, is formulated as a joint state and parameter estimation problem. This is done by ascribing a prior density to the parameters and computing the joint posterior p(zt|y1:t,u1:t), where Zt ≜ {Xt; θt} is an extended Markov process with static parameter dynamics θt+1 = θt. The inference on the parameters then relies on the marginal posterior p(θt|y1:t,u1:t). Note that by a judicious choice of the input sequence u1:t, the posterior p(θt|y1:t,u1:t) can be 'steered' in order to yield more accurate inference on the parameters. This is called the input design problem for Bayesian identification or simply, the Bayesian input design problem. A detailed review of this subject can be found in Chaloner and Verdinelli (1995).
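The joint state-and-parameter formulation above can be sketched with a standard bootstrap particle filter run on the extended process Zt = {Xt; θt}. The toy scalar model, its parameter values, and the priors below are all illustrative assumptions, not the paper's example; note also that resampling a static parameter is known to cause sample degeneracy, so this is only a conceptual sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy scalar SSM (an assumption for illustration only):
#   X_{t+1} = theta * X_t + u_t + V_t,   V_t ~ N(0, q)
#   Y_t     = X_{t+1} + W_t,             W_t ~ N(0, r)
theta_true, q, r, T, M = 0.7, 0.1, 0.1, 50, 2000

# Simulate data under a known (here: sinusoidal) input sequence
u = np.sin(0.3 * np.arange(T))
x, y = np.zeros(T + 1), np.zeros(T)
for t in range(T):
    x[t + 1] = theta_true * x[t] + u[t] + rng.normal(0, np.sqrt(q))
    y[t] = x[t + 1] + rng.normal(0, np.sqrt(r))

# Bootstrap particle filter on the extended state Z_t = (X_t, theta_t),
# with theta_{t+1} = theta_t (static parameter, sampled once from the prior)
xp = rng.normal(0, 1, M)          # particles for X_0 ~ p(x0)
thp = rng.uniform(0, 1.5, M)      # particles for theta ~ p(theta)
for t in range(T):
    xp = thp * xp + u[t] + rng.normal(0, np.sqrt(q), M)  # propagate states
    logw = -0.5 * (y[t] - xp) ** 2 / r                   # measurement update
    w = np.exp(logw - logw.max()); w /= w.sum()
    idx = rng.choice(M, M, p=w)                          # multinomial resample
    xp, thp = xp[idx], thp[idx]

# The marginal posterior p(theta_t | y_{1:t}, u_{1:t}) is represented by thp
theta_hat = thp.mean()
print(f"posterior mean of theta: {theta_hat:.3f} (true {theta_true})")
```

The surviving parameter particles thp approximate the marginal posterior on which the inference above relies; the quality of this approximation is exactly what a well-designed input sequence u1:t improves.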

Bayesian input design for linear and non-linear regression models is an active area of research (see Huan and Marzouk (2012), Kück et al. (2006), Müller and Parmigiani (1995) and the references cited therein); however, its extension to SSMs has been limited. Recently, a Bayesian input design procedure for non-linear SSMs in which the state is completely observed was developed by Tulsyan et al. (2012). Despite the success with regression models, to the best of the authors' knowledge, no Bayesian input design methods are available for identification of stochastic non-linear SSMs. This is due to the unobserved state process, which makes the design problem difficult to solve.

This paper deals with Bayesian input design for identification of the stochastic SSMs given in (3). The proposed method is based on minimization of the posterior Cramér-Rao lower bound (PCRLB), derived by Tichavský et al. (1998). First, we use Monte Carlo (MC) methods to obtain an approximation of the PCRLB, and then parametrize the inputs as a multi-dimensional Markov chain in the input space, to render the optimization problem computationally tractable. The Markov-chain parametrization not only allows amplitude constraints on the input to be included, it can also be easily implemented using a standard PID controller or any other regulator. The notation used here is given next.

Notation: N denotes the set of natural numbers and R the set of real numbers; Rn×m is the set of real-valued n×m matrices; Sn is the space of n×n symmetric matrices; Sn+ is the cone of symmetric positive semi-definite matrices in Sn; and Sn++ is its interior. The partial orders on Sn induced by Sn+ and Sn++ are denoted by ≽ and ≻, respectively. A stochastic matrix is a matrix with non-negative entries in which each row sums to 1. For a square matrix A, Tr(A) denotes its trace. For vectors a and b, a ≤ b denotes element-wise inequality, and diag(a) is a diagonal matrix with the elements of a as its diagonal entries. Finally, Δba ≜ ∇a∇Tb is a Laplacian operator and ∇a is a gradient operator.

2 Problem formulation

Bayesian input design for regression models is a well-studied problem in statistics (Chaloner and Verdinelli (1995)), wherein the problem is often formulated as follows

 ψ(u⋆1:N)=maxu1:N∈RpNN∑t=1Ep(θt,y1:t|u1:t)[ψ(Y1:t,u1:t,θt)] (4)

where u⋆1:N is an N-step-ahead optimal input sequence, and ψ(·) is a utility function. When inference on the parameters is of interest, Lindley (1956) suggested using the mean-square error (MSE) as a utility function, such that

 ψ(u⋆1:N)=maxu1:N∈RpNN∑t=1−Φ(Pθt|t(u1:t)), (5)

where Pθt|t(u1:t) is the MSE associated with the parameter estimate θt|t, and Φ(·) is a test function (e.g., the trace).

Remark 1: For the model considered in (3), the marginal posterior density p(θt|y1:t,u1:t), or the expectation with respect to it, does not admit an analytical solution; thus, (5) cannot be computed in closed form. ∎

Remark 2: Methods such as SMC and MCMC can be used to approximate p(θt|y1:t,u1:t); however, this makes the computation in (5) formidable (Kück et al. (2006)). Moreover, the resulting input is optimal only for the Bayesian estimator used to approximate the posterior. ∎

To address the issues in Remarks 1 and 2, we propose to first define a lower bound on the MSE, and minimize the lower bound instead. The PCRLB, derived by Tichavský et al. (1998), provides a lower bound on the MSE associated with the estimation of Zt from y1:t, and is given in the next lemma.

Lemma 1: Let y1:t be an output sequence generated from (3) using u1:t; then the MSE associated with the estimation of Zt from y1:t is bounded from below by the following matrix inequality

 Pzt|t≜Ep(zt,y1:t|u1:t)[(Zt−Zt|t)(Zt−Zt|t)T]≽[Jzt]−1, (6)

where: Zt|t is an estimate of Zt; and Pzt|t, Jzt and [Jzt]−1 are the MSE, the posterior information matrix (PIM), and the PCRLB, respectively.

Proof: See Tichavský et al. (1998) for the proof. ∎

Lemma 2: A recursive approach to compute Jzt for (3) under Assumptions 1 through 3 is given as follows

 Jxt+1 = H33t − (H13t)T[Jxt + H11t]−1H13t; (7a)
 Jxθt+1 = (H23t)T − (H13t)T[Jxt + H11t]−1(Jxθt + H12t); (7b)
 Jθt+1 = Jθt + H22t − (Jxθt + H12t)T[Jxt + H11t]−1(Jxθt + H12t), (7c)

where:

 H11t = E~pt+1[−ΔXtXt log pt]; (8a)
 H12t = E~pt+1[−ΔθtXt log pt]; (8b)
 H13t = E~pt+1[−ΔXt+1Xt log pt]; (8c)
 H22t = E~pt+1[−Δθtθt log pt]; (8d)
 H23t = E~pt+1[−ΔXt+1θt log pt]; (8e)
 H33t = E~pt+1[−ΔXt+1Xt+1 log pt]; (8f)

where pt is the joint density of the states, parameters and measurements implied by (3), and E~pt+1 denotes expectation with respect to it; the recursion is initialized with the PIM computed from the prior on Z0.

Proof: See Tichavský et al. (1998) for the proof. ∎

Corollary 1: Let Pzt|t and [Jzt]−1 be such that they satisfy (6); then the MSE associated with the point estimation of θt, computed from p(θt|y1:t,u1:t), is bounded from below by the following matrix inequality

 Pθt|t=Ep(θt,y1:t|u1:t)[(θt−θt|t)(θt−θt|t)T]≽Lθt, (9)

where Lθt is the lower-right sub-matrix of [Jzt]−1 in (6).

Proof: The proof is based on the fact that the PCRLB inequality in (6) guarantees Pθt|t ≽ Lθt. ∎

Theorem 2: Let Jzt be the PIM for the model in (3) and Lθt be the lower bound on the MSE associated with the estimation of θt in (3); then given Jzt, the lower bound at time t can be computed as

 Lθt=[Jθt−(Jxθt)T(Jxt)−1Jxθt]−1, (10)

where Jxt, Jxθt and Jθt are the PIMs given in Lemma 2.

Proof: The proof is based on the matrix inversion lemma. ∎

Finally, the input design problem for Bayesian identification of the parameters in (3) can be formulated as follows

 ψ(u⋆1:N)= minu1:N∈RpNN∑t=1Φ(Lθt(u1:t)) (11a) s.t.   umin≤{ui}t∈[1,N]≤umax, (11b)

where Φ(·) is the test function in (5); and umax and umin are the maximum and minimum magnitudes of the input.

Remark 3: The optimization problem in (11) allows magnitude constraints to be imposed on the inputs. Although constraints on the states and outputs are not included, they can also be appended if required. ∎

Remark 4: The integrals in (8), taken with respect to the joint density, make (11) independent of the random realizations of the states, outputs and parameters. The optimization in (11) in fact depends only on: the process dynamics represented in (3); the noise densities of Vt and Wt; and the choice of the priors on the initial state and parameters. This makes (11) independent of y1:t or the Bayesian estimator used for estimating the parameters. ∎

Remark 5: The formulation in (11) yields a sequence u⋆1:N which is (a) optimal for all the Bayesian identification methods that approximate p(θt|y1:t,u1:t); and (b) independent of the measurements y1:t (see Remark 4), such that the same input sequence is optimal for every realization of the data. ∎

There are two challenges that need to be addressed in order to make the optimization problem in (11) tractable: (a) computing the lower bound Lθt; and (b) solving the high-dimensional optimization problem in u1:N. Our approach to addressing these challenges is discussed next.
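Given numerical estimates of the H blocks in (8), the PIM recursion in (7a)-(7c) and the extraction of the parameter bound in (10) reduce to a few lines of linear algebra. The following numpy sketch illustrates this; the dimensions and the randomly generated, well-conditioned H blocks are placeholders standing in for the MC estimates, not quantities from the paper's example:

```python
import numpy as np

n, q = 2, 3  # placeholder state and parameter dimensions

def pim_update(Jx, Jxth, Jth, H11, H12, H13, H22, H23, H33):
    """One step of the PIM recursion in (7a)-(7c)."""
    A = np.linalg.inv(Jx + H11)                              # [Jx_t + H11_t]^{-1}
    Jx_new = H33 - H13.T @ A @ H13                           # (7a)
    Jxth_new = H23.T - H13.T @ A @ (Jxth + H12)              # (7b)
    Jth_new = Jth + H22 - (Jxth + H12).T @ A @ (Jxth + H12)  # (7c)
    return Jx_new, Jxth_new, Jth_new

def lower_bound(Jx, Jxth, Jth):
    """L_theta_t from (10): inverse Schur complement of the joint PIM."""
    return np.linalg.inv(Jth - Jxth.T @ np.linalg.inv(Jx) @ Jxth)

# Random, strongly positive definite placeholder blocks
rng = np.random.default_rng(1)
def spd(d):
    A = rng.normal(size=(d, d))
    return A @ A.T + 10 * np.eye(d)

Jx, Jxth, Jth = spd(n), rng.normal(size=(n, q)), spd(q)
H11, H22, H33 = spd(n), spd(q), spd(n)
H12 = rng.normal(size=(n, q))
H13 = rng.normal(size=(n, n))
H23 = rng.normal(size=(q, n))

Jx, Jxth, Jth = pim_update(Jx, Jxth, Jth, H11, H12, H13, H22, H23, H33)
L = lower_bound(Jx, Jxth, Jth)
print("trace of parameter lower bound:", np.trace(L))
```

The trace of L is one natural choice for the test function Φ in (11); the design objective then sums this scalar over the horizon.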

3 Computing the lower bound

The first challenge is to compute the lower bound Lθt in (11). It is well known that computing the PCRLB in closed form is non-trivial for the model form considered in (3) (see Tichavský et al. (1998), Bergman (2001)). This is because of the complex, high-dimensional integrals in (8a) through (8f), which do not admit an analytical solution.

MC sampling is a popular numerical method for solving integrals with respect to p(x0:t|u1:t). Using M i.i.d. trajectories {Xi0:t}Mi=1 drawn from p(x0:t|u1:t), the probability distribution p(dx0:t|u1:t) can be approximated as

 ~p(dx0:t|u1:t)=1MM∑i=1δXi0:t|u1:t(dx0:t), (12)

where ~p(dx0:t|u1:t) is an MC estimate of p(dx0:t|u1:t) and δXi0:t(dx0:t) is the Dirac delta mass at Xi0:t. Finally, substituting (12) into the integral of interest yields its M-sample MC estimate.

Remark 6: Using MC methods, the multi-dimensional integrals in (8a) through (8f), taken with respect to the density ~pt+1, can be approximated by simulating M i.i.d. sample paths using (3), starting at i.i.d. initial positions drawn from the priors. ∎

Example 3: Consider the following stochastic SSM with additive Gaussian state and measurement noise

 Xt+1 =ft(Xt,θt,ut)+Vt, (13a) Yt =gt(Xt,θt,ut)+Wt, (13b)

where {Vt} and {Wt} are mutually independent sequences of independent zero-mean Gaussian random variables, such that Vt ∼ N(0, Qt) and Wt ∼ N(0, Rt), where Qt and Rt are positive definite for all t ∈ N. Note that for the model form considered in Example 3, using the Markov property of the states and the conditional independence of the measurements, the dimension of the integrals in (8a) through (8f) can be reduced, as given next.

Theorem 3: For a stochastic non-linear SSM of the form in Example 3, the matrices in (8a) through (8f) can be written as

 H11t = Ep(xt,θt|u1:t+1)[∇XtfTt(Xt,θt,ut)] Q−1t [∇XtfTt(Xt,θt,ut)]T; (14a)
 H12t = Ep(xt,θt|u1:t+1)[∇XtfTt(Xt,θt,ut)] Q−1t [∇θtfTt(Xt,θt,ut)]T; (14b)
 H13t = −Ep(xt,θt|u1:t+1)[∇XtfTt(Xt,θt,ut)] Q−1t; (14c)
 H22t = Ep(xt,θt|u1:t+1)[∇θtfTt(Xt,θt,ut)] Q−1t [∇θtfTt(Xt,θt,ut)]T + Ep(xt+1,θt|u1:t+1)[∇θtgTt(Xt+1,θt,ut+1)] R−1t+1 [∇θtgTt(Xt+1,θt,ut+1)]T; (14d)
 H23t = −Ep(xt,θt|u1:t+1)[∇θtfTt(Xt,θt,ut)] Q−1t + Ep(xt+1,θt|u1:t+1)[∇θtgTt(Xt+1,θt,ut+1)] R−1t+1 [∇Xt+1gTt(Xt+1,θt,ut+1)]T; (14e)
 H33t = Q−1t + Ep(xt+1,θt|u1:t+1)[∇Xt+1gTt(Xt+1,θt,ut+1)] R−1t+1 [∇Xt+1gTt(Xt+1,θt,ut+1)]T. (14f)
Proof: (14a): First note that H11t = E~pt+1[−ΔXtXt log pt] (see Tichavský et al. (1998)). On simplifying, the expectation reduces to one over p(xt,θt|u1:t+1). This is due to the Markov property of the states and the conditional independence of the measurements. For Example 3, pθ(xt+1|xt,ut) is Gaussian with mean ft(xt,θt,ut) and covariance Qt. Substituting this density into the expectation, and taking the expectation with respect to Xt+1, we obtain (14a). The expressions in (14b) through (14f) can be derived similarly. ∎

Theorem 3 reduces the dimension of the integrals in (8) for Example 3. Using MC sampling, (14a), for instance, can be computed as ~H11t = (1/M)∑Mi=1 [∇XtfTt(Xit,θit,ut)] Q−1t [∇XtfTt(Xit,θit,ut)]T, where {Xit, θit}Mi=1 are i.i.d. samples from p(xt,θt|u1:t+1) and ~H11t is an M-sample MC estimate of H11t. The MC estimates of (14b) through (14f) can be computed similarly. In general, substituting the MC estimates of (8a) through (8f) first into Lemma 2, and then into Theorem 2, yields

 ~Lθt=[~Jθt−(~Jxθt)T(~Jxt)−1~Jxθt]−1, (15)

where ~Lθt is an estimate of Lθt, and ~Jxt, ~Jxθt and ~Jθt are the estimates of the PIMs in Lemma 2. Finally, substituting (15) into (11) gives the following optimization problem

 ~ψ(u⋆1:N)= minu1:N∈RpNN∑t=1Φ(~Lθt(u1:t)) (16a) s.t.   umin≤{ui}t∈[1,N]≤umax. (16b)
Theorem 4: Let ψ(u⋆1:N) and ~ψ(u⋆1:N) be the optimal utility functions computed by solving the optimization problems in (11) and (16), respectively; then we have

 ~ψ(u⋆1:N)a.s.−−−−−→M→+∞ψ(u⋆1:N), (17)

where a.s. denotes almost sure convergence.

Proof: Since (15) is based on perfect MC sampling, by the strong law of large numbers we have ~Lθt → Lθt almost surely as M → +∞. Equation (17) follows from this result, which completes the proof. ∎

A natural approach to solving (16) is to treat u1:N as a vector of continuous variables in RpN; however, this renders (16) computationally inefficient for large N. A relaxation method to make (16) tractable is given next.
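The MC estimation of the matrices in Theorem 3, and the almost sure convergence claimed in (17), can be illustrated on a toy scalar member of the additive-Gaussian class in Example 3. The model ft(x,θ,u) = θx + u, the covariance value, and the Gaussian sampling density for θ below are all assumptions for illustration; for this choice, ∇Xt fTt = θ, so (14a) reduces to H11t = E[θ²]/Qt, which has a closed form to compare against:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy scalar instance of Example 3 (an assumption for illustration):
#   f_t(x, theta, u) = theta * x + u,  Q_t = q  =>  grad_x f^T = theta
# and (14a) becomes H11_t = E[theta^2] / q.
q = 0.5
M = 200_000

# i.i.d. samples standing in for draws from p(x_t, theta_t | u_{1:t+1});
# here theta ~ N(0.7, 0.1^2) is a placeholder sampling density
theta = rng.normal(0.7, 0.1, M)

# M-sample MC estimate of (14a)
H11_mc = np.mean(theta * (1.0 / q) * theta)

# Closed form for this toy: E[theta^2]/q = (0.7^2 + 0.1^2)/q
H11_exact = (0.7**2 + 0.1**2) / q
print(H11_mc, H11_exact)
```

Increasing M drives the MC estimate to the exact value, which is the content of the strong law of large numbers argument behind (17).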

4 Input parametrization

To overcome the complications due to the continuous-valued input, we discretize the input space, restricting each input to a finite set U of admissible values that respects the amplitude constraints in (16b). Each input then satisfies ut ∈ U for all t ∈ [1, N], such that (16) can be written as follows

 ~ψ(u⋆1:N)=minu1:N∈UNN∑t=1Φ(~Lθt(u1:t)). (18)
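To see why (18) remains intractable, note that solving it by exhaustive search requires scoring every candidate sequence in the product space, whose size grows exponentially in the horizon. A brief sketch, with a hypothetical three-level input grid and a placeholder cost standing in for the sum of Φ(~Lθt(u1:t)) terms:

```python
import itertools

# Hypothetical discretized input set (amplitude-constrained grid of levels)
U = [-1.0, 0.0, 1.0]   # 3 candidate levels per input
N = 10                 # design horizon

# Exhaustive search over U^N, as (18) literally requires, enumerates
# len(U)^N candidate sequences:
n_candidates = len(U) ** N
print(n_candidates)    # 59049 sequences even for this tiny problem

# Scoring each candidate (here with a placeholder quadratic cost in lieu
# of the PCRLB-based objective) is only feasible for very short horizons:
best = min(itertools.product(U, repeat=4),
           key=lambda seq: sum(s * s for s in seq))
print(best)            # (0.0, 0.0, 0.0, 0.0) for this toy cost
```

This exponential growth in the number of candidate sequences is what motivates the Markov-chain input parametrization introduced next.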

Note that although the input in (18) is defined on the discrete input space U, (18) is still intractable for large N. To address this issue, a multi-dimensional Markov chain input parametrization, first proposed by Brighenti et al. (2009), is used here.

Definition 1: Let {ut}t∈[1,N] be a U-valued first-order finite Markov chain, such that the sample values of ut+1 depend on the past only through the sample values of