Simultaneous Mode, Input and State Estimation for Switched Linear Stochastic Systems
Abstract
In this paper, we propose a filtering algorithm for simultaneously estimating the mode, input and state of hidden mode switched linear stochastic systems with unknown inputs. Using a multiple-model approach with a bank of linear input and state filters, one for each mode, our algorithm relies on the ability to find the most probable model as a mode estimate. We show that this is possible with input and state filters by identifying a key property: a particular residual signal, which we call the generalized innovation, is a Gaussian white noise. We also provide an asymptotic analysis of the proposed algorithm and give sufficient conditions for asymptotically achieving convergence to the true model (consistency), or to the ‘closest’ model according to an information-theoretic measure (convergence). A simulation example of intention-aware vehicles at an intersection is given to demonstrate the effectiveness of our approach.
Sze Zheng Yong, Minghui Zhu, Emilio Frazzoli
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA (email: szyong@mit.edu, frazzoli@mit.edu).
Department of Electrical Engineering, Pennsylvania State University, 201 Old Main, University Park, PA 16802, USA (email: muz16@psu.edu).
1 Introduction
Most autonomous systems must operate without knowledge of the intentions and decisions of other systems or humans. Thus, in many instances, these intentions and control decisions need to be inferred from noisy measurements. This problem can be conveniently considered within the framework of hidden mode hybrid systems (HMHS; see, e.g., [verma.delvecchio.12, yong.ACC.2013] and references therein) with unknown inputs, in which the system state dynamics are described by a finite collection of functions. Each of these functions corresponds to an intention or mode of the hybrid system, where the mode is unknown or hidden and mode transitions are autonomous. In addition, by allowing unknown inputs in this framework, both deterministic and stochastic disturbance inputs and noise can also be considered. There are many applications, such as urban transportation systems [Yong.Zhu.ea.CDC14_switched], aircraft tracking and fault detection [Liu.Hwang.2011], as well as attack-resilient estimation of power systems [Yong.Zhu.ea.CDC15], in which it is unrealistic to assume knowledge of the mode and disturbance inputs, or in which they are simply impractical or too costly to measure.
Literature review. The filtering problem for hidden mode hybrid systems without unknown inputs has been extensively studied (see, e.g., [BarShalom.2004, Mazor.1998] and references therein), especially in the context of target tracking applications, along with the convergence and consistency properties of the resulting filters [Baram.Jun1978, Baram.Feb1978]. These filtering algorithms, which use a multiple-model approach, consist of a bank of Kalman filters [KalmanF.1960], one for each mode, and a likelihood-based approach that uses the whiteness property of the innovation [Hanlon.2000, Kailath.1968] to determine the probability of each mode. When the mode transition is assumed to be Markovian, hypothesis merging algorithms have been developed, such as the generalized pseudo-Bayesian (GPBn) and the interacting multiple-model (IMM) algorithms [BarShalom.2004, Blom.1988].
However, the disturbance inputs, which include exogenous inputs, faults or attack signals, often cannot be modeled as a zero-mean, Gaussian white noise or as a restricted finite set of input profiles, which gives rise to a need to extend the existing algorithms to hidden mode hybrid systems with unknown inputs. Such an algorithm was first proposed in [Liu.Hwang.2011] for a limited class of systems, i.e., when unknown inputs only affect the dynamics. Thus, more general algorithms for systems where unknown inputs can also affect output measurements, as is the case for data injection attacks on sensors [Yong.Zhu.ea.CDC15], are still lacking. Moreover, the approach taken in [Liu.Hwang.2011] is based on running a bank of state-only filters with a possibly suboptimal decoupling of the unknown inputs, as opposed to simultaneous input and state filters, which have lately gained more attention. Of all the proposed algorithms, the input and state filters in our previous work [Yong.thesis2015, Yong.Zhu.ea.CDC15_General, Yong.Zhu.ea.Automatica15] are in the most general form and have proven stability and optimality properties, and are hence the most suitable for the problem at hand.
Contributions. In this paper, we present a novel multiple-model approach for simultaneous estimation of the mode, input and state of switched linear stochastic systems with unknown inputs. As with multiple-model estimation of systems without unknown inputs, a bank of optimal input and state filters [Yong.thesis2015, Yong.Zhu.ea.CDC15_General, Yong.Zhu.ea.Automatica15], one for each mode, is run in parallel. Next, we devise a likelihood-based mode association algorithm to determine the probability of each mode. This involves the definition of a generalized innovation signal, which we prove is a Gaussian white noise. Then, we use this whiteness property to form a likelihood function, which is used to find the most probable mode. To manage the growing number of hypotheses, we employ an approach similar to the interacting multiple-model estimator [Blom.1988], which mixes the initial conditions based on mode transition probabilities. We then study the asymptotic behavior of our approach (also for the special case when the hidden mode is deterministic) and provide sufficient conditions for asymptotically achieving convergence to the true model (consistency), or to the ‘closest’ model according to an information-theoretic measure, i.e., with the minimum Kullback-Leibler (KL) divergence [Kullback.1951] (convergence). Preliminary versions of this paper were presented at the 2014 and 2015 IEEE Conferences on Decision and Control [Yong.Zhu.ea.CDC14_switched, Yong.Zhu.ea.CDC15], where only the asymptotic behavior for the special case of a deterministic hidden mode was investigated.
2 Motivating Example
To motivate the problem considered in this paper, we consider the scenario of vehicles crossing a 4-way intersection where each vehicle does not have any information about the intention of the other vehicles. To simplify the problem, we consider the case with two vehicles (see Figure 1): Vehicle A is human-driven (uncontrolled) and Vehicle B is autonomous (controlled), with dynamics described by and , where and are vehicle positions and velocities. We assume that Vehicle A approaches the intersection with a default intention, i.e., without considering the presence of Vehicle B. (The assumed permutation of intentions is for illustrative purposes only and is not a result of any limitation of the proposed algorithms.) Then, at the intersection, the driver of Vehicle A can choose between three intentions:

to continue while ignoring the other vehicle with an unknown input (Inattentive Driver, default mode),

to attempt to cause a collision (Malicious Driver), or

to stop (Cautious Driver).
Then, once either vehicle completes the crossing of the intersection, Vehicle A returns to the default intention.
Thus, in the presence of noise, this intersection-crossing scenario is an instance of a hidden mode switched linear stochastic system with an unknown input. The intention of driver A is a hidden mode and the actual input of Vehicle A is an unknown input (which is not restricted to a finite set). The objective is to simultaneously estimate the intention (mode), input and state of the vehicles for safe navigation through the intersection.
3 Problem Statement
We consider a hidden mode switched linear stochastic system with unknown inputs (see Figure 2):
(1) 
where is the continuous system state and the hidden discrete state or mode. The mode jump process is assumed to be left-continuous, and hidden mode systems refer to systems in which is not directly measured and the mode transitions are autonomous. For each mode , is the known input, the unknown input, the output, the mode transition function, and are flow and jump sets, while the process noise and the measurement noise are assumed to be mutually uncorrelated, zero-mean, Gaussian white random signals with known covariance matrices, and , respectively. The matrices , , , , and are known, and is independent of and for all . In addition to the common assumptions above, we assume the following:
 A1)

No prior ‘useful’ knowledge of the dynamics of is known (uncorrelated with , , and , ) and can be a signal of any type.
 A2)

In each mode, the system is strongly detectable, that is, the initial condition and the unknown input sequence can be asymptotically determined from the output sequence (see [Yong.Zhu.ea.Automatica15, Section 3.2] for necessary and sufficient conditions for this property).
The objective of this paper is to design a recursive filter algorithm which simultaneously estimates the system state , the unknown input and the hidden mode based on the measurements up to time , , as well as to analyze the asymptotic behavior of the proposed algorithm.
4 Preliminary Material
In this section, we present a brief summary of the minimum-variance unbiased filter for linear systems with unknown inputs. For a detailed proof and derivation of the filter, the reader is referred to [Yong.thesis2015, Yong.Zhu.ea.CDC15_General, Yong.Zhu.ea.Automatica15]. Moreover, we define a generalized innovation and show that it is a Gaussian white noise. These form an essential part of the multiple-model estimation algorithm that we will describe in Section 5. The algorithm runs a bank of filters (one for each mode) in parallel and the filters are in essence the same except for the different sets of matrices and signals . Hence, to simplify notation, the conditioning on the mode is omitted throughout Section 4.
4.1 Optimal Input and State Filter
As is shown in [Yong.Zhu.ea.Automatica15, Section 3.1], the system for each mode after a similarity transformation is given by:
(2)  
(3)  
(4) 
The transformation essentially decomposes the unknown input and the measurement , each into two components, i.e., and ; as well as and , where . For conciseness, we assume that the system states can be estimated without delay, that is, when has full column rank. By allowing potential delays in state estimation, this assumption can be relaxed such that input and state estimation is possible as long as the system is strongly detectable [Yong.Zhu.ea.CDC15_General]; for brevity, we refer the reader to the filter algorithms and analysis therein. Then, given measurements up to time , the optimal three-step recursive filter in the minimum-variance unbiased sense can be summarized as follows:
Unknown Input Estimation:
(5) 
Time Update:
(6) 
Measurement Update:
(7) 
where , , and denote the optimal estimates of , , and ; is a design matrix that is chosen to project the residual signal onto a vector of independent random variables, while , and , as well as , are filter gain matrices that minimize the state and input error covariances. For the sake of completeness, the optimal input and state filter in [Yong.thesis2015, Yong.Zhu.ea.Automatica15] is reproduced in Algorithm 1.
4.2 Properties of the Generalized Innovation Sequence
In Kalman filtering, the innovation reflects the difference between the measured output at time and the optimal output forecast based on information available prior to time . The a posteriori (updated) state estimate is then a linear combination of the a priori (predicted) estimate and the weighted innovation. In the same spirit, we generalize this notion of innovation to linear systems with unknown inputs by defining a generalized innovation given by:
(8)  
which, similar to the conventional innovation, is weighted by and combined with the predicted state estimate to obtain the updated state estimate as seen in the measurement update (7). This definition differs from the conventional innovation in that the generalized innovation uses a subset of the measured outputs, i.e., . In addition, the matrix is any matrix whose rows are independent of each other and are in the range space of that removes dependent components of (a consequence of [Yong.thesis2015, Lemma 7.6.3] and [Yong.Zhu.ea.Automatica15, Lemma 10]), which further lowers the dimension of the generalized innovation. An intuition for this is that the information contained in the ‘unused’ subset is already exhausted for estimating the unknown inputs. Moreover, the optimal output forecast that is implied in (8) is a function of which contains information from the measurement at time . Nonetheless, it is clear from (8) that when there are no unknown inputs, , , , and can be chosen to be the identity matrix, in which case the definitions of generalized innovation and (conventional) innovation coincide.
In the following theorem, we establish that the generalized innovation, like the conventional innovation, is a Gaussian white noise (see proof in Section 6).
Theorem 1
The generalized innovation given in (8) is a Gaussian white noise with zero mean and a variance of , with .
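The whiteness property can be checked numerically in the special case without unknown inputs, where the generalized innovation reduces to the conventional Kalman filter innovation. The following is a minimal sketch under that assumption, for a scalar system; the function names and the parameter values (`a`, `c`, `q`, `r`) are hypothetical and chosen only for illustration. It simulates a scalar Kalman filter, normalizes each innovation by its predicted standard deviation, and estimates the lag autocorrelation, which should be close to zero for a white sequence.

```python
import math
import random


def simulate_normalized_innovations(n_steps, a=0.9, c=1.0, q=0.5, r=1.0, seed=0):
    """Run a scalar Kalman filter on simulated data (no unknown inputs).

    Returns the innovations divided by their predicted standard deviation.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)   # true state, drawn from the prior N(0, 1)
    x_hat, p = 0.0, 1.0       # filter estimate and error variance (matches prior)
    normalized = []
    for _ in range(n_steps):
        # True system: state propagation and noisy measurement.
        x = a * x + rng.gauss(0.0, math.sqrt(q))
        y = c * x + rng.gauss(0.0, math.sqrt(r))
        # Time update (prediction).
        x_pred = a * x_hat
        p_pred = a * a * p + q
        # Innovation and its variance.
        nu = y - c * x_pred
        s = c * c * p_pred + r
        normalized.append(nu / math.sqrt(s))
        # Measurement update.
        gain = p_pred * c / s
        x_hat = x_pred + gain * nu
        p = (1.0 - gain * c) * p_pred
    return normalized


def lag_autocorrelation(seq, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(seq)
    mean = sum(seq) / n
    var = sum((v - mean) ** 2 for v in seq) / n
    cov = sum((seq[i] - mean) * (seq[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


if __name__ == "__main__":
    seq = simulate_normalized_innovations(5000)
    print("lag-1 autocorrelation:", lag_autocorrelation(seq, 1))
```

A lag-1 autocorrelation that is not small relative to one over the square root of the sample size would suggest that the filter model does not match the data, which is the intuition the mode association step builds on.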
4.3 Likelihood Function
To facilitate the computation of the model probabilities that are required in the multiple-model estimation algorithm we propose, we derive the likelihood function for each mode at time , , as follows (proven in Section 6).
Theorem 2
The likelihood that model is consistent with measurement and generalized innovation , given all measurements prior to time , , is given by the likelihood function:
(9) 
where , and is given in Theorem 1; and represent the Moore-Penrose pseudoinverse and pseudo-determinant, respectively.
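To illustrate the role of the pseudoinverse and pseudo-determinant, the sketch below evaluates a zero-mean Gaussian density whose covariance may be singular. It assumes that the likelihood takes the standard form of a degenerate Gaussian density and, to stay self-contained, restricts the covariance to be diagonal; both are simplifying assumptions and the function name is hypothetical. For a diagonal matrix, the pseudoinverse inverts only the nonzero entries and the pseudo-determinant is the product of the nonzero entries.

```python
import math


def degenerate_gaussian_likelihood(nu, variances, tol=1e-12):
    """Zero-mean Gaussian density with a diagonal, possibly singular, covariance.

    nu        : residual vector (list of floats)
    variances : diagonal entries of the covariance (list of floats)
    Entries with variance <= tol are treated as zero: the pseudoinverse maps
    them to zero, so they contribute neither to the quadratic form nor to the
    pseudo-determinant (illustrative assumption: diagonal covariance).
    """
    if len(nu) != len(variances):
        raise ValueError("nu and variances must have the same length")
    quad = 0.0      # nu^T P^+ nu
    log_pdet = 0.0  # log of the pseudo-determinant of P
    rank = 0        # number of nonzero variances
    for v, s in zip(nu, variances):
        if s > tol:
            quad += v * v / s
            log_pdet += math.log(s)
            rank += 1
    log_like = -0.5 * (quad + log_pdet + rank * math.log(2.0 * math.pi))
    return math.exp(log_like)


if __name__ == "__main__":
    # The second component has zero variance and is therefore ignored.
    print(degenerate_gaussian_likelihood([0.3, 0.0], [1.0, 0.0]))
```

Working with the logarithm of the likelihood until the last step is a common design choice, since products of many small variances can underflow.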
5 Multiple-Model Estimation Algorithms
The multiple-model (MM) approach we take is inspired by the multiple-model filtering algorithms for hidden mode hybrid systems with known inputs (e.g., [BarShalom.2004, Mazor.1998] and references therein), which have been widely applied for target tracking. Our multiple-model framework consists of the parallel implementation of a bank of input and state filters described in Section 4.1, with each model corresponding to a system mode (see Figure 3). The objective of the MM approach is then to decide which model/mode is the best representation of the current system mode as well as to estimate the state and unknown input of the system based on this decision.
To do this, we first use Bayes’ rule to recursively find the posterior mode probability at step for each mode , given measurements and prior mode probabilities , as
(10) 
where we assumed that the probability of is independent of the measurement . The rationale is that since we have no knowledge about and the signal can be of any type, the measurement provides no ‘useful’ information about the likelihood of the system mode (cf. Section 4.1). The likelihood function is similarly defined as given by (9). Moreover, the Bayesian approach provides a means to encode what we know about the prior mode probabilities at time :
(11) 
where is the prior information at time and . The maximum a posteriori (MAP) mode estimate is then the most probable mode at each time that maximizes (10).
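The Bayesian update and the MAP decision described above can be sketched as follows. This is a minimal illustration, assuming that the per-mode likelihoods have already been computed by the bank of filters; the function names and numerical values are hypothetical.

```python
def update_mode_probabilities(priors, likelihoods):
    """Bayes' rule: posterior_j is proportional to likelihood_j * prior_j."""
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    weighted = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(weighted)
    if total <= 0.0:
        raise ValueError("all modes have zero posterior weight")
    return [w / total for w in weighted]


def map_mode(posteriors):
    """Index of the most probable mode (maximum a posteriori estimate)."""
    return max(range(len(posteriors)), key=lambda j: posteriors[j])


if __name__ == "__main__":
    priors = [0.5, 0.3, 0.2]       # hypothetical prior mode probabilities
    likelihoods = [0.1, 0.4, 0.2]  # hypothetical per-mode likelihood values
    posteriors = update_mode_probabilities(priors, likelihoods)
    print(posteriors, map_mode(posteriors))
```

In this example, the second mode has a lower prior than the first but a higher likelihood, and the update makes it the MAP estimate.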
5.1 Dynamic Multiple-Model Estimation
Our multiple-model estimation algorithm (cf. Figure 4 and Algorithm 2) assumes that the hidden mode is stochastic, i.e., the true mode switches in a Markovian manner with known, time-invariant and possibly state-dependent transition probabilities
For brevity and without loss of generality, we assume that the mode transition probabilities are state-independent, i.e., . In other words, the mode transition process is a homogeneous Markov chain. The incorporation of state dependency for stochastic guard conditions is rather straightforward, albeit lengthy; interested readers are referred to [Seah.2009] for details and examples. We also assume that we have a fixed number of models. For better performance, the algorithm can be modified to allow for a varying number of models (cf. [Li.1996] for a discussion on model selection and implementation details).
In fact, the mode transition probabilities can serve as estimator design parameters (cf. [BarShalom.2004]), but care should be taken when choosing them, as we shall see in Section 5.1.1 that a wrong choice can be detrimental to the consistency of the mode estimates. In addition, in the Markovian setting, the mode can change at each time step. As a result, the number of hypotheses (mode histories) grows exponentially with time. Therefore, an optimal multiple-model filter is computationally intractable. We thus resort to suboptimal filters that manage the hypotheses in an efficient way. The simplest technique is hypothesis pruning, in which a finite number of the most likely hypotheses are kept, whereas the hypothesis merging approach keeps only the last few steps of the mode histories and combines hypotheses that differ in earlier steps (cf. [BarShalom.2004] for approaches designed for switched linear systems without unknown inputs). In the following, we propose a hypothesis merging approach similar to the interacting multiple-model (IMM) algorithm [Blom.1988], which is considered the best compromise between complexity and performance [BarShalom.2004].
Instead of maintaining the exponential number of hypotheses (i.e., ), our estimator maintains a linear number of estimates and filters (i.e., ) at each time , by introducing three major components:
 Initial condition mixing:

We compute the probability that the system was in mode at time conditioned on and currently being in mode :
(12)
The initial conditions for the filter matched to for all are then mixed according to:
(13)
(14)
(15)
Note that there is no mixing of and its covariances because they are computed for a previous step and are not initial conditions for the bank of filters.
 Mode-matched filtering:
 Posterior mode probability computation:

Given measurements up to time , the posterior probability of mode can be found by substituting from the denominator of (12) into (10):
(16)
Then, these mode probabilities are used to determine the most probable (MAP) mode at each time and the associated state and input estimates and covariances:
(17)
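The initial condition mixing step can be sketched with the standard IMM mixing rule for a scalar state. This is an illustration under stated assumptions rather than a reproduction of (12)-(15): the function names are hypothetical, the state is scalar, and only the state estimate and its variance are mixed. The convention `transition[i][j]` denotes the probability of switching from mode `i` to mode `j`.

```python
def mixing_probabilities(mode_probs, transition):
    """Standard IMM mixing probabilities.

    mode_probs[i]    : probability of mode i at the previous step
    transition[i][j] : probability of moving from mode i to mode j
    Returns (mix, predicted), where mix[i][j] is the probability that the
    previous mode was i given that the current mode is j, and predicted[j]
    is the normalizing constant (the predicted probability of mode j).
    """
    n = len(mode_probs)
    predicted = [sum(transition[i][j] * mode_probs[i] for i in range(n))
                 for j in range(n)]
    mix = [[transition[i][j] * mode_probs[i] / predicted[j]
            if predicted[j] > 0.0 else 0.0
            for j in range(n)]
           for i in range(n)]
    return mix, predicted


def mix_scalar_estimates(means, variances, mix):
    """Mixed initial mean and variance for each mode-matched filter."""
    n = len(means)
    mixed_means, mixed_vars = [], []
    for j in range(n):
        m = sum(mix[i][j] * means[i] for i in range(n))
        # The spread-of-means term accounts for disagreement between filters.
        v = sum(mix[i][j] * (variances[i] + (means[i] - m) ** 2)
                for i in range(n))
        mixed_means.append(m)
        mixed_vars.append(v)
    return mixed_means, mixed_vars


if __name__ == "__main__":
    probs = [0.5, 0.5]                    # hypothetical mode probabilities
    trans = [[0.9, 0.1], [0.2, 0.8]]      # hypothetical transition matrix
    mix, predicted = mixing_probabilities(probs, trans)
    print(mix_scalar_estimates([0.0, 2.0], [1.0, 1.0], mix), predicted)
```

The normalizing constant returned as `predicted` plays the role of the prior mode probability in the posterior update, which is why the two steps are naturally computed together.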
5.1.1 Filter Properties
We now investigate the asymptotic behavior of our filter, i.e., its mode distinguishability properties:
Definition 3 (Mean Convergence)
A filter is mean convergent to a model , if the geometric mean of the mode probability for model asymptotically converges to 1 for all initial mode probabilities.
Definition 4 (Mean Consistency)
A filter is mean consistent, if the geometric mean of the mode probability for the true model asymptotically converges to 1 for all initial mode probabilities.
In the following, we show that under some reasonable conditions, our filter is mean convergent to the model which is closest according to an information-theoretic measure (i.e., with the minimum Kullback-Leibler (KL) divergence [Kullback.1951]), and when the true model is in the set of models, the filter is mean consistent. We will also discuss the optimality of the resulting input and state estimates. The proofs of these results will be provided in Section 6.
Convergence/Consistency of Mode Estimates. We first derive the KL divergence of each model from the true model. Then, we analyze the mean behavior (averaged over all possible states) of the mode estimates.
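As a simple illustration of what ‘closest’ means in the KL sense, the sketch below computes the KL divergence between two scalar zero-mean Gaussian densities and selects the candidate model with the smallest divergence from the true one. This scalar, zero-mean setting is an assumption made for illustration only and is not the general expression derived in the paper; the function names and variances are hypothetical.

```python
import math


def kl_zero_mean_gaussians(var_true, var_model):
    """KL divergence of N(0, var_model) from N(0, var_true), scalar case."""
    if var_true <= 0.0 or var_model <= 0.0:
        raise ValueError("variances must be positive")
    ratio = var_true / var_model
    return 0.5 * (ratio - 1.0 - math.log(ratio))


def closest_model(var_true, model_vars):
    """Index of the model with minimum KL divergence, and all divergences."""
    divergences = [kl_zero_mean_gaussians(var_true, v) for v in model_vars]
    best = min(range(len(divergences)), key=lambda j: divergences[j])
    return best, divergences


if __name__ == "__main__":
    # Hypothetical innovation variances for three candidate models.
    print(closest_model(1.0, [4.0, 1.5, 0.2]))
```

The divergence is zero only when the model variance equals the true variance, which corresponds to the consistency case where the true model is in the model set.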
Lemma 5
The KL divergence of model from the true model is
(18) 
where is a shorthand for