# Towards Finding the Critical Value for Kalman Filtering with Intermittent Observations

Yilin Mo and Bruno Sinopoli

This research was supported in part by CyLab at Carnegie Mellon under grant DAAD19-02-1-0389 from the Army Research Office. The views and conclusions contained here are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either express or implied, of ARO, CMU, or the U.S. Government or any of its agencies. Yilin Mo and Bruno Sinopoli are with the ECE Department of Carnegie Mellon University, Pittsburgh, PA. Emails: ymo@andrew.cmu.edu, brunos@ece.cmu.edu
###### Abstract

In [1], Sinopoli et al. analyze the problem of optimal estimation for linear Gaussian systems where packets containing observations are dropped according to an i.i.d. Bernoulli process, modeling a memoryless erasure channel. In this case the authors show that the Kalman filter is still the optimal estimator, although the boundedness of the error depends directly upon the channel arrival probability, $p$. In particular they also prove the existence of a critical value, $p_c$, for such probability, below which the Kalman filter will diverge. The authors are not able to compute the actual value of this critical probability for general linear systems, but provide upper and lower bounds. They are able to show that for special cases, i.e. $C$ invertible, such critical value coincides with the lower bound. This paper computes the value of the critical arrival probability under minimally restrictive conditions on the matrices $A$ and $C$.

## I Introduction

A large wealth of applications demands wireless communication among small embedded devices. Wireless Sensor Network (WSN) technology provides the architectural paradigm to implement systems with a high degree of temporal and spatial granularity. Applications of sensor networks are becoming ubiquitous, ranging from environmental monitoring and control to building automation, surveillance and many others [2]. Given their low-power nature and the requirement of long-lasting deployment, communication between devices is power constrained and therefore limited in range and reliability. Changes in the environment, such as the simple relocation of a large metal object in a room or the presence of people, will inevitably affect the propagation properties of the wireless medium. Channels will be time-varying and unreliable. Spurred by this consideration, our effort concentrates on the design and analysis of estimation and control algorithms over unreliable networks. A substantial body of literature has been devoted to such issues in the past few years. In this paper we want to revisit the paper of Sinopoli et al. [1]. In that paper, the authors analyze the problem of optimal state estimation for discrete-time linear Gaussian systems, under the assumption that observations are sent to the estimator via a memoryless erasure channel. This implies the existence of a non-unitary arrival probability associated with each packet. Consequently some observations will inevitably be lost. In this case, although the Kalman filter is still the optimal estimator, the boundedness of its error depends on the arrival probabilities of the observation packets. In particular the authors prove the existence of a critical arrival probability $p_c$, below which the expectation of the estimation error covariance matrix of the Kalman filter diverges. The authors are not able to compute the actual value of this critical probability for general linear systems, but provide upper and lower bounds.
They are able to show that for special cases such critical value coincides with the lower bound.

A significant amount of research effort has been devoted to finding the critical value. In [1], the authors prove that the critical value coincides with the lower bound in the special case when the observation matrix $C$ is invertible. The condition is further weakened by Plarre and Bullo [3] to $C$ being invertible only on the observable subspace. In [4], the authors prove that if the eigenvalues of the system matrix $A$ have distinct absolute values, then the lower bound is indeed the critical value. The authors also provide a counterexample to show that in general the lower bound is not tight.

Other variations of the original problem have also been considered. In [5], the authors introduce smart sensors, which send the local Kalman estimate instead of the raw observation. In [6], a similar scenario is discussed, where the sensor sends a linear combination of the current and previous measurements. A Markovian packet-dropping model is introduced in [7] and a stability criterion is given. In [8], the authors study the case where the observation at each time splits into two parts, which are sent to the Kalman filter through two independent erasure channels. A much more general model, which considers packet drop, delay and quantization of measurements at the same time, is introduced by Xie and Shi [9].

Another interesting direction for characterizing the impact of a lossy network on state estimation is to directly calculate the probability distribution of the estimation error covariance matrix instead of considering the boundedness of its expectation. In [10], the author gives a closed-form expression for the cumulative distribution function of the error covariance when the system satisfies non-overlapping conditions. In [11], the authors provide a numerical method to calculate the eigen-distribution of the error covariance under the assumption that the observation matrix is random and time-varying.

In the meantime, much research effort has been devoted to designing estimation and control schemes over lossy networks by leveraging the results obtained in the above works. In [12], the authors consider a stochastic sensor scheduling scheme, which randomly selects one sensor to transmit its observation at each time. In [13], the authors show how to design the packet arrival rate to balance the state estimation error and the energy cost of packet transmission.

In a nutshell, we feel that the derivation of the critical value is not only important for analyzing the performance of the system over lossy networks, but also critical for network control protocol design. However, in a large proportion of the above work, the critical value is derived under the condition that the matrix $C$ is invertible, or other similar conditions, which are not easy to satisfy in certain real applications ($C$ invertible implies that the number of sensors is no less than the number of states). In this paper, we would like to characterize the critical value under more general conditions, showing that it meets the lower bound in most cases. We also study some systems for which the lower bound is not tight and try to give some insights on why this is the case.

The paper is organized in the following manner: Section II formulates the problem. Section III states all the important results of the paper, which are proved in Sections IV, V and VI. Finally, Section VII concludes the paper.

## II Problem Formulation

Consider the following linear system

$$x_{k+1} = Ax_k + w_k,\qquad y_k = Cx_k + v_k, \tag{1}$$

where $x_k \in \mathbb{R}^n$ is the state vector, $y_k \in \mathbb{R}^m$ is the output vector, and $w_k$, $v_k$ are Gaussian random vectors with zero mean and covariance matrices $Q$ and $R$, respectively. Assume that the initial state $x_0$ is also a Gaussian vector of mean $\bar{x}_0$ and covariance matrix $\Sigma_0$. Let $w_k$, $v_k$, $x_0$ be mutually independent. Note that we assume the covariance matrices $Q$, $R$, $\Sigma_0$ to be strictly positive definite. Define $\lambda_1, \ldots, \lambda_n$ as the eigenvalues of $A$, ordered such that $|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|$.

Consider the case where observations are sent to the estimator via a memoryless erasure channel, where their arrival is modeled by an independent Bernoulli process $\{\gamma_k\}$. According to this model, the measurement sent at time $k$ reaches its destination if $\gamma_k = 1$; it is lost otherwise. Let $\{\gamma_k\}$ be independent of $w_k$, $v_k$, $x_0$, i.e. the communication channel is independent of both process and measurement noises, and let $p = \Pr(\gamma_k = 1)$.

The Kalman Filter equations for this system were derived in [1] and take the following form:

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + \gamma_kK_k\left(y_k - C\hat{x}_{k|k-1}\right),\qquad P_{k|k} = P_{k|k-1} - \gamma_kK_kCP_{k|k-1},$$

where

$$\hat{x}_{k+1|k} = A\hat{x}_{k|k},\qquad P_{k+1|k} = AP_{k|k}A^T + Q,\qquad K_k = P_{k|k-1}C^T\left(CP_{k|k-1}C^T + R\right)^{-1},\qquad \hat{x}_{0|-1} = \bar{x}_0,\qquad P_{0|-1} = \Sigma_0.$$

To improve the legibility of the paper, we will slightly abuse the notation by writing $P_k$ for $P_{k|k-1}$. The equation for the error covariance of the one-step predictor is the following:

$$P_{k+1} = AP_kA^T + Q - \gamma_kAP_kC^T\left(CP_kC^T + R\right)^{-1}CP_kA^T. \tag{2}$$
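Recursion (2) is straightforward to simulate. The sketch below iterates it with i.i.d. Bernoulli arrivals on a hypothetical two-state example (the matrices $A$, $C$ and the arrival probability are our own choices, not taken from [1]):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example (our choice): one unstable mode lambda_1 = 1.2.
A = np.array([[1.2, 0.0],
              [0.0, 0.5]])
C = np.array([[1.0, 1.0]])
Q = np.eye(2)
R = np.eye(1)

def riccati_step(P, gamma):
    """One step of recursion (2): the correction term is applied
    only when the packet arrives (gamma = 1)."""
    P_next = A @ P @ A.T + Q
    if gamma:
        S = C @ P @ C.T + R
        P_next -= A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)
    return P_next

P = np.eye(2)
for _ in range(500):
    P = riccati_step(P, rng.random() < 0.9)  # arrival probability p = 0.9
print(np.trace(P))
```

With $p = 0.9$, well above the lower bound $1 - |\lambda_1|^{-2} \approx 0.31$ for this example, the trace remains modest; driving $p$ toward zero lets $P_k$ grow without bound.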

If the $\gamma_k$s are i.i.d. Bernoulli random variables, the following theorem holds [1]:

###### Theorem 1

If $(A, Q^{1/2})$ is controllable, $(A, C)$ is detectable, and $A$ is unstable, then there exists a $p_c \in [0, 1)$ such that (we use the notation $\sup_k \mathbb{E}P_k = +\infty$ when the sequence is not bounded, i.e. there is no matrix $M \geq 0$ such that $\mathbb{E}P_k \leq M$ for all $k$; note also that all comparisons between matrices in this paper are in the sense of positive definiteness unless otherwise noted)

$$\sup_k \mathbb{E}P_k = +\infty\quad \text{for } 0 \leq p \leq p_c \text{ and some } P_0 \geq 0, \tag{3}$$

$$\mathbb{E}P_k \leq M_{P_0}\ \forall k\quad \text{for } p_c < p \leq 1 \text{ and any } P_0 \geq 0, \tag{4}$$

where $M_{P_0} > 0$ depends on the initial condition $P_0 \geq 0$.

For simplicity, we will say that $P_k$ is unbounded if $\sup_k \mathbb{E}P_k = +\infty$, or that $P_k$ is bounded if there exists a uniform bound on $\mathbb{E}P_k$ independent of $k$.

## III Main Result

In this section, we state all the important results on the critical value; the proofs can be found in the later sections. Throughout the rest of the paper, we always assume that the following conditions hold:

1. $(A, C)$ is detectable.

2. $A$ can be diagonalized.

3. $Q$, $R$, $\Sigma_0$ are strictly positive definite.

From Section II, it is clear that the critical value of a system should in general be a function of all the system parameters, i.e. $p_c = f(A, C, Q, R, \Sigma_0)$. However, the following theorem, the proof of which is in Section IV, states that the critical value does not depend on $Q$, $R$, $\Sigma_0$ as long as they are all positive definite.

###### Theorem 2

If $Q$, $R$, $\Sigma_0$ are strictly positive definite, then the critical value of a system is just a function of $A$ and $C$, and is independent of $Q$, $R$, $\Sigma_0$.

Since we have already assumed that $Q$, $R$, $\Sigma_0$ are strictly positive definite, by Theorem 2 we can let $Q = \Sigma_0 = I_n$ and $R = I_m$ without loss of generality. Also, since we assume that $A$ can be diagonalized, we can always transform the system into its diagonal standard form. Hence, we assume that $A$ is diagonal. We can also denote $f_c(A, C)$ as the critical value of the system $(A, C)$.

When the dimension of $A$ is large, which is often the case in practice, it is desirable to break the large system into several smaller blocks (or subsystems), which are easier to analyse. As a result, we define a block of the system in the following way:

###### Definition 1

Consider the system in its diagonal standard form, which means $A = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$ and $C = [C_1, \ldots, C_n]$, where $C_i$ denotes the $i$th column of $C$. A block of the system is defined as the subsystem $(A_I, C_I)$, where $I = \{i_1, \ldots, i_l\} \subseteq \{1, \ldots, n\}$ is an index set, $A_I = \operatorname{diag}(\lambda_{i_1}, \ldots, \lambda_{i_l})$ and $C_I = [C_{i_1}, \ldots, C_{i_l}]$.

A special type of block, which we call an equi-block, plays a central role in determining the critical value of the system. It is defined as follows:

###### Definition 2

An equi-block is a block which satisfies $|\lambda_{i_1}| = |\lambda_{i_2}| = \cdots = |\lambda_{i_l}|$, and we denote it as $(A_I, C_I)$, where $I$ is the index set.

###### Definition 3

is defined as the dimension of the largest equi-block of the system.

The following theorem shows a basic inequality between the critical values of the original system and of its smaller blocks, which we will prove in Section IV.

###### Theorem 3

Define $f_c(A_I, C_I)$ as the critical value for the subsystem $(A_I, C_I)$. If $A$ is diagonal, then

$$f_c(A, C) \geq f_c(A_I, C_I), \tag{5}$$

for all possible index sets $I$.

Before we continue, we need to define the following terms:

###### Definition 4

A system $(A, C)$ is one step observable if $C$ is of full column rank.

###### Definition 5

An equi-block is degenerate if it is not one step observable. It is non-degenerate otherwise.

###### Definition 6

The system is non-degenerate if every equi-block of the system is non-degenerate. It is degenerate if there exists at least one degenerate equi-block.

For example, if $A = \operatorname{diag}(\lambda, -\lambda)$ and $C = [1\ \ 1]$, then the system is degenerate, since it is itself an equi-block and is not one step observable. For $A = \operatorname{diag}(\lambda_1, \lambda_2)$ with $|\lambda_1| \neq |\lambda_2|$ and $C = [1\ \ 1]$, the two equi-blocks are $(\lambda_1, 1)$ and $(\lambda_2, 1)$, and both of them are one step observable. Thus, the system is non-degenerate.

It can be seen that non-degeneracy is a stronger property than observability but much weaker than one step observability. In fact, for a one step observable system, the matrix $C$ must have at least $n$ rows, which implies that $y_k$ is at least an $n$-dimensional vector. On the other hand, for a non-degenerate system, the matrix $C$ needs only as many rows as the dimension of the largest equi-block, which in reality is usually a small number compared to $n$.
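For a system already in diagonal standard form, non-degeneracy is easy to test numerically: group the state indices by eigenvalue magnitude (Definitions 1 and 2) and check that each group's columns of $C$ have full column rank (Definitions 4–6). A minimal sketch, with function names and the magnitude tolerance chosen by us:

```python
import numpy as np

def equi_blocks(eigs, tol=1e-9):
    """Group state indices whose eigenvalues share the same magnitude
    (Definition 2), assuming A = diag(eigs) is in diagonal standard form."""
    order = sorted(range(len(eigs)), key=lambda i: abs(eigs[i]))
    blocks = []
    for i in order:
        if blocks and abs(abs(eigs[i]) - abs(eigs[blocks[-1][-1]])) < tol:
            blocks[-1].append(i)
        else:
            blocks.append([i])
    return blocks

def is_nondegenerate(eigs, C, tol=1e-9):
    """Definition 6: every equi-block (A_I, C_I) must be one step
    observable, i.e. C_I must have full column rank (Definition 4)."""
    for I in equi_blocks(eigs, tol):
        if np.linalg.matrix_rank(C[:, I], tol) < len(I):
            return False
    return True

# Degenerate example from the text, with lambda = 2 (our choice):
print(is_nondegenerate([2, -2], np.array([[1.0, 1.0]])))  # False
# Distinct magnitudes: two scalar equi-blocks, each one step observable.
print(is_nondegenerate([2, 3], np.array([[1.0, 1.0]])))   # True
```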

In [1], the authors proved that the critical value meets the lower bound when the system is one step observable. In this paper, we weaken the condition from one step observability to non-degeneracy.

###### Theorem 4

If the system (1) satisfies assumptions 1–3 and the equi-blocks of $A$ associated to the unstable and critically stable eigenvalues are non-degenerate, the critical value of the Kalman filter is

$$p_c = \max\left(1 - |\lambda_1|^{-2},\ 0\right), \tag{6}$$

where $\lambda_1$ is the dominant eigenvalue of $A$, i.e. the one with the largest absolute value.
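For non-degenerate systems, (6) is a one-line computation from the spectrum of $A$; the helper below (our own naming) illustrates it:

```python
def critical_value(eigs):
    """p_c = max(1 - |lambda_1|^{-2}, 0) from (6), where lambda_1 is the
    eigenvalue of A with the largest magnitude."""
    lam1 = max(abs(l) for l in eigs)
    return max(1.0 - lam1 ** (-2), 0.0)

print(critical_value([1.2, 0.5]))   # about 0.306
print(critical_value([0.9, -0.3]))  # 0.0: stable A, bounded for any p
```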

For degenerate systems we can show that in general the critical value is larger than the one computed in Theorem 4. Nonetheless, in this paper we will compute the critical value for second-order degenerate systems. This includes a very practical case, involving complex conjugate eigenvalues. Let $\lambda_1 = |\lambda_1|e^{j\varphi_1}$ and $\lambda_2 = |\lambda_2|e^{j\varphi_2}$. We can use the following theorem in conjunction with Theorem 3 as the building block to allow analysis of larger systems.

###### Theorem 5

For a detectable system with $n = 2$ and $A = \operatorname{diag}(\lambda_1, \lambda_2)$, the critical value is

$$p_c = f_c(A, C) = \max\left(1 - |\lambda_1|^{-2},\ 0\right), \tag{7}$$

if the system is non-degenerate, or in other words, if one of the following conditions holds:

1. $|\lambda_1| \neq |\lambda_2|$,

2. $\operatorname{rank}(C) = 2$.

Otherwise the system is degenerate and its critical value is

$$p_c = f_c(A, C) = \max\left(1 - |\lambda_1|^{-\frac{2}{1 - D_M(\varphi/2\pi)}},\ 0\right), \tag{8}$$

where $\varphi = \varphi_1 - \varphi_2$ is the phase difference between the two eigenvalues, and $D_M$ is the modified Dirichlet function defined as

$$D_M(x) = \begin{cases} 0 & \text{for } x \text{ irrational,} \\ 1/q & \text{for } x = r/q,\ r, q \in \mathbb{Z},\ r/q \text{ irreducible.} \end{cases} \tag{9}$$
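A faithful implementation of $D_M$ is impossible on floating-point inputs, since every float is rational; the sketch below is therefore only a numerical stand-in for (9): rationals must be passed exactly as Python `fractions.Fraction` objects (so that "$r/q$ irreducible" is exact), and every other input is treated as irrational:

```python
from fractions import Fraction

def dirichlet_modified(x):
    """Numerical stand-in for D_M in (9): Fraction inputs are rationals
    in lowest terms (Fraction reduces automatically), integers count as
    r/1, and anything else is treated as irrational."""
    if isinstance(x, Fraction):
        return Fraction(1, x.denominator)
    if isinstance(x, int):
        return Fraction(1, 1)
    return 0

print(dirichlet_modified(Fraction(1, 4)))  # 1/4
print(dirichlet_modified(2 ** 0.5))        # 0 (treated as irrational)
```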

## IV Properties of the Critical Value

In this section, we will prove Theorems 2 and 3, which demonstrate the relationship between the critical value and the system parameters. Throughout this section, we always assume that assumptions 1–3 hold.

First we want to prove that the critical value is independent of the covariance matrices of the noises.

###### Proof:

Since $Q$, $R$, $\Sigma_0$ are strictly positive definite, we can find uniform upper and lower bounds $0 < \underline{\alpha} \leq \bar{\alpha}$ such that

$$\underline{\alpha} I_m \leq R \leq \bar{\alpha} I_m,\qquad \underline{\alpha} I_n \leq \Sigma_0 \leq \bar{\alpha} I_n,\qquad \underline{\alpha} I_n \leq Q \leq \bar{\alpha} I_n.$$

Let us define $\underline{P}_0 = \underline{\alpha} I_n$, $\bar{P}_0 = \bar{\alpha} I_n$, $P^*_0 = I_n$, and

$$\begin{aligned} \underline{P}_{k+1} &= A\underline{P}_kA^T + \underline{\alpha} I_n - \gamma_k A\underline{P}_kC^T\left(C\underline{P}_kC^T + \underline{\alpha} I_m\right)^{-1}C\underline{P}_kA^T, \\ \bar{P}_{k+1} &= A\bar{P}_kA^T + \bar{\alpha} I_n - \gamma_k A\bar{P}_kC^T\left(C\bar{P}_kC^T + \bar{\alpha} I_m\right)^{-1}C\bar{P}_kA^T, \\ P^*_{k+1} &= AP^*_kA^T + I_n - \gamma_k AP^*_kC^T\left(CP^*_kC^T + I_m\right)^{-1}CP^*_kA^T. \end{aligned}$$

By induction, it is easy to check that $\underline{P}_k = \underline{\alpha}P^*_k$ and $\bar{P}_k = \bar{\alpha}P^*_k$ for all $k$.

Also we know that $\underline{P}_0 \leq P_0$. By induction, suppose that $\underline{P}_k \leq P_k$; then

$$\begin{aligned} \underline{P}_{k+1} &= A\underline{P}_kA^T + \underline{\alpha} I_n - \gamma_k A\underline{P}_kC^T\left(C\underline{P}_kC^T + \underline{\alpha} I_m\right)^{-1}C\underline{P}_kA^T \\ &\leq AP_kA^T + \underline{\alpha} I_n - \gamma_k AP_kC^T\left(CP_kC^T + \underline{\alpha} I_m\right)^{-1}CP_kA^T \\ &\leq AP_kA^T + Q - \gamma_k AP_kC^T\left(CP_kC^T + R\right)^{-1}CP_kA^T = P_{k+1}. \end{aligned}$$

Hence, $\underline{P}_k \leq P_k$ for all $k$. By the same argument, $P_k \leq \bar{P}_k$ for all $k$, which implies that

$$\underline{P}_k = \underline{\alpha}P^*_k \leq P_k \leq \bar{P}_k = \bar{\alpha}P^*_k. \tag{10}$$

Since $0 < \underline{\alpha} \leq \bar{\alpha} < \infty$, the boundedness of $\mathbb{E}P_k$ is equivalent to the boundedness of $\mathbb{E}P^*_k$. However, by the definition of $P^*_k$, we know that it is only a function of $A$, $C$ and the arrival process, and hence independent of $Q$, $R$, $\Sigma_0$. \qed
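The homogeneity property used in (10), namely that scaling $Q$, $R$, $\Sigma_0$ by a common factor $\alpha$ scales every $P_k$ by $\alpha$, can be checked numerically. In the sketch below (system matrices and arrival sequence are our own choices), a run of recursion (2) with $Q = R = \Sigma_0 = \alpha I$ matches $\alpha$ times the run with identity covariances:

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.3, 0.0],
              [0.0, 0.7]])
C = np.array([[1.0, 1.0]])
n, m = 2, 1

def run(alpha, gammas):
    """Iterate recursion (2) with Q = alpha*I_n, R = alpha*I_m and
    initial covariance Sigma_0 = alpha*I_n."""
    P = alpha * np.eye(n)
    for g in gammas:
        P_next = A @ P @ A.T + alpha * np.eye(n)
        if g:
            S = C @ P @ C.T + alpha * np.eye(m)
            P_next -= A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)
        P = P_next
    return P

gammas = rng.random(50) < 0.8
alpha = 3.0
print(np.allclose(run(alpha, gammas), alpha * run(1.0, gammas)))  # True
```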

We now want to prove that the critical value of a system is larger than the critical value of any of its blocks.

###### Proof:

Without loss of generality, we assume that $I = \{1, \ldots, l\}$ with $l < n$. (If $I = \{1, \ldots, n\}$, the proof is trivial. If $I$ is an arbitrary subset of size $l$, we can always permute the states to make it equal to $\{1, \ldots, l\}$.) Let us define $J = \{l+1, \ldots, n\}$ to be the complement index set of $I$.

By Theorem 2, we may suppose $Q = \Sigma_0 = I_n$ and $R = I_m$ for the original system $(A, C)$.

Let us define $\tilde{P}_0 = I_n$ and

$$\tilde{P}_{k+1} = A\tilde{P}_kA^T + I_n - \gamma_k A\tilde{P}_k\tilde{C}^T\left(\tilde{C}\tilde{P}_k\tilde{C}^T + I_{2m}/2\right)^{-1}\tilde{C}\tilde{P}_kA^T,$$

where $\tilde{C} = \begin{bmatrix} C_I & 0 \\ 0 & C_J \end{bmatrix} \in \mathbb{R}^{2m \times n}$. Using the Matrix Inversion Lemma, we can show that

$$P_{k+1} = A\left(P_k^{-1} + \gamma_kC^TC\right)^{-1}A^T + I_n, \tag{11}$$

$$\tilde{P}_{k+1} = A\left(\tilde{P}_k^{-1} + 2\gamma_k\tilde{C}^T\tilde{C}\right)^{-1}A^T + I_n.$$

We know that $\tilde{P}_0 \leq P_0$. By induction, suppose that $\tilde{P}_k \leq P_k$; then

$$\tilde{P}_k^{-1} + 2\gamma_k\tilde{C}^T\tilde{C} \geq P_k^{-1} + \gamma_kC^TC, \tag{12}$$

since $\tilde{P}_k \leq P_k$ and $2\tilde{C}^T\tilde{C} - C^TC = [C_I, -C_J]^T[C_I, -C_J] \geq 0$.

Hence,

$$P_{k+1} = A\left(P_k^{-1} + \gamma_kC^TC\right)^{-1}A^T + I_n \geq A\left(\tilde{P}_k^{-1} + 2\gamma_k\tilde{C}^T\tilde{C}\right)^{-1}A^T + I_n = \tilde{P}_{k+1}.$$

Thus, by induction, $\tilde{P}_k \leq P_k$ for all $k$, which in turn proves

$$f_c(A, C) \geq f_c(A, \tilde{C}). \tag{13}$$

Now define $\tilde{P}_{0,I} = I_l$, $\tilde{P}_{0,J} = I_{n-l}$, and

$$\begin{aligned} \tilde{P}_{k+1,I} &= A_I\tilde{P}_{k,I}A_I^T + I_l - \gamma_k A_I\tilde{P}_{k,I}C_I^T\left(C_I\tilde{P}_{k,I}C_I^T + I_m/2\right)^{-1}C_I\tilde{P}_{k,I}A_I^T, \\ \tilde{P}_{k+1,J} &= A_J\tilde{P}_{k,J}A_J^T + I_{n-l} - \gamma_k A_J\tilde{P}_{k,J}C_J^T\left(C_J\tilde{P}_{k,J}C_J^T + I_m/2\right)^{-1}C_J\tilde{P}_{k,J}A_J^T. \end{aligned}$$

It is not hard to check that $\tilde{P}_k = \operatorname{diag}(\tilde{P}_{k,I}, \tilde{P}_{k,J})$ for all $k$. As a result, $\tilde{P}_k$ is bounded if and only if $\tilde{P}_{k,I}$ and $\tilde{P}_{k,J}$ are both bounded. Combining this with (13), we know

$$f_c(A, C) \geq f_c(A, \tilde{C}) = \max\left\{f_c(A_I, C_I),\ f_c(A_J, C_J)\right\}.$$
\qed

## V Critical Value for Non-degenerate Systems

This section is devoted to proving Theorem 4. Before continuing, we would like to state several important intermediate results which are useful for proving the main theorem and have some theoretical value of their own.

We will first deal with systems whose eigenvalues are all unstable. We will lift this restriction later in the paper. By “unstable” we mean an eigenvalue whose absolute value is strictly greater than $1$. We will call the eigenvalues on the unit circle critically stable and the ones with absolute values strictly less than $1$ stable. Since $A$ is diagonalizable, we will restrict our analysis to systems with diagonal $A$. Also, since some eigenvalues of $A$ may be complex, we will use the Hermitian transpose instead of the transpose in the remaining part of the article.

Similarly to the observability Gramian, we find that the matrix $\sum_{i=1}^{\infty}\gamma_i (A^{-i})^HC^HCA^{-i}$ plays an essential role in determining the boundedness of the Kalman filter, as characterized by the following theorem:

###### Theorem 6

If a system satisfies assumptions 1–3 and all its eigenvalues are unstable, then $\sup_k \mathbb{E}P_k$ is finite if and only if $\mathbb{E}\left[\left(\sum_{i=1}^{\infty}\gamma_i (A^{-i})^HC^HCA^{-i}\right)^{-1}\right]$ exists, and it satisfies the following inequality:

$$\underline{\alpha}\,\mathbb{E}\left[\left(\sum_{i=1}^{\infty}\gamma_i (A^{-i})^HC^HCA^{-i}\right)^{-1}\right] \leq \sup_k \mathbb{E}P_k \leq \bar{\alpha}\,\mathbb{E}\left[\left(\sum_{i=1}^{\infty}\gamma_i (A^{-i})^HC^HCA^{-i}\right)^{-1}\right], \tag{14}$$

where $\underline{\alpha}, \bar{\alpha} > 0$ are constants.

By manipulating $\sum_{i=1}^{\infty}\gamma_i (A^{-i})^HC^HCA^{-i}$, we establish the following result, which is essentially equivalent to Theorem 4 but restricted to systems whose eigenvalues are all unstable.

###### Theorem 7

If a non-degenerate system satisfies assumptions 1–3 and all its eigenvalues are unstable, then the critical value of the system is

$$p_c = 1 - |\lambda_1|^{-2}.$$

If the arrival probability $p > p_c$, then $\mathbb{E}P_k$ will be bounded for all $k$ and all initial conditions. Else if $p \leq p_c$, then $\mathbb{E}P_k$ is unbounded for some initial conditions.

Now we need to generalize this result to systems that have stable eigenvalues. The following theorem provides an important inequality for systems that have stable eigenvalues:

###### Theorem 8

Consider a system satisfying assumptions 1–3 with a diagonal $A = \operatorname{diag}(A_1, A_2, A_3)$ and $C = [C_1, C_2, C_3]$, where $A_1$ is the unstable part, $A_2$ is the critically stable part, $A_3$ is the stable part, and the $C_i$s are of proper dimensions. (Note that there is no requirement of non-degeneracy for this theorem.) Then the critical value of the system satisfies the following inequality:

$$f_c(A, C) \leq \lim_{\alpha\to 1^+} f_c\left(\operatorname{diag}(\alpha A_1, \alpha A_2),\ [C_1, C_2]\right). \tag{15}$$

Note that all the eigenvalues of $\operatorname{diag}(\alpha A_1, \alpha A_2)$ are unstable when $\alpha > 1$; hence the right-hand side of the inequality can be computed by Theorem 7.

Combining Theorem 3, 7 and 8, we will prove Theorem 4 in the last part of this section.

### V-A Proof of Theorem 6

In this subsection, we are going to prove Theorem 6. The key idea in the proof is to avoid analysing Riccati equations, which were intensively studied in the previous works [1][3], and instead to formulate the estimation error covariance through a maximum likelihood estimator. First let us write down the relation between $x_{k+1}$ and the observations:

$$\begin{bmatrix} \gamma_ky_k \\ \vdots \\ \gamma_0y_0 \\ \bar{x}_0 \end{bmatrix} = \begin{bmatrix} \gamma_kCA^{-1} \\ \vdots \\ \gamma_0CA^{-k-1} \\ A^{-k-1} \end{bmatrix} x_{k+1} + \begin{bmatrix} \gamma_kv_k \\ \vdots \\ \gamma_0v_0 \\ \bar{x}_0 - x_0 \end{bmatrix} - \begin{bmatrix} \gamma_kCA^{-1} & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ \gamma_0CA^{-k-1} & \cdots & \gamma_0CA^{-2} & \gamma_0CA^{-1} \\ A^{-k-1} & \cdots & A^{-2} & A^{-1} \end{bmatrix} \begin{bmatrix} w_k \\ \vdots \\ w_1 \\ w_0 \end{bmatrix}. \tag{16}$$

The rows where the $\gamma_i$s are zero can be deleted, since they do not provide any information to improve the estimation of $x_{k+1}$. To write (16) in a more compact way, let us define the following quantities:

$$F_k \triangleq \begin{bmatrix} A^{-1} & \cdots & 0 & 0 \\ \vdots & \ddots & \vdots & \vdots \\ A^{-k} & \cdots & A^{-1} & 0 \\ A^{-k-1} & \cdots & A^{-2} & A^{-1} \end{bmatrix} \in \mathbb{R}^{n(k+1)\times n(k+1)}, \tag{17}$$
$$G_k \triangleq \begin{bmatrix} C & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & C \\ 0 & \cdots & I_n \end{bmatrix} \in \mathbb{R}^{[n+m(k+1)]\times n(k+1)}, \tag{18}$$
$$e_k \triangleq -G_kF_k\begin{bmatrix} w_k \\ \vdots \\ w_1 \\ w_0 \end{bmatrix} + \begin{bmatrix} v_k \\ \vdots \\ v_0 \\ \bar{x}_0 - x_0 \end{bmatrix} \in \mathbb{R}^{n+m(k+1)}, \tag{19}$$
$$Y_k \triangleq \begin{bmatrix} \gamma_ky_k \\ \vdots \\ \gamma_0y_0 \\ \bar{x}_0 \end{bmatrix},\qquad T_k \triangleq \begin{bmatrix} \gamma_kCA^{-1} \\ \vdots \\ \gamma_0CA^{-k-1} \\ A^{-k-1} \end{bmatrix}. \tag{20}$$

Define $\Gamma_k$ as the matrix formed by all the nonzero rows of $\operatorname{diag}(\gamma_kI_m, \ldots, \gamma_0I_m, I_n)$. Thus $\Gamma_k$ is an $\left(n + m\sum_{i=0}^{k}\gamma_i\right)$ by $\left(n + m(k+1)\right)$ matrix. Also define

$$\tilde{Y}_k \triangleq \Gamma_kY_k,\qquad \tilde{T}_k \triangleq \Gamma_kT_k,\qquad \tilde{e}_k \triangleq \Gamma_ke_k.$$

$\tilde{Y}_k$, $\tilde{T}_k$, $\tilde{e}_k$ are now stochastic matrices as they are functions of the $\gamma_i$s.

We can rewrite (16) in a more compact form as

$$\tilde{Y}_k = \tilde{T}_kx_{k+1} + \tilde{e}_k. \tag{21}$$

From (21), we know that $\tilde{Y}_k$ is Gaussian distributed with unknown mean $\tilde{T}_kx_{k+1}$ and known covariance $\operatorname{Cov}(\tilde{e}_k|\Gamma_k)$. Hence, we can prove the following lemmas:

###### Lemma 1

If $\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{T}_k$ is invertible, then the state estimate and estimation error covariance given by the Kalman filter satisfy the following equations:

$$\hat{x}_{k+1|k} = \left(\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{T}_k\right)^{-1}\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{Y}_k, \tag{22}$$
$$P_{k+1} = \left(\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{T}_k\right)^{-1}. \tag{23}$$
###### Lemma 2

If $A = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, where $|\lambda_1| \geq \cdots \geq |\lambda_n| > 1$, then $F_kF_k^H$ is bounded by

$$\frac{1}{(|\lambda_1|+1)^2}I_{n(k+1)} \leq F_kF_k^H \leq \frac{1}{(|\lambda_n|-1)^2}I_{n(k+1)}, \tag{24}$$

where $F_k$ is defined in (17).

###### Lemma 3

If a system satisfies assumptions 1–3 and all its eigenvalues are unstable, then the error covariance matrix of the Kalman filter is bounded by

$$\underline{\alpha}\left(\tilde{T}_k^H\tilde{T}_k\right)^{-1} \leq P_{k+1} \leq \bar{\alpha}\left(\tilde{T}_k^H\tilde{T}_k\right)^{-1}, \tag{25}$$

where $\underline{\alpha}, \bar{\alpha} > 0$ are constants independent of $k$ and $\Gamma_k$. (We abuse the notations $\underline{\alpha}, \bar{\alpha}$, which will also be used several times later; they only denote constant lower and upper bounds, which are not necessarily the same in different theorems.)

###### Proof:

Given the observations $\tilde{Y}_k$, by (21), we know that the maximum likelihood estimator of $x_{k+1}$ is

$$\hat{x}_{k+1|k} = \left(\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{T}_k\right)^{-1}\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{Y}_k,$$

and the estimation error covariance is

$$P_{k+1} = \left(\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{T}_k\right)^{-1}.$$

Since $\tilde{Y}_k$ is Gaussian with unknown mean $\tilde{T}_kx_{k+1}$ and known covariance $\operatorname{Cov}(\tilde{e}_k|\Gamma_k)$, we know that the maximum likelihood estimator is optimal in the minimum error covariance sense. Thus, the $\hat{x}_{k+1|k}$ and $P_{k+1}$ given by the maximum likelihood estimator are essentially the same as the $\hat{x}_{k+1|k}$ and $P_{k+1}$ of the Kalman filter, which concludes the proof. \qed
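Lemma 1 can be verified numerically: build $F_k$, $G_k$ and $T_k$ from (17)–(20) with $Q = R = \Sigma_0 = I$, delete the rows of the dropped packets, and form $(\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{T}_k)^{-1}$; the result reproduces the $P_{k+1}$ of recursion (2) exactly. A sketch with hypothetical system matrices of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
A = np.array([[1.4, 0.2],
              [0.0, 1.1]])    # invertible, as required by (16)
C = np.array([[1.0, 0.5]])
n, m, k = 2, 1, 5
gammas = (rng.random(k + 1) < 0.7).astype(int)   # gamma_0 ... gamma_k

# Kalman recursion (2) with Q = R = Sigma_0 = I.
P = np.eye(n)
for g in gammas:
    P_next = A @ P @ A.T + np.eye(n)
    if g:
        S = C @ P @ C.T + np.eye(m)
        P_next -= A @ P @ C.T @ np.linalg.solve(S, C @ P @ A.T)
    P = P_next

# Batch (maximum likelihood) covariance of Lemma 1.
Ai = np.linalg.inv(A)
pw = lambda j: np.linalg.matrix_power(Ai, j)
# F_k of (17): block (i, j) = A^{-(i+1-j)} for j <= i (row i <-> time k-i).
F = np.zeros((n * (k + 1), n * (k + 1)))
for i in range(k + 1):
    for j in range(i + 1):
        F[i*n:(i+1)*n, j*n:(j+1)*n] = pw(i + 1 - j)
# G_k of (18): blockdiag(C, ..., C) stacked over [0, ..., 0, I_n].
G = np.zeros((m * (k + 1) + n, n * (k + 1)))
for i in range(k + 1):
    G[i*m:(i+1)*m, i*n:(i+1)*n] = C
G[-n:, -n:] = np.eye(n)
# Stacked map from x_{k+1}: [C A^{-1}; ...; C A^{-k-1}; A^{-k-1}];
# the gammas are handled by row deletion below.
T = np.vstack([C @ pw(i + 1) for i in range(k + 1)] + [pw(k + 1)])
# Cov(e_k) from (19) with identity noise covariances.
Cov_e = G @ F @ F.T @ G.T + np.eye(m * (k + 1) + n)
# Gamma_k: keep rows of arrived packets (row block i <-> time k-i) + prior.
keep = [i * m + r for i in range(k + 1) if gammas[k - i] for r in range(m)]
keep += list(range(m * (k + 1), m * (k + 1) + n))
Tt = T[keep, :]
Ce = Cov_e[np.ix_(keep, keep)]
P_batch = np.linalg.inv(Tt.T @ np.linalg.solve(Ce, Tt))
print(np.allclose(P, P_batch))  # True
```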

###### Proof:

Notice that

$$F_k^{-1} = \begin{bmatrix} A & & & \\ -I & A & & \\ & \ddots & \ddots & \\ & & -I & A \end{bmatrix}.$$

Therefore,

$$\left(F_kF_k^H\right)^{-1} = \begin{bmatrix} AA^H+I & -A^H & & \\ -A & \ddots & \ddots & \\ & \ddots & AA^H+I & -A^H \\ & & -A & AA^H \end{bmatrix}.$$

By Gershgorin’s Circle Theorem [14], we know that all the eigenvalues of $(F_kF_k^H)^{-1}$ are located inside one of the following circles: $|\zeta - (|\lambda_i|^2+1)| \leq |\lambda_i|$, $|\zeta - (|\lambda_i|^2+1)| \leq 2|\lambda_i|$ or $|\zeta - |\lambda_i|^2| \leq |\lambda_i|$, where the $\lambda_i$s are the eigenvalues of $A$.

Since $|\lambda_i| > 1$, for each eigenvalue $\zeta$ of $(F_kF_k^H)^{-1}$, the following holds:

$$\zeta \geq \min_i\left\{|\lambda_i|^2+1-|\lambda_i|,\ |\lambda_i|^2+1-2|\lambda_i|,\ |\lambda_i|^2-|\lambda_i|\right\}, \tag{26}$$

and

$$\zeta \leq \max_i\left\{|\lambda_i|^2+1+|\lambda_i|,\ |\lambda_i|^2+1+2|\lambda_i|,\ |\lambda_i|^2+|\lambda_i|\right\}. \tag{27}$$

Thus, $(|\lambda_n|-1)^2 \leq \zeta \leq (|\lambda_1|+1)^2$, which in turn gives

$$\frac{1}{(|\lambda_1|+1)^2}I_{n(k+1)} \leq F_kF_k^H \leq \frac{1}{(|\lambda_n|-1)^2}I_{n(k+1)}.$$
\qed
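The bound of Lemma 2 is easy to confirm numerically: build $F_k^{-1}$ as the block bidiagonal matrix above, invert it, and compare the spectrum of $F_kF_k^H$ with (24). A sketch with an arbitrary all-unstable diagonal $A$ of our choosing:

```python
import numpy as np

lams = np.array([2.0, 1.5])    # arbitrary all-unstable |lambda_i| > 1
A = np.diag(lams)
n, k = len(lams), 6
N = n * (k + 1)

# F_k^{-1}: block bidiagonal with A on the diagonal and -I below it.
Finv = np.zeros((N, N))
for b in range(k + 1):
    Finv[b*n:(b+1)*n, b*n:(b+1)*n] = A
    if b > 0:
        Finv[b*n:(b+1)*n, (b-1)*n:b*n] = -np.eye(n)
F = np.linalg.inv(Finv)

eigs = np.linalg.eigvalsh(F @ F.T)
lo = 1.0 / (np.max(np.abs(lams)) + 1) ** 2   # 1 / (|lambda_1| + 1)^2
hi = 1.0 / (np.min(np.abs(lams)) - 1) ** 2   # 1 / (|lambda_n| - 1)^2
print(lo <= eigs.min() and eigs.max() <= hi)  # True
```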
###### Proof:

Since $w_k$, $v_k$, $x_0$ are mutually independent,

$$\operatorname{Cov}(e_k) = \operatorname{Cov}\left(G_kF_k[w_k^T,\ldots,w_0^T]^T\right) + \operatorname{Cov}\left([v_k^T,\ldots,v_0^T,(\bar{x}_0-x_0)^T]^T\right) = G_kF_k\operatorname{diag}(Q,\ldots,Q)F_k^HG_k^H + \operatorname{diag}(R,\ldots,R,\Sigma_0).$$

Since we assume that $Q = \Sigma_0 = I_n$ and $R = I_m$, using Lemma 2 it is easy to show that

$$\frac{1}{(|\lambda_1|+1)^2}G_kG_k^H + I_{n+m(k+1)} \leq \operatorname{Cov}(e_k) = G_kF_kF_k^HG_k^H + I_{n+m(k+1)} \leq \frac{1}{(|\lambda_n|-1)^2}G_kG_k^H + I_{n+m(k+1)}.$$

Since $0 \leq G_kG_k^H \leq \max\{\sigma_{\max}^2(C), 1\}I_{n+m(k+1)}$, where $\sigma_{\max}(C)$ is the largest singular value of $C$, define $\underline{\alpha} = 1$ and $\bar{\alpha} = 1 + \max\{\sigma_{\max}^2(C), 1\}/(|\lambda_n|-1)^2$; we know that $\underline{\alpha}I \leq \operatorname{Cov}(e_k) \leq \bar{\alpha}I$, which implies

$$\underline{\alpha}\,\Gamma_k\Gamma_k^T \leq \operatorname{Cov}(\tilde{e}_k|\Gamma_k) = \Gamma_k\operatorname{Cov}(e_k)\Gamma_k^T \leq \bar{\alpha}\,\Gamma_k\Gamma_k^T,$$

where $\underline{\alpha}, \bar{\alpha}$ are independent of $k$. Notice that $\Gamma_k\Gamma_k^T = I$. Therefore,

$$\underline{\alpha}I \leq \operatorname{Cov}(\tilde{e}_k|\Gamma_k) \leq \bar{\alpha}I.$$

The above bound is independent of $k$ and $\Gamma_k$, which proves

$$\underline{\alpha}\left(\tilde{T}_k^H\tilde{T}_k\right)^{-1} \leq P_{k+1} = \left(\tilde{T}_k^H\operatorname{Cov}(\tilde{e}_k|\Gamma_k)^{-1}\tilde{T}_k\right)^{-1} \leq \bar{\alpha}\left(\tilde{T}_k^H\tilde{T}_k\right)^{-1}.$$
\qed

Now we are ready to prove Theorem 6.

###### Proof:

As a result of Lemma 3, we only need to show that

$$\underline{\alpha}\,\mathbb{E}\left[\left(\sum_{i=1}^{\infty}\gamma_i (A^{-i})^HC^HCA^{-i}\right)^{-1}\right] \leq \sup_k \mathbb{E}\left[\left(\tilde{T}_k^H\tilde{T}_k\right)^{-1}\right] \leq \bar{\alpha}\,\mathbb{E}\left[\left(\sum_{i=1}^{\infty}\gamma_i (A^{-i})^HC^HCA^{-i}\right)^{-1}\right]. \tag{28}$$

Rewrite