# Observability, Identifiability and Sensitivity

of Vision-Aided Inertial Navigation

August 20, 2013; revised May 10, 2014, September 20, 2014

###### Abstract

We analyze the observability of 3-D pose from the fusion of visual and inertial sensors. Because the model contains unknown parameters, such as sensor biases, the problem is usually cast as a mixed filtering/identification problem, with the resulting observability analysis providing necessary conditions for convergence to a unique point estimate. Most models treat sensor bias rates as “noise,” independent of other states, including the biases themselves, an assumption that is patently violated in practice. We show that, when this assumption is lifted, the resulting model is not observable, and therefore existing analyses cannot be used to conclude that the set of states indistinguishable from the measurements is a singleton. We therefore re-cast the analysis as one of sensitivity: rather than attempting to prove that the set of indistinguishable trajectories is a singleton, we derive bounds on its volume, as a function of the characteristics of the sensors and of sufficient-excitation conditions on the motion. This provides an explicit characterization of the indistinguishable set that can be used for analysis and validation purposes.

## 1 Introduction

^{1}

^{1}footnotetext: The authors are with the UCLA Vision Lab, University of California, Los Angeles, USA. Email: {jheez,ktsotsos,soatto}@ucla.edu.

We present a novel approach to the analysis of observability/identifiability of three-dimensional (3-D) pose in visually-assisted navigation, whereby inertial sensors (accelerometers and gyrometers) are used in conjunction with optical sensors (vision) to yield an estimate of the 3-D position and orientation of the sensor platform. It is customary to frame this as a filtering problem, where the time-series of positions and orientations of the sensor platform is modeled as the state trajectory of a dynamical system that produces sensor measurements as outputs, up to some uncertainty. Observability/identifiability analysis refers to the characterization of the set of possible state trajectories that produce the same measurements, and are therefore “indistinguishable” given the outputs [14, 8, 11, 7, 10].

The parameters in the model are either treated as unknown constants (e.g., calibration parameters) or as random processes (e.g., accelerometer and gyro biases) and included in the state of the model, which is then driven by some kind of uninformative (“noise”) input. Because noise does not affect the observability of a model, for the purpose of analysis it is usually set to zero. However, the input to the model of accelerometer and gyro bias is typically small but not independent of the state. Thus, it should be treated as an unknown input, which is known to be “small” in some sense, rather than “noise.”

Our first contribution is to show that while (a prototypical model of) assisted navigation is observable in the absence of unknown inputs, it is not observable when unknown inputs are taken into account.

Our second contribution is to reframe observability as a sensitivity analysis, and to show that while the set of indistinguishable trajectories is not a singleton (as it would be if the model was observable), it is nevertheless bounded. We explicitly characterize this set and bound its volume as a function of the characteristics of the inputs, which include sensor characteristics (bias rates) and the motion undergone by the platform (sufficient excitation).

### Related work

In addition to the above-referenced work on visual-inertial observability, our work relates to unknown-input observability in general, addressed for linear time-invariant systems in [1, 4], for affine systems in [5], and for non-linear systems in [3, 9, 2]. The literature on robust filtering and robust identification is also relevant if the unknown input is treated as a disturbance. However, the form of the models involved in aided navigation does not fit the classes treated in the literature above, which motivates our analysis. The model we employ includes alignment parameters for the (unknown) pose of the inertial sensor relative to the camera.

### 1.1 Notation

We adopt the notation of [12], where a reference frame is represented by an orthogonal positive-determinant (rotation) matrix and a translation vector . They are collectively indicated by . When represents the change of coordinates from a reference frame “” to another (“”), it is indicated by . Then the columns of are the coordinate axes of relative to the reference frame , and is the origin of in the reference frame . If is a point relative to the reference frame , then its representation relative to is . In coordinates, if are the coordinates of , then are the coordinates of .

A time-varying pose is indicated with or , and the entire trajectory from an initial time to a final time is indicated in short-hand notation with ; when the initial time is , we omit the subscript and call the trajectory “up to time ”. The time index is sometimes omitted for simplicity of notation when it is clear from the context.

We indicate with the (generalized) velocity or “twist”, where is a skew-symmetric matrix corresponding to the cross product with the vector , so that for any vector . We indicate the generalized velocity with . We indicate the group composition simply as . In homogeneous coordinates, where and

Composition of rigid motions is then represented by matrix product.

### 1.2 Mechanization Equations

The motion of a sensor platform is represented as the time-varying pose of the body relative to the spatial frame. To relate this to measurements of an inertial measurement unit (IMU) we compute the temporal derivatives of , which yield the (generalized) body velocity , defined by , which can be broken down into the rotational and translational components and . An ideal gyrometer (gyro) would measure . The translational component of body velocity, , can be obtained from the last column of the matrix . That is, , which serves to define . These equations can be simplified by defining a new linear velocity, , which is neither the body velocity nor the spatial velocity , but instead . Consequently, we have that and where the last equation serves to define the new linear acceleration ; as one can easily verify, we have that An ideal accelerometer (accel) would then measure where is the gravity vector.

There are several reference frames to be considered in a navigation scenario. The spatial frame is typically attached to Earth and oriented so that gravity takes the form where can be read from tables based on location and is typically around . The body frame is attached to the IMU. The camera frame , relative to which image measurements are captured, is also unknown, although we will assume that intrinsic calibration has been performed, so that measurements on the image plane are provided in metric units.

The equations of motion (known as mechanization equations) are usually described in terms of the body frame at time relative to the spatial frame . Since the spatial frame is arbitrary (other than for being aligned to gravity), it is often chosen to be co-located with the body frame at time . To simplify the notation, we indicate this time-varying frame simply as , and so for , thus effectively omitting the subscript wherever it appears. This yields where is the rotational acceleration, and the translational jerk (derivative of acceleration).
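As a concrete illustration, the mechanization equations above can be integrated numerically. The sketch below uses a first-order Euler step with an exact rotation update; the variable names (`R`, `p`, `v`, `omega`, `a`) are illustrative stand-ins, not the paper's notation:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def expm_so3(W):
    """Closed-form (Rodrigues) exponential of a 3x3 skew-symmetric matrix."""
    w = np.array([W[2, 1], W[0, 2], W[1, 0]])
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3) + W
    return np.eye(3) + (np.sin(th) / th) * W + ((1.0 - np.cos(th)) / th**2) * (W @ W)

def integrate_pose(R, p, v, omega, a, dt):
    """One Euler step of the mechanization equations:
    dR/dt = R hat(omega), dp/dt = v, dv/dt = a,
    where omega is the (ideal) gyro reading and a is the linear acceleration
    defined in the text."""
    return R @ expm_so3(hat(omega) * dt), p + v * dt, v + a * dt
```

Using the exact exponential for the rotation update (rather than a naive additive step) keeps the rotation estimate on the group, which matters over long integration horizons.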

### 1.3 Sensor model

Although the acceleration defined above corresponds to neither body nor spatial acceleration, it is conveniently related to accelerometer measurements :

(1)

where the measurement error in brackets includes a slowly-varying mean (“bias”) and a residual term that is commonly modeled as a zero-mean (its mean is captured by the bias), white, homoscedastic and Gaussian noise process. In other words, it is assumed that is independent of , hence uninformative. Here is the gravity vector expressed in the spatial frame. Measurements from a gyro, , can be similarly modeled as

(2)

where the measurement error in brackets includes a slowly-varying bias and a residual “noise,” also assumed zero-mean, white, homoscedastic and Gaussian, independent of .

Although the biases change slowly, they can otherwise change arbitrarily. One can therefore consider them an unknown input to the model, or a state in the model, in which case one has to hypothesize a dynamical model for them. For instance,

(3)

for some unknown inputs that can safely be assumed to be small, but not white, not zero-mean, and, most importantly, not independent of the biases. Nevertheless, it is common to consider them realizations of a Brownian motion that is independent of . This is done for convenience, as one can then treat all unknown inputs as “noise.” Unfortunately, however, this has implications for the analysis of the observability and identifiability of the resulting model.
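For illustration, the following sketch simulates gyro readings under the common convenience model (3), in which the bias is driven by white noise and therefore follows a random walk. Function name and parameter values are hypothetical:

```python
import numpy as np

def simulate_gyro(omega_true, dt, bias_rate=1e-4, noise_std=1e-3, seed=0):
    """Generate gyro readings with a slowly drifting bias and white noise.

    omega_true: (N, 3) array of true rotational velocities.
    The random-walk bias below is the convenience model criticized in the
    text; treating the drift as a bounded unknown input instead is what
    motivates the sensitivity analysis. Returns (measurements, bias)."""
    rng = np.random.default_rng(seed)
    n = len(omega_true)
    # bias increments are white, so the bias itself is a random walk
    bias = np.cumsum(rng.normal(0.0, bias_rate * np.sqrt(dt), (n, 3)), axis=0)
    noise = rng.normal(0.0, noise_std, (n, 3))
    return np.asarray(omega_true) + bias + noise, bias
```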

### 1.4 Model reduction

The mechanization equations above define a dynamical model having as output the IMU measurements. Including the initial conditions and biases, we have

(4)

In this standard model, data from the IMU are considered as (output) measurements. However, it is customary to treat them as (known) input to the system, by writing in terms of and in terms of :

(5)

This equality is valid for samples (realizations) of the stochastic processes involved, but it can be misleading: considered as stochastic processes, the noises above are not independent of the states. Such a dependency is nevertheless typically neglected. The resulting mechanization model is

(6)

### 1.5 Imaging model and alignment

Initially we assume there is a collection of points , visible from time to the current time . If is a canonical central (perspective) projection, assuming that the camera is calibrated,^{1} aligned,^{2} and that the spatial frame coincides with the body frame at time , we have

^{1}footnotetext: Intrinsic calibration parameters are known and compensated for.

^{2}footnotetext: The pose of the camera relative to the IMU is known and compensated for.

(7)

If the feature first appears at time and if the camera reference frame is chosen to be the origin of the world reference frame, so that , then we have that , and therefore

(8)

where is the homogeneous coordinate of , , and . Here is the (unknown, scalar) depth of the point at time , and again the dependency of the noise on the state is neglected. With an abuse of notation, we write the map that collectively projects all points to their corresponding locations on the image plane as , or:

(9)
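The canonical central projection used in (7)-(9) can be sketched as follows; the function names and the camera-to-world pose convention are assumptions for illustration:

```python
import numpy as np

def project(points_cam):
    """Canonical central projection: [X, Y, Z] -> [X/Z, Y/Z].

    points_cam: (N, 3) points in the (calibrated, metric) camera frame,
    assumed to have positive depth Z."""
    X = np.asarray(points_cam, dtype=float)
    return X[:, :2] / X[:, 2:3]

def project_world(R, p, points_world):
    """Project world points through a camera pose (R, p), where (R, p) maps
    camera coordinates to world coordinates; camera-frame coordinates of a
    point X are therefore R.T @ (X - p)."""
    Xc = (np.asarray(points_world, dtype=float) - p) @ R  # rows = R.T @ (X - p)
    return project(Xc)
```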

In practice, the measurements are known only up to a transformation mapping the body frame to the camera, often referred to as “alignment”:

(10)

We can then, as done for the points , add it to the state with trivial dynamics .

It may be convenient in some cases to represent the points in the reference frame where they first appear, say at time , rather than in the spatial frame. This is because the uncertainty is highly structured in the frame where they first appear: if , then has the same uncertainty as the feature detector (small and isotropic on the image plane), while has a large uncertainty, but it is constrained to be positive.

However, to relate to the state, we must bring it to the spatial frame, via , which is unknown. Although we may have a good approximation of it (the current estimate of the state), the pose at the time when the point first appears should be estimated along with the coordinates of the points. Therefore, we can represent using , and :

(11)

Clearly this is an over-parametrization, since each point is now represented by more parameters than before. However, the pose can be pooled among all points that appear at time , which are therefore considered as a group. At each time, there may be a number of groups, each of which contains a number of points. We indicate the group index with and the point index with , omitting the dependency on for simplicity. The representation of then evolves according to

(12)
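The representation in (11)-(12), which stores a point as a group pose together with an image bearing and a scalar depth, can be sketched as follows (hypothetical names; the bearing is a homogeneous image coordinate):

```python
import numpy as np

def point_from_group(R_g, p_g, bearing, depth):
    """World coordinates of a point stored relative to the frame where it
    first appeared: group pose (R_g, p_g), image bearing [x, y, 1], and a
    positive scalar depth. The bearing carries the small, isotropic
    feature-detector uncertainty; the depth carries the large, positive-
    constrained uncertainty discussed in the text."""
    return R_g @ (depth * np.asarray(bearing, dtype=float)) + p_g
```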

## 2 Analysis of the model

The goal here is to exploit imaging and inertial measurements to infer the sensor platform trajectory. For this problem to be well-posed, a (sufficiently exciting) realization of and should constrain the set of trajectories that satisfy (6)-(12) to a single element. If different trajectories satisfy (4) with the same outputs, they are indistinguishable. If the set of indistinguishable trajectories is a singleton (contains only one element, presumably the “true” trajectory), the model (4) is observable, and one may be able to retrieve a unique point estimate of the state using a filter, or observer.

While it is commonly accepted that the model (4), or its equivalent reduced realization, is observable, this is the case only when the biases are exactly constant. If the biases are allowed to change, however slowly, the observability analysis conducted thus far cannot be used to conclude that the indistinguishable set is a singleton. Indeed, we show that this is not the case by computing the indistinguishable set explicitly. The following claim is proven in [6].

###### Claim 1 (Indistinguishable Trajectories).

The set of indistinguishable trajectories in the limit where is parametrized by an arbitrary and ,

(17)

If we impose that , then is determined; similarly, if we impose the initial pose to be aligned with gravity (so that gravity is in the form ), then . But while we can impose this condition, we cannot enforce it, since the initial condition is not part of the state of the filter, so we cannot relate the measurements at each time directly to it.
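The gauge ambiguity of Claim 1 can be checked numerically: applying one and the same rigid transformation to both the camera trajectory and the point cloud leaves every image measurement unchanged. A minimal sketch, with illustrative names and a rotation about the gravity (z) axis:

```python
import numpy as np

def project(R, p, X):
    """Perspective projection of world points X through a camera pose (R, p)
    (camera-to-world convention), assuming positive depths."""
    Xc = (X - p) @ R          # rows are R.T @ (X_i - p)
    return Xc[:, :2] / Xc[:, 2:3]

def rot_z(th):
    """Rotation by angle th about the z (gravity) axis."""
    c, s = np.cos(th), np.sin(th)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# For a global transform g = (Q, t) applied to BOTH pose and points:
# R.T @ Q.T @ (Q X + t - (Q p + t)) = R.T @ (X - p), so projections agree.
```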

However, if the reference can be associated to constant parameters that are part of the state of the model, it can be enforced in a consistent manner. For instance, the ambiguous set of points is

(18)

if each group contains at least non-coplanar points, it is possible to fix by parameterizing and imposing three directions , the measurement of these directions at time when they first appear. This yields and for that group. Note that it is necessary to impose this constraint in each group.

The residual set of indistinguishable trajectories is parameterized by constants that determine a gauge transformation for the groups, which can be fixed by fixing the pose of one of the groups. This can be done in a number of ways. For instance, if for a certain group of points indexed by we impose

(19)

by assigning their value to the current best estimate of pose and not including the corresponding variables in the state of the model, then we have that

(20)

and therefore ; similarly,

(21)

Therefore, the gauge transformation is enforced explicitly at each instant of time, as each measurement provides a constraint on the states. After the gauge transformation has been fixed, the model is observable in the limit , and otherwise the state of an observer is related to the true one as follows:

(22)

(23)

(24)

(25)

(26)

(27)

(28)

(29)

where satisfy (13)-(16), and , are arbitrary. The groups will be defined up to an arbitrary reference frame , except for the reference group, where that transformation is fixed. Note that, as the reference group “switches” (when points in the reference group become occluded or otherwise disappear due to failures in the data association mechanism), a small error in pose is accumulated. This error affects the gauge transformation, not the state of the system, and is therefore reflected neither in the innovation nor in the covariance of the state estimate, which remains bounded. This is unlike [11], where the covariance of the translation state and of the rotation about gravity grows unbounded over time, possibly affecting the numerical aspects of the implementation. Notice that in the limit where , we recover Eq. (17). Otherwise, the equations above immediately imply the following

###### Claim 2 (unknown-input observability).

We refer the reader to [6] for the proofs, which are articulated into several steps. In practice, once the gauge transformations are fixed, a properly designed filter can converge to a point estimate, but there is no guarantee that such an estimate coincides with the true trajectory. Instead, the estimate can deviate from the true trajectory depending on the biases. The analysis above quantifies how far from the true trajectory the estimated one can be, provided that the estimation algorithm uses bounds on the bias drift rates and on the characteristics of the motion. Often these bounds are not strictly enforced, but rather modeled through the driving noise covariance.

## 3 Empirical validation

To validate the analysis, we run repeated trials to estimate the state of the platform under different motions but identical alignment (the camera is rigidly connected to the IMU and the connection is stable to high precision). If the alignment parameters were identifiable (or the augmented state observable), we would expect convergence to the same parameter values across all trials. Instead, Fig. 1 shows that the filter reaches steady-state, with the estimates of the parameters stabilizing, but to different values in each run.

Nevertheless, the parameter values lie in a set whose volume can be bounded based on the analysis above and the characteristics of the sensor. In particular, less stable biases and less exciting motions result in a larger indistinguishable set: Fig. 2 shows the same experiments with gentler (hence less exciting) motions.

Fig. 3 shows the same where the accel and gyro biases have been artificially inflated by adding a slowly time-varying offset to the IMU measurements.

To further support the conclusions of the analysis, Monte-Carlo experiments were conducted on the model in simulation, using stationary and time-varying biases while undergoing sufficiently exciting motion. For each trial, the platform views a consistent set of randomly generated points (no occlusions) while circling the point set on randomly generated trajectories. Figures 4 and 5 show the resulting estimation errors of the alignment states for 20 trials each, using a constant bias and a white-noise-driven bias, respectively. As seen in the experiments with real data, estimates in the time-varying bias scenario do not converge to a singleton.

The experiments thus confirm the analysis.

## 4 Discussion

We have shown that when inertial sensor biases are included as model parameters in the state of a filter used for navigation estimates, with bias rates treated as unknown inputs, the resulting model is not observable.

Consequently, we have re-formulated the problem of analyzing the convergence of (any) filter for vision-aided inertial navigation not as one of observability or identifiability, but as one of sensitivity, by bounding the set of indistinguishable trajectories to a set whose volume depends on the characteristics of the motion.

The advantage of this approach, compared to standard observability analysis based on rank conditions, is that we characterize the indistinguishable set explicitly. Furthermore, rank conditions are “fragile” in the sense that a model can be nominally observable, and yet the observability matrix be so poorly conditioned as to render the model effectively unobservable. We quantify the “degree of unobservability” as the sensitivity of the solution set to the input: provided that sufficient-excitation conditions are satisfied, the unobservable set can be bounded and effectively treated as a singleton. More generally, the analysis provides an estimate of the uncertainty surrounding the solution set, as well as a guideline on how to limit it by enforcing certain gauge transformations.

## Acknowledgements

This work was supported by the Air Force Office of Scientific Research (grant no. AFOSR FA9550-12-1-0364) and the Office of Naval Research (grant no. ONR N00014-13-1-034).

## References

- [1] G. Basile and G. Marro. On the observability of linear, time-invariant systems with unknown inputs. Journal of Optimization theory and applications, 3(6):410–415, 1969.
- [2] S. Bezzaoucha, B. Marx, D. Maquin, J. Ragot, et al. On the unknown input observer design: a decoupling class approach with application to sensor fault diagnosis. In 1st International Conference on Automation and Mechatronics, CIAM’2011, 2011.
- [3] H. Dimassi, A. Loría, and S. Belghith. A robust adaptive observer for nonlinear systems with unknown inputs and disturbances. In Decision and Control (CDC), 2010 49th IEEE Conference on, pages 2602–2607. IEEE, 2010.
- [4] F. Hamano and G. Basile. Unknown-input present-state observability of discrete-time linear systems. Journal of Optimization Theory and Applications, 40(2):293–307, 1983.
- [5] H. Hammouri and Z. Tmar. Unknown input observer for state affine systems: A necessary and sufficient condition. Automatica, 46(2):271–278, 2010.
- [6] J. Hernandez and S. Soatto. Observability, identifiability, sensitivity, and model reduction for vision-assisted inertial navigation. UCLA CSD TR13022, http://arxiv.org/abs/1311.7434, Aug. 20, 2013 (revised Nov. 12, 2013; Nov. 29, 2013).
- [7] E. S. Jones, A. Vedaldi, and S. Soatto. Inertial structure from motion and autocalibration. In Workshop on Dynamical Vision, October 2007.
- [8] J. Kelly and G. Sukhatme. Fast Relative Pose Calibration for Visual and Inertial Sensors. In Experimental Robotics, pages 515–524, 2009.
- [9] D. Liberzon, P. R. Kumar, A. Dominguez-Garcia, and S. Mitra. Invertibility and observability of switched systems with inputs and outputs. 2012.
- [10] A. Martinelli et al. Visual-inertial structure from motion: observability and resolvability. Foundations and Trends® in Computer Graphics and Vision, 1(1), 2014.
- [11] A. Mourikis and S. Roumeliotis. A multi-state constraint kalman filter for vision-aided inertial navigation. In Robotics and Automation, 2007 IEEE International Conference on, pages 3565–3572. IEEE, 2007.
- [12] R. M. Murray, Z. Li, and S. S. Sastry. A Mathematical Introduction to Robotic Manipulation. CRC Press, 1994.
- [13] S. Soatto. 3-d structure from visual motion: modeling, representation and observability. Automatica, 33:1287–1312, 1997.
- [14] S. Soatto. Observability/identifiability of rigid motion under perspective projection. In Decision and Control, 1994. Proceedings of the 33rd IEEE Conference on, volume 4, pages 3235–3240. IEEE, 1994.

## Appendix A Proofs

### A.1 Definitions

The mechanization equations can be written in compact notation as

(30)

where is the state, the input, and the unknown input. Note that we overload the notation by using to denote both the unknown input in the compact notation and the translational velocity in the original notation; this should not cause confusion, as the two are never used in conjunction. Recall that is a collection of output measurements, and a state trajectory. In the absence of unknown inputs, , given output measurements and known inputs , we call

(31)

the indistinguishable set, or set of indistinguishable trajectories, for a given input . If the initial condition equals the “true” one, the indistinguishable set contains at least one element, the “true” trajectory . However, if , the true trajectory may not even be part of this set.

If the indistinguishable set is a singleton (it contains only one element, , which is a function of the initial condition ), we say that the model is observable up to the initial condition, or simply observable.^{4} If is further independent of the initial condition, we say that the model is strongly observable:

^{4}footnotetext: We will assume that the solution of the differential equation is unique and continuously dependent on the initial condition, so if we impose , then .

If the state includes unknown parameters with trivial dynamics, and there is no unknown input, , then observability of the resulting model implies that the parameters are identifiable. That usually requires the input to be sufficiently exciting (SE), in order to disambiguate otherwise indistinguishable states,^{5} as the definition does not require that every input disambiguate all state trajectories.

^{5}footnotetext: Sufficient excitation means that the input is generic, and does not lie on a thin set. That is, even if we could find a particular input that yields indistinguishable states, there will be another input infinitesimally close to it that disambiguates them.

In the presence of unknown inputs , consider the following definition

(32)

which is the set of unknown-input indistinguishable states. The model is said to be unknown-input observable (up to initial conditions) if the unknown-input indistinguishable set is a singleton. If such a singleton is further independent of the initial conditions, the model is strongly observable. The two definitions coincide once the only admissible unknown input is for all .

It is possible for a model to be observable (the indistinguishable set is a singleton), but not unknown-input observable (the unknown-input indistinguishable set is dense). In that case, the notion of sensitivity arises naturally, as one would want to measure the “size” of the unknown-input indistinguishable set as a function of the “size” of the unknown input. For instance, it is possible that if the set of unknown inputs is small in some sense, the resulting set of indistinguishable states is also small. If and for any there exists a such that for some measure of volume implies for any , then we say that the model is bounded-unknown-input/bounded-output observable (up to the initial condition). If the latter volume is independent of , we say that the model is strongly bounded-unknown-input/bounded-output observable.
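A scalar toy model illustrates the bounded-unknown-input notion: when the bias drift is zero, the bias is identifiable from the output slope, while a drift bounded by some small ε yields an indistinguishable set of size O(ε) rather than a singleton. All names and parameter values below are illustrative, not taken from the paper:

```python
import numpy as np

def outputs(x0, b0, w, dt, u=0.0):
    """Toy scalar model: xdot = u + b (b is a bias), bdot = w (unknown
    input), y = x. Returns the sampled output sequence for the drift
    sequence w (one value per step)."""
    x, b, ys = x0, b0, []
    for wk in w:
        ys.append(x)
        x += (u + b) * dt
        b += wk * dt
    return np.array(ys)
```

With `w = 0` two different initial biases produce outputs that diverge linearly, so the bias is identifiable; with a small admissible drift, a slightly wrong initial bias can be compensated by an opposing drift, keeping the outputs inside an ε-tube.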

### A.2 Preliminary claims

###### Lemma 1.

Given and , and , the matrix is nonsingular unless , in which case it has rank or .

###### Proof.

The tangent has the form , where is some skew-symmetric matrix. As such, for any , so

The above is zero only if , so is nonsingular. For the remaining cases, observe that a skew-symmetric matrix has rank 2 or 0. ∎
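The rank claim can be checked numerically: a 3x3 skew-symmetric matrix annihilates the vector it is built from, so its rank is exactly 2 when that vector is nonzero, and 0 otherwise. A quick sketch with illustrative names:

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of w, so that hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

# hat(w) @ w == 0, so w spans the kernel; the remaining two singular
# values equal |w|, giving rank 2 for w != 0 and rank 0 for w == 0.
```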

###### Lemma 2.

Let and be differentiable trajectories in . For each time , there exists an open, full-measure subset such that:

For any two static point-clouds and that satisfy

(33)

there exist constant scalings and a constant rotation such that

Furthermore, if , then for all .

###### Proof.

Write . Equality under the projection implies that there exists a scaling (possibly varying with and ) such that

(34)

For a given time , we wish to find a suitably large set such that and is independent of , when . Taking time derivatives,

or, dividing by ,

(35)

Differentiating both sides with respect to ,

(36)

Observe that and are scalars. By Lemma 1, the LHS has rank 2 or greater (as a linear map on ), unless . The right-hand side (RHS), however, has rank at most 1. Thus, (35) is invalid for almost all , unless (two maps of different ranks can only agree on a submanifold). Plugging into (36), we are left with

(37)

Now, the LHS has rank 2 or 0, while the RHS has rank 1 or 0. Again, (35) is invalid for almost all , unless . Let be the open, full-measure subset (being the complement of two submanifolds) on which the latter must hold. If, in addition, , then and , we can finally write

∎

###### Claim 3 (Indistinguishable Trajectories from Bearing Data Sequences).

Let and be differentiable trajectories in . There exists an open, full-measure subset such that

Given two static, generic (non-coplanar) point clouds and , satisfying

there exist constant scalings and a constant transformation such that

(38)

Furthermore, if has a non-constant translational component, then for all .

###### Proof.

Write and . Let , with defined as in Lemma 2. By Fubini’s theorem, this has full measure in . If , then the conditions for Lemma 2 are satisfied for almost all , and thus there exist constant (being stationary for almost all ) scalings and rotation such that .

Define , and observe that

If this affine relation holds for the generic set , then must be constant. Next,

Finally, if for some , then for all . ∎

In what follows, we will avoid the cumbersome discussion of sets such as , defined by a given trajectory, and will instead speak of sufficiently exciting trajectories, for which a given point cloud is suitable for tracking.

###### Definition 1 (Sufficiently Exciting Motion).

A trajectory is sufficiently exciting relative to a point-cloud if, for all and in ,

(39)

That is, if the projection map