# Online Learning of the Kalman Filter with Logarithmic Regret

## Abstract

In this paper, we consider the problem of predicting observations generated online by an unknown, partially observed linear system, which is driven by stochastic noise. For such systems the optimal predictor in the mean square sense is the celebrated Kalman filter, which can be explicitly computed when the system model is known. When the system model is unknown, we have to learn how to predict observations online based on finite data, suffering possibly a non-zero regret with respect to the Kalman filter’s prediction. We show that it is possible to achieve a regret of the order of $\mathrm{poly}\log N$ with high probability, where $N$ is the number of observations collected. Our work is the first to provide logarithmic regret guarantees for the widely used Kalman filter. This is achieved using an online least-squares algorithm, which exploits the approximately linear relation between future observations and past observations. The regret analysis is based on the stability properties of the Kalman filter, recent statistical tools for finite sample analysis of system identification, and classical results for the analysis of least-squares algorithms for time series. Our regret analysis can also be applied to state prediction of the hidden state, in the case of unknown noise statistics but known state-space basis. A fundamental technical contribution is that our bounds hold even for the class of non-explosive systems, which includes the class of marginally stable systems, which was an open problem for the case of online prediction under stochastic noise.

## 1 Introduction

The celebrated Kalman filter has been a fundamental approach for estimation and prediction of time-series data, with diverse applications ranging from control systems and robotics (Bertsekas, 2017; Durrant-Whyte and Bailey, 2006) to computer vision (Coskun et al., 2017) and economics (Harvey, 1990). Given a known system model with known noise statistics, the Kalman filter predicts future observations of a partially observable dynamical process by filtering past observations. When the underlying process is linear and the noise is Gaussian, the Kalman filter is optimal in the sense that it minimizes the mean square prediction error. Since Kalman’s seminal paper (Kalman, 1960), the stability and statistical properties of the Kalman filter have been well studied when the system model is known.

Learning to predict unknown partially observed systems is a significantly more challenging problem. Even in the case of linear systems, learning directly the model parameters of the system results in nonlinear, non-convex problems (Yu et al., 2018). Adaptive filtering algorithms address the problem of making observation predictions when the system model or the noise statistics are unknown or changing (Ljung, 1978; Moore and Ledwich, 1979; Lai and Ying, 1991; Ding et al., 2006). These adaptive filtering approaches are usually based on variations of extended least squares. Despite the importance of adaptive filtering in applications such as GPS, the regret of online filtering algorithms has not been considered in this classical literature.

In this paper, we consider the problem of predicting observations generated by an unknown, partially observable linear dynamical system in state-space form. We assume that the system dynamics and observation map are corrupted by Gaussian noise. Our goal is to find an online prediction algorithm that has provable regret bounds with respect to the Kalman filter that has access to the full system model. Our technical contributions are:

System theoretic regret: We define a notion of regret that has a natural, system theoretic interpretation. The prediction error of an online prediction algorithm is compared against the prediction error of the Kalman filter that has access to the exact system model; the model itself is allowed to be arbitrary. Previous regret definitions (Kozdoba et al., 2019) required the model to lie in a finite set.

Logarithmic regret for the Kalman filter: We present the first online prediction algorithm with provable logarithmic regret upper bounds for the classical Kalman filter. In fact, we prove that with high probability the regret of our algorithm grows poly-logarithmically with the number $N$ of observations collected. Our algorithm has polynomial time complexity, requires linear memory, and is based on subspace system identification techniques (Qin, 2006). Instead of optimizing over the state-space parameters, which is a non-convex problem, we convexify the problem by establishing an approximate linear regression between the next observation and past observations. Our analysis is based on the stability properties of the Kalman filter, tools for self-normalized martingales and matrices, and additional results for persistency of excitation developed in this paper.

Logarithmic regret for non-explosive systems: Our regret guarantees hold for the class of non-explosive systems, which includes marginally stable linear systems as well as systems where the state can grow at a polynomial rate. This settles an open question and shows that online prediction performance does not depend on the system stability gap $1-\rho(A)$.

Regret analysis for other predictors: Our approach directly carries over to various interesting online predictors. For example, our analysis can be directly extended to multi-step ahead prediction of observations. Another extension focuses on the regret of hidden state predictors when the state-space basis representation is known a priori. The latter situation arises, for example, when the state-space model is known but the noise statistics are unknown. All these predictors enjoy similar logarithmic regret bounds.

Gap between model-free LQR and Kalman filter: One of the implications of our bounds is that learning to predict observations like the Kalman filter is provably easier than solving the online Linear Quadratic Regulator (LQR) problem, which in general requires $\mathcal{O}(\sqrt{N})$ regret. In fact, recent results suggest that in the LQR case, the regret is lower bounded by $\Omega(\sqrt{N})$ (Simchowitz and Foster, 2020). This might not be surprising since, in the absence of exogenous inputs, we cannot inject exploratory signals into the system.

### 1.1 Related work

Recently, there have been very important results addressing the regret of the adaptive Linear Quadratic Regulator (LQR) problem (Abbasi-Yadkori and Szepesvári, 2011; Faradonbeh et al., 2017; Ouyang et al., 2017; Abeille and Lazaric, 2018; Dean et al., 2018; Mania et al., 2019; Cohen et al., 2019). The best regret for LQR is sublinear and of the order of $\sqrt{N}$ (up to logarithmic factors), where $N$ is the number of state samples collected; an in-depth survey can be found in Matni et al. (2019). When the system model is known, the Kalman filter is the dual of the Linear Quadratic Regulator, suggesting that this duality could be exploited in deriving the regret of the Kalman filter. However, when the system model is unknown, the Linear Quadratic Regulator and the Kalman filter are not dual problems (Tsiamis et al., 2019). As the state is fully observed in LQR, system identification in adaptive LQR reduces to a simple least squares problem. In the adaptive Kalman filter, the state is partially observed, resulting in non-convex system identification problems that require a different approach.

A related but different problem focuses on online prediction algorithms for systems without internal states, such as ARMA (autoregressive moving average) models (Anava et al., 2013). Prediction of observations generated by state-space models in the case of exogenous inputs and adversarial noise with a bounded budget was studied in Hazan et al. (2018). The work closest to ours is the very recent work of Kozdoba et al. (2019), where regret bounds with respect to the Kalman filter were studied for the first time, but in the restricted context of scalar and bounded observations. There, the regret is shown to be linear, with a linear term that is small but nonzero.

Our online algorithm is inspired by subspace identification techniques (Bauer et al., 1999). The technical approach is based on classical results for the analysis of the least-squares estimator for time series (Lai and Wei, 1982), high-dimensional statistics (Vershynin, 2018) as well as modern results for finite sample analysis of system identification in both the fully observed (Faradonbeh et al., 2018; Simchowitz et al., 2018; Sarkar and Rakhlin, 2018) and the partially observed case (Hardt et al., 2018; Oymak and Ozay, 2018; Simchowitz et al., 2019; Sarkar et al., 2019; Tsiamis and Pappas, 2019).

Paper organization. In Section 2 we provide some background on the classical Kalman Filter and formulate the regret problem considered in this paper. In Section 3 we introduce the online learning algorithm while the regret guarantees are presented in Section 4. We conclude with generalizations and discussion of future work in Sections 5, 6. Detailed proofs can be found in the Appendix.

Notation. With $\|\cdot\|$ we denote the Euclidean norm for vectors and the spectral norm for matrices. The spectral radius of a matrix $A$ is denoted by $\rho(A)$.
The smallest singular value of a matrix $A$ is denoted by $\sigma_{\min}(A)$. By $A^{\top}$ we denote the transpose of $A$. Unless explicitly stated, when using the standard notation $\mathcal{O}(\cdot)$ we hide all other quantities, e.g. system constants, system dimensions, logarithms of failure probabilities. The notation $\tilde{\mathcal{O}}(\cdot)$ hides (powers of) logarithmic terms of . The notation $\mathrm{poly}(x)$ means a polynomial function of $x$.

## 2 Problem Formulation

The Kalman filter considers the problem of predicting observations generated by the following state-space system:

$$x_{k+1} = Ax_k + w_k,\qquad y_k = Cx_k + v_k, \tag{1}$$

where $x_k\in\mathbb{R}^n$ is the state, $y_k\in\mathbb{R}^m$ are the observations (outputs), $A\in\mathbb{R}^{n\times n}$ is the system matrix and $C\in\mathbb{R}^{m\times n}$ is the observation matrix. The time series $w_k\in\mathbb{R}^n$, $v_k\in\mathbb{R}^m$ represent the process and measurement noise respectively and are modeled as zero mean i.i.d. Gaussian variables, independent of each other, with covariances $Q$ and $R$ respectively. The initial state $x_0$ is zero mean Gaussian with covariance $\Sigma_0$ and independent of the noises. The following assumption holds throughout this paper.

###### Assumption 1.

System (1) is non-explosive, i.e. $\rho(A)\le 1$.

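Let $\mathcal{F}_k$ be the filtration generated by the observations $y_0,\dots,y_k$. Given the observations up to time $k$, the optimal prediction at time $k+1$ in the minimum mean square error (mmse) sense is defined as:

$$\hat{y}_{k+1} \triangleq \arg\min_{z\ \mathcal{F}_k\text{-measurable}}\mathbb{E}\|y_{k+1}-z\|^2 = \mathbb{E}[y_{k+1}\mid\mathcal{F}_k]. \tag{2}$$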

In the case of system (1), the optimal predictor admits a recursive expression, known as the Kalman filter:

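$$\hat{x}_{k+1} = A\hat{x}_k + Ke_k,\qquad y_k = C\hat{x}_k + e_k,\qquad \hat{y}_k = C\hat{x}_k,\qquad \hat{x}_0 = 0, \tag{3}$$

where $e_k \triangleq y_k - C\hat{x}_k$ is the innovation noise process. Matrix $K$ is called the Kalman filter gain and can be computed from $A, C, Q, R$; see (7) in Subsection 2.1.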

Although the Kalman filter gives the optimal mmse prediction, it requires the system matrices and noise covariances to be known. In this paper, we seek online learning algorithms that can predict observations based only on past observation data, without any knowledge of system matrices or noise covariances. To quantify the online prediction performance, we define the regret of our online learning algorithm with respect to the Kalman filter (3) that has full knowledge of system model (1). Our goal is to achieve sublinear regret, as defined in the following problem statement.

###### Problem 1.

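Assume that $A, C, Q, R, \Sigma_0$ in system model (1) are unknown. Consider a sequence of observations $y_0, y_1, \dots, y_N$ generated by system (1). Let $\tilde{y}_k$ be the prediction of an online learning algorithm based on the history $y_0,\dots,y_{k-1}$ and $\hat{y}_k$ be the Kalman filter prediction (3) that has full knowledge of model (1). Define the regret:

$$\mathcal{R}_N \triangleq \sum_{k=1}^{N}\|y_k-\tilde{y}_k\|^2 - \sum_{k=1}^{N}\|y_k-\hat{y}_k\|^2. \tag{4}$$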

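Fix a failure probability $\delta\in(0,1)$. Our goal is to find a learning algorithm such that, with probability at least $1-\delta$:

$$\mathcal{R}_N \le c\,N^{\alpha},\quad\text{for some } \alpha<1,$$

where the constant $c$ does not depend on $N$.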

Our regret definition has a natural system theoretic interpretation since it is defined with respect to the Kalman filter. In Section 5, we discuss an alternative regret definition.

In the following subsection we provide some background on the Kalman filter and specify some standard assumptions, which guarantee that the Kalman filter is well-defined.

### 2.1 Kalman Filter Background

The Kalman filter enjoys two critical properties, namely closed-loop stability and innovation orthogonality, that are now reviewed. The following standard assumption holds throughout the paper and guarantees that the Kalman filter is well-defined.

###### Assumption 2.

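The system matrix pair $(A, C)$ is observable, i.e. the observability matrix:

$$\mathcal{O}_k \triangleq \begin{bmatrix} C\\ CA\\ \vdots\\ CA^{k-1}\end{bmatrix} \tag{5}$$

has rank $n$ for all $k\ge n$. The pair $(A, Q^{1/2})$ is controllable, i.e. the controllability matrix

$$\mathcal{C}_k \triangleq \begin{bmatrix} Q^{1/2} & AQ^{1/2} & \cdots & A^{k-1}Q^{1/2}\end{bmatrix} \tag{6}$$

has rank $n$ for all $k\ge n$, and $R$ is strictly positive definite.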

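The following result shows that under Assumption 2, the closed-loop matrix $A-KC$ of the Kalman filter is stable.

###### Proposition 1 (Anderson and Moore 2005).

Let Assumption 2 hold. Then the algebraic Riccati equation

$$P = APA^{\top} + Q - APC^{\top}\left(CPC^{\top}+R\right)^{-1}CPA^{\top} \tag{7}$$

has a unique positive semidefinite solution $P$. The steady-state Kalman gain is $K = APC^{\top}\left(CPC^{\top}+R\right)^{-1}$, and the closed-loop matrix $A-KC$ is asymptotically stable, i.e. $\rho(A-KC)<1$. Moreover, for any initial covariance $\Sigma_0\succeq 0$, the error covariance of the time-varying Kalman filter converges to $P$ exponentially fast.

Proposition 1 implies that the Kalman filter reaches steady state exponentially fast, allowing us to assume the following.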

###### Assumption 3.

We assume that the initial state covariance is $\Sigma_0 = P$, where $P$ is defined in (7).

If $\Sigma_0\neq P$, then we have to consider time-varying gains $K_k$ in (3). The condition $\Sigma_0 = P$ guarantees that the Kalman filter (3) has reached its steady state, so that the gain $K$ is constant. Since the Kalman filter converges exponentially fast to its steady state (Anderson and Moore, 2005), this is a very mild assumption; it is also standard (Knudsen, 2001).

The next assumption makes sure that system (3) is minimal.

###### Assumption 4.

The pair $(A, K)$ is controllable.

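If the pair $(A,K)$ is not controllable, then we can find a similarity transformation $S$ such that:

$$SAS^{-1} = \begin{bmatrix} A_{11} & A_{12}\\ 0 & A_{22}\end{bmatrix},\qquad SK = \begin{bmatrix} K_1\\ 0\end{bmatrix},\qquad CS^{-1} = \begin{bmatrix} C_1 & C_2\end{bmatrix}.$$

Writing $S\hat{x}_k = \begin{bmatrix}\hat{x}^{(1)}_k\\ \hat{x}^{(2)}_k\end{bmatrix}$, the second block evolves as $\hat{x}^{(2)}_{k+1} = A_{22}\hat{x}^{(2)}_k$. But since $\hat{x}_0 = 0$, this implies that $\hat{x}^{(2)}_k = 0$ for all $k\ge 0$. Hence we could remove $\hat{x}^{(2)}_k$ and consider a reduced system representation $(A_{11}, C_1, K_1)$ with state $\hat{x}^{(1)}_k$ only.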

The following assumption is for notational simplicity. It assumes that the largest eigenvalue (in magnitude) of $A-KC$ is simple.

###### Assumption 5.

For some $M>0$ and all $k\ge 0$, the closed-loop matrix satisfies $\|(A-KC)^k\|\le M\rho(A-KC)^k$.

If the largest eigenvalue has larger multiplicity, then we can just consider $\rho(A-KC)+\epsilon$ in the above bound, for sufficiently small $\epsilon>0$.

In addition to the stability properties above, the other key property of the Kalman filter is that the innovation sequence $e_k$ is orthogonal (uncorrelated) and, by Gaussianity, also i.i.d. By the law of large numbers, this implies that the cumulative error $\sum_{k=1}^{N}\|y_k-\hat{y}_k\|^2 = \sum_{k=1}^{N}\|e_k\|^2$ will be of the order of $N$ almost surely. Predicting the true observations exactly is impossible in the stochastic noise setting, even if we know the system model; this is why performance is measured relative to the Kalman filter.
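To make the two properties above concrete, here is a minimal numerical sketch, assuming NumPy and a hypothetical non-explosive example ($A$ with eigenvalues $1$ and $0.9$, $C=[1\ \ 0]$, $Q=I$, $R=1$). It approximates the solution $P$ of (7) by iterating the Riccati recursion (a dedicated solver such as `scipy.linalg.solve_discrete_are`, applied to the dual pair $A^{\top}, C^{\top}$, could be used instead), checks that $\rho(A-KC)<1$, and simulates system (1) from $x_0\sim\mathcal{N}(0,P)$ as in Assumption 3. The helper names are illustrative, not from the paper.

```python
import numpy as np


def steady_state_kalman(A, C, Q, R, iters=2000):
    """Approximate the Riccati solution P of (7) by fixed-point iteration.

    Returns P, the steady-state gain K = A P C^T (C P C^T + R)^{-1},
    and the innovation covariance S = C P C^T + R.
    """
    P = Q.copy()
    for _ in range(iters):
        S = C @ P @ C.T + R                    # innovation covariance for the current P
        K = A @ P @ C.T @ np.linalg.inv(S)     # gain for the current P
        P = A @ P @ A.T + Q - K @ S @ K.T      # Riccati update
    S = C @ P @ C.T + R
    return P, A @ P @ C.T @ np.linalg.inv(S), S


def kalman_innovations(A, C, Q, R, N, seed=0):
    """Simulate system (1) with x_0 ~ N(0, P) and run the steady-state filter (3)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape[0], C.shape[0]
    P, K, S = steady_state_kalman(A, C, Q, R)
    Lq, Lr = np.linalg.cholesky(Q), np.linalg.cholesky(R)
    x = np.linalg.cholesky(P) @ rng.standard_normal(n)   # Assumption 3: Sigma_0 = P
    xhat = np.zeros(n)
    e = np.empty((N, m))
    for k in range(N):
        y = C @ x + Lr @ rng.standard_normal(m)          # y_k = C x_k + v_k
        e[k] = y - C @ xhat                              # innovation e_k = y_k - C xhat_k
        xhat = A @ xhat + K @ e[k]                       # Kalman filter (3)
        x = A @ x + Lq @ rng.standard_normal(n)          # x_{k+1} = A x_k + w_k
    return e, K, S


# Hypothetical non-explosive example: eigenvalues 1 and 0.9.
A = np.array([[1.0, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)
e, K, S = kalman_innovations(A, C, Q, R, N=20000)
rho_closed_loop = max(abs(np.linalg.eigvals(A - K @ C)))
mean_sq_innovation = np.mean(np.sum(e**2, axis=1))   # should be close to trace(S)
lag1_corr = np.corrcoef(e[:-1, 0], e[1:, 0])[0, 1]   # should be close to 0
```

Even though the state itself follows a random walk here, the innovations stay stationary and nearly uncorrelated, so the Kalman filter's cumulative error grows like $N\operatorname{tr}(CPC^{\top}+R)$; this linear baseline is what the regret subtracts.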

Note that both systems (1) and (3) can generate the same observations $y_k$, i.e. the noise parameterization is not unique (Van Overschee and De Moor, 2012). Another source of ill-posedness is that the state-space parameterization is non-unique: any similarity transformation $(A, C, K)\mapsto(S^{-1}AS,\, CS,\, S^{-1}K)$, with $S$ invertible, generates the same observations. In the following section, we address these problems by considering an alternative system representation.

## 3 Online Prediction Algorithm

The main idea of our online prediction algorithm is based on a system representation that has been used in subspace system identification (Bauer et al., 1999). Let $p$ be an integer that represents how far we look into the past. We define the vector of past observations at time $k$:

$$Z_{k,p} \triangleq \begin{bmatrix} y_{k-p}^{\top} & \cdots & y_{k-1}^{\top}\end{bmatrix}^{\top}\in\mathbb{R}^{mp}. \tag{8}$$

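Define also the matrix of closed-loop responses

$$G_p \triangleq \begin{bmatrix} C(A-KC)^{p-1}K & \cdots & C(A-KC)K & CK\end{bmatrix}. \tag{9}$$

Substituting $e_k = y_k - C\hat{x}_k$ into (3) gives $\hat{x}_{k+1} = (A-KC)\hat{x}_k + Ky_k$. By expanding this recursion $p$ steps into the past and using $y_k = C\hat{x}_k + e_k$, the observation at time $k$ can be rewritten as

$$y_k = G_pZ_{k,p} + C(A-KC)^{p}\hat{x}_{k-p} + e_k. \tag{10}$$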
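Since (10) is an algebraic identity, it can be checked numerically. The sketch below assumes NumPy, generates data directly from the innovation form (3) for a hypothetical double integrator ($A$ a $2\times 2$ Jordan block with eigenvalue $1$) with an illustrative gain $K$ for which $\rho(A-KC)\approx 0.63$, and compares both sides of (10) for several past horizons. The function names are ours, not the authors'.

```python
import numpy as np


def closed_loop_responses(A, C, K, p):
    """G_p = [C(A-KC)^{p-1}K, ..., C(A-KC)K, CK], ordered to match Z_{k,p} in (8)."""
    F = A - K @ C
    return np.hstack([C @ np.linalg.matrix_power(F, j) @ K for j in range(p - 1, -1, -1)])


def past_outputs(y, k, p):
    """Z_{k,p} = [y_{k-p}; ...; y_{k-1}], oldest observation first."""
    return y[k - p:k].reshape(-1)


# Hypothetical example in innovation form (3): double integrator, illustrative gain K.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
K = np.array([[0.9], [0.3]])
rng = np.random.default_rng(0)
N = 60
e = rng.standard_normal((N, 1))
xhat = np.zeros((N + 1, 2))
y = np.zeros((N, 1))
for k in range(N):
    y[k] = C @ xhat[k] + e[k]              # y_k = C xhat_k + e_k
    xhat[k + 1] = A @ xhat[k] + K @ e[k]   # xhat_{k+1} = A xhat_k + K e_k


def expansion_residual(k, p):
    """Largest mismatch between the two sides of (10) at time k with past horizon p."""
    G = closed_loop_responses(A, C, K, p)
    bias = C @ np.linalg.matrix_power(A - K @ C, p) @ xhat[k - p]   # truncation bias
    rhs = G @ past_outputs(y, k, p) + bias + e[k]
    return np.max(np.abs(y[k] - rhs))
```

The residual is at round-off level for every $p$; what changes with $p$ is only how the output is split between the regression part $G_pZ_{k,p}$ and the bias, which shrinks like $\rho(A-KC)^p$ times $\|\hat{x}_{k-p}\|$.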

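Instead of optimizing over the system parameters $A, C, K$, which results in a non-convex optimization problem, we optimize over the (higher-dimensional) matrix $G_p$, which makes the problem convex. From an online learning perspective, this technique is also known as improper learning. Using this lifting, we can learn a least squares estimate $\hat{G}_k$ by regressing the outputs $y_t$ on the past outputs $Z_{t,p}$ for $t\le k$:

$$\hat{G}_k = \arg\min_{G}\sum_{t=p}^{k}\|y_t-GZ_{t,p}\|^2 + \lambda\|G\|_F^2 = \Big(\sum_{t=p}^{k}y_tZ_{t,p}^{\top}\Big)\Big(\lambda I + \sum_{t=p}^{k}Z_{t,p}Z_{t,p}^{\top}\Big)^{-1}, \tag{11}$$

where $\lambda>0$ is a regularization parameter. Then, to predict the next observation, we can compute:

$$\tilde{y}_{k+1} = \hat{G}_kZ_{k+1,p}. \tag{12}$$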

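As long as the past horizon $p$ is kept constant, we could also use the recursive update

$$\hat{G}_k = \hat{G}_{k-1} + \big(y_k-\hat{G}_{k-1}Z_{k,p}\big)Z_{k,p}^{\top}V_k^{-1},\qquad V_k^{-1} = V_{k-1}^{-1} - \frac{V_{k-1}^{-1}Z_{k,p}Z_{k,p}^{\top}V_{k-1}^{-1}}{1+Z_{k,p}^{\top}V_{k-1}^{-1}Z_{k,p}},$$

where $V_k \triangleq \lambda I + \sum_{t=p}^{k}Z_{t,p}Z_{t,p}^{\top}$ is the regularized Gram matrix.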
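A minimal sketch of the recursive update, assuming NumPy: the class below (a hypothetical helper, not the authors' code) maintains $\hat{G}_k$ and $V_k^{-1}$ through the Sherman-Morrison formula, and the check compares it with the batch regularized least-squares solution (11) on random data.

```python
import numpy as np


class RecursiveRidge:
    """Recursive form of the regularized least squares (11) for a fixed past horizon."""

    def __init__(self, dim_z, dim_y, lam=1.0):
        self.G = np.zeros((dim_y, dim_z))      # current estimate \hat G_k
        self.V_inv = np.eye(dim_z) / lam       # V^{-1}, starting from (lam I)^{-1}

    def predict(self, z):
        return self.G @ z                      # one-step prediction as in (12)

    def update(self, z, y):
        Vz = self.V_inv @ z
        # Sherman-Morrison: (V + z z^T)^{-1} = V^{-1} - V^{-1} z z^T V^{-1} / (1 + z^T V^{-1} z)
        self.V_inv -= np.outer(Vz, Vz) / (1.0 + z @ Vz)
        # \hat G_k = \hat G_{k-1} + (y_k - \hat G_{k-1} z_k) z_k^T V_k^{-1}
        self.G += np.outer(y - self.G @ z, self.V_inv @ z)


def batch_ridge(Z, Y, lam=1.0):
    """Closed form of (11); rows of Z and Y are the samples z_t and y_t."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ Y).T


rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 6))
Y = Z @ rng.standard_normal((6, 2)) + 0.1 * rng.standard_normal((200, 2))
rls = RecursiveRidge(dim_z=6, dim_y=2, lam=1.0)
for z, y in zip(Z, Y):
    rls.update(z, y)
```

Each update costs $\mathcal{O}((mp)^2)$ operations and stores only $\hat{G}$ and $V^{-1}$, which is why the within-epoch predictor needs memory that depends on $p$ but not on the number of samples.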

Due to the stability properties of the Kalman filter (Section 2.1), if we consider $p$ past observations, then the bias term $C(A-KC)^p\hat{x}_{k-p}$ in equation (10) decays as $\rho(A-KC)^p$, up to the magnitude of $\hat{x}_{k-p}$. Notice that for non-explosive systems the state can grow polynomially fast in the worst case. Even if $\hat{x}_k$ remains bounded, keeping the past horizon constant would lead to a non-vanishing bias error (linear regret). Thus, to make sure that the prediction error decreases, we need to gradually increase the past horizon $p$. For this reason, inspired by the “doubling trick” (Cesa-Bianchi and Lugosi, 2006), we divide learning into epochs, where every epoch is twice as long as the previous one. During every epoch we keep the past horizon constant. Since $\rho(A-KC)^p$ decreases exponentially in $p$, it is sufficient to increase the past slowly, as $p = \beta\log T$, where $T$ is the epoch duration.

The pseudo-code of our online prediction approach can be found in Algorithm 1. Each epoch $i$ lasts from time $T_i$ to $2T_i-1$, where $T_i = 2^iT_{\mathrm{init}}$ and $T_{\mathrm{init}}$ is a design parameter (the length of the first epoch). During every epoch, we keep the past horizon constant, $p_i = \beta\log T_i$, where $\beta$ is a design parameter. Initially, from time $0$ to $T_{\mathrm{init}}-1$, we have a warm-up phase where we gather enough observations to start predicting. To make sure that , we tune accordingly. Within an epoch, the least squares based predictor (12) can be implemented recursively, which requires polynomial complexity and memory that is at most polynomial in $p_i$. At the beginning of an epoch, when $p_i$ is updated, we re-initialize the recursive predictor based on the whole past, which requires polynomial complexity and memory proportional to the number of samples collected so far. Hence, in total, after $N$ collected samples, the computational complexity is polynomial and the memory requirement is $\mathcal{O}(N)$. In Section 5, we discuss ways to modify the initialization when changing epochs without using the whole past, which can reduce the memory to logarithmic in $N$.

An important property of Algorithm 1 is that no knowledge about the dynamics or even the state dimension $n$ is required. Note that there is a tradeoff between the bias error and statistical efficiency. Increasing the past horizon by selecting a larger $\beta$ leads to smaller bias error, but increases the sample complexity of learning since we have more unknowns; it also makes persistency of excitation harder to achieve, i.e. it is harder to guarantee a large enough smallest singular value of the Gram matrix $\sum_tZ_{t,p}Z_{t,p}^{\top}$.
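The epoch structure described above can be sketched end to end. This is an illustrative reconstruction under stated assumptions, not Algorithm 1 itself: it assumes NumPy, uses $p_i=\lceil\beta\log T_i\rceil$ with hypothetical defaults $T_{\mathrm{init}}=64$, $\beta=2$, $\lambda=1$, and re-solves the normal equations at every step instead of using the recursive update (simpler to read, same predictions up to round-off). Data come from the innovation form (3) for a hypothetical system with one mode on the unit circle and $\rho(A-KC)=0.3$, so that the Kalman filter predictions $C\hat{x}_k$ are available exactly and the regret (4) can be computed over the prediction phase.

```python
import math

import numpy as np


def epoch_schedule(T_init, beta, N):
    """Yield (start, end, p) for epochs [T_i, 2 T_i) with T_i = 2^i T_init, p_i = ceil(beta log T_i)."""
    T = T_init
    while T < N:
        yield T, min(2 * T, N), max(1, math.ceil(beta * math.log(T)))
        T *= 2


def online_predictions(y, T_init=64, beta=2.0, lam=1.0):
    """Epoch-based online least-squares predictor (a sketch); y has shape (N, m)."""
    N, m = y.shape
    preds = np.full((N, m), np.nan)                 # NaN marks the warm-up phase
    for start, end, p in epoch_schedule(T_init, beta, N):
        if start <= p:
            raise ValueError("first epoch is too short for its past horizon")
        # New epoch: re-initialize on the whole past, regressing y_t on Z_{t,p} for p <= t < start.
        Z = np.stack([y[t - p:t].reshape(-1) for t in range(p, start)])
        V = Z.T @ Z + lam * np.eye(m * p)           # regularized Gram matrix
        S = y[p:start].T @ Z                        # sum_t y_t Z_{t,p}^T
        for k in range(start, end):
            z = y[k - p:k].reshape(-1)              # Z_{k,p}
            preds[k] = S @ np.linalg.solve(V, z)    # tilde y_k = hat G_{k-1} Z_{k,p}
            V += np.outer(z, z)                     # then absorb the new sample (Z_{k,p}, y_k)
            S += np.outer(y[k], z)
    return preds


# Hypothetical system in innovation form (3): one mode on the unit circle, rho(A - KC) = 0.3.
A = np.array([[1.0, 0.0], [0.0, 0.5]])
C = np.array([[1.0, 1.0]])
K = np.array([[0.98], [-0.08]])
rng = np.random.default_rng(1)
N = 4096
y = np.zeros((N, 1))
y_kf = np.zeros((N, 1))                             # Kalman filter predictions C xhat_k
xhat = np.zeros(2)
for k in range(N):
    e_k = rng.standard_normal(1)
    y_kf[k] = C @ xhat
    y[k] = y_kf[k] + e_k
    xhat = A @ xhat + K @ e_k

preds = online_predictions(y)
online = ~np.isnan(preds[:, 0])
regret = np.sum((y - preds)[online] ** 2) - np.sum((y - y_kf)[online] ** 2)
```

On an example like this the empirical regret should be a small fraction of the number of predictions, even though the outputs drift without bound; the exact value depends on the random seed and the hypothetical design parameters.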

## 4 Regret Analysis

In this section, we prove that with high probability the prediction regret is not only sublinear but also poly-logarithmic in $N$, the number of observations collected so far. The challenge in the non-explosive regime is that the observations grow unbounded, polynomially fast. Meanwhile, recent work in finite sample analysis of system identification (Oymak and Ozay, 2018; Simchowitz et al., 2019; Tsiamis and Pappas, 2019; Sarkar et al., 2019) shows that the model parameters can be learned at a slower rate, $\tilde{\mathcal{O}}(1/\sqrt{N})$. Therefore, these system identification results cannot be directly applied to obtain regret bounds for our problem. Nonetheless, we show that our online Algorithm 1 mitigates the effect of unbounded observations. As a result, the logarithmic regret bound remains valid even as we approach instability.

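We provide two results, one for non-explosive systems ($\rho(A)\le 1$) and one for stable systems ($\rho(A)<1$). Before we present the regret results, let us introduce some standard notions. Let $\phi(s) = s^{d_\phi} + a_{d_\phi-1}s^{d_\phi-1} + \cdots + a_0$ be the minimal polynomial of matrix $A$, i.e. the monic polynomial of minimum degree such that $\phi(A)=0$, and denote its degree by $d_\phi$. We define the $\ell_1$ norm of its coefficients as $\|\phi\|_1\triangleq\sum_{i=0}^{d_\phi}|a_i|$, with $a_{d_\phi}=1$; the $\ell_2$ norm $\|\phi\|_2$ is defined in a similar way. Let $\kappa$ be the dimension of the largest Jordan block of $A$ that is associated with an eigenvalue on the unit circle (i.e. $|\lambda|=1$). Let $\kappa_{\max}$ be the largest Jordan block among all eigenvalues. In general, $\kappa\le\kappa_{\max}\le d_\phi\le n$.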

###### Theorem 1 (Regret for non-explosive systems).

Consider system (3) with $\rho(A)\le 1$. Let $y_0,\dots,y_N$ be a sequence of system observations, with $\hat{y}_k$ being the respective Kalman filter predictions. Let $\tilde{y}_k$ be the predictions of Algorithm 1 with

(13) |

and fix a failure probability $\delta\in(0,1)$. There exists a burn-in time $N_0$, independent of $N$, such that with probability at least $1-\delta$, if $N\ge N_0$, then:

(14) |

where hides and terms.

Theorem 1 provides the first logarithmic regret upper bounds for the general problem of Kalman filter prediction. The burn-in time $N_0$ is related to persistency of excitation conditions, i.e. initially we need enough samples to guarantee that the smallest singular value of the Gram matrix increases linearly with time. Our bounds do not depend on the stability gap $1-\rho(A)$, and they do not degrade as we approach instability. However, they suggest, via $\beta$, that the stability radius $\rho(A-KC)$ of the Kalman filter closed-loop matrix affects the difficulty of learning.

Interestingly, our bounds show that learning to predict observations like the Kalman filter is provably easier than online LQR in the case of an unknown model. The latter requires in general regret of the order of $\sqrt{N}$ (Simchowitz and Foster, 2020). This is another reason why the problems are not dual in the unknown model case. This gap might be expected since, in the case of the Kalman filter without exogenous inputs, there is no exploratory signal.

The upper bound also depends on the quantities and , both of which can be exponential in the dimension $n$ of the system state in the worst case. This can happen, for example, if $A$ is a single $n\times n$ Jordan block with eigenvalue $1$, i.e. the system is an $n$th order integrator. Dependence of learning performance on the coefficients of the characteristic or minimal polynomial has been found in related work (Hardt et al., 2018). This dependence can be improved in some cases; see for example the phase polynomial in Hazan et al. (2018), where there are no repeated eigenvalues. In our case, this dependence could perhaps be improved by applying the techniques of Simchowitz et al. (2019). However, it is an open question whether it is possible to avoid the exponential dependence on , . It might be possible that systems with long chain structure, e.g. integrators, are indeed harder to learn. In system theory it is known that even in the known model case, such systems might be difficult to observe. In open-loop system identification (Simchowitz et al., 2019), such a dependence also appears. It might be an inherent limitation of the problem, since fundamental quantities of the system, for example matrix or the observability matrix scale with .

Both of the above issues are avoided in the case of stable systems ($\rho(A)<1$), where we have the following result.

###### Theorem 2 (Regret for stable systems).

Notice that for stable systems we no longer have quantities that depend exponentially on the dimension $n$. The main bound (15) does not depend on the stability gap $1-\rho(A)$. However, via , the guarantees depend logarithmically on the inverse radius . This quantity is related to the time needed for a stable system to approach stationarity.

The proofs of Theorem 1 and Theorem 2 can be found in the Appendix. In the next subsection, we provide an overview of the regret analysis and explain why the quantities and appear in the bound in Theorem 1. We also explain what changes in the case of stable systems addressed by Theorem 2.

### 4.1 Regret analysis overview

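Recall the definition of the innovation error $e_k = y_k-\hat{y}_k$. For brevity, we also define the error $\varepsilon_k\triangleq\tilde{y}_k-\hat{y}_k$ between the online prediction of Algorithm 1 and the Kalman filter prediction. By adding and subtracting $\hat{y}_k$ in the first term of (4), we obtain

$$\mathcal{R}_N = \sum_{k=1}^{N}\|\varepsilon_k\|^2 - 2\sum_{k=1}^{N}e_k^{\top}\varepsilon_k.$$

It is sufficient to prove that the square loss $\mathcal{L}_N$:

$$\mathcal{L}_N\triangleq\sum_{k=1}^{N}\|\varepsilon_k\|^2 \tag{16}$$

is logarithmic in $N$. Because the innovations are i.i.d., we have a martingale structure for the second term, since $\varepsilon_k$ is $\mathcal{F}_{k-1}$-measurable while $\mathbb{E}[e_k\mid\mathcal{F}_{k-1}]=0$. The martingale term will in general be small and can be bounded in terms of the square loss $\mathcal{L}_N$. In particular, the quantity

$$\frac{\sum_{k=1}^{N}e_k^{\top}\varepsilon_k}{\sqrt{1+\mathcal{L}_N}}$$

is a self-normalized martingale and can be analyzed based on the techniques of Abbasi-Yadkori et al. (2011); Sarkar and Rakhlin (2018), which imply that

$$\Big|\sum_{k=1}^{N}e_k^{\top}\varepsilon_k\Big| = \mathcal{O}\Big(\sqrt{(1+\mathcal{L}_N)\log(1+\mathcal{L}_N)}\Big)$$

with high probability. Hence, we will focus on bounding the square loss $\mathcal{L}_N$.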
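The decomposition of the regret into square loss and cross term is purely algebraic, since $y_k-\tilde{y}_k = e_k-\varepsilon_k$ with $e_k=y_k-\hat{y}_k$ and $\varepsilon_k=\tilde{y}_k-\hat{y}_k$. A quick NumPy check with arbitrary, hypothetical prediction sequences:

```python
import numpy as np


def regret_decomposition(y, y_online, y_kf):
    """Return (regret (4), square loss, cross term) for arrays of shape (N, m)."""
    e = y - y_kf                                  # innovations e_k = y_k - hat y_k
    eps = y_online - y_kf                         # eps_k = tilde y_k - hat y_k
    regret = np.sum((y - y_online) ** 2) - np.sum(e ** 2)
    square_loss = np.sum(eps ** 2)                # L_N
    cross = np.sum(e * eps)                       # sum_k e_k^T eps_k
    return regret, square_loss, cross


rng = np.random.default_rng(0)
y, y_online, y_kf = (rng.standard_normal((100, 3)) for _ in range(3))
regret, square_loss, cross = regret_decomposition(y, y_online, y_kf)
```

With real Kalman predictions the cross term is a martingale with zero conditional mean, which is why controlling the square loss is the crux of the analysis.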

For the remainder of this section, we assume that we are within one epoch, so that the past horizon $p$ is kept constant. For brevity, we omit the subscript $p$ from all variables and write $Z_k$ instead of $Z_{k,p}$.

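Define $S_k\triangleq\sum_{t=p}^{k}e_tZ_t^{\top}$ and $V_k\triangleq\lambda I+\sum_{t=p}^{k}Z_tZ_t^{\top}$, and let $b_t\triangleq C(A-KC)^p\hat{x}_{t-p}$ denote the bias term in (10). Then, the error between our online prediction and the Kalman filter prediction can be written as:

$$\varepsilon_k = \underbrace{S_{k-1}V_{k-1}^{-1}Z_k}_{\text{regression}}\ \underbrace{-\,\lambda GV_{k-1}^{-1}Z_k}_{\text{regularization}}\ +\ \underbrace{\Big(\sum_{t=p}^{k-1}b_tZ_t^{\top}\Big)V_{k-1}^{-1}Z_k - b_k}_{\text{truncation bias}}. \tag{17}$$

The regression term is due to the noise entering the system. The truncation bias is due to using only $p$ past observations and not all of them.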

The key ingredients to analyze the cumulative error are i) the stability properties of the closed-loop matrix $A-KC$; ii) self-normalization properties of predictor (12); and iii) persistency of excitation for the past observations $Z_k$ with high probability. By persistency of excitation we mean that the least singular value of the Gram matrix $V_k$ increases linearly with $k$ with high probability.

Regression term. We can rewrite the regression term as a product of two separate terms:

$$\|S_{k-1}V_{k-1}^{-1}Z_k\|\le\|S_{k-1}V_{k-1}^{-1/2}\|\,\|V_{k-1}^{-1/2}Z_k\|.$$

The first term, $\|S_{k-1}V_{k-1}^{-1/2}\|$, is again a self-normalized martingale and can be analyzed based on the techniques of Abbasi-Yadkori et al. (2011); Sarkar and Rakhlin (2018), which imply that the term grows logarithmically with $k$. The martingale property again comes from the fact that the innovation process of the Kalman filter is i.i.d.; see Section 2.1.

The second term, $\|V_{k-1}^{-1/2}Z_k\|^2 = Z_k^{\top}V_{k-1}^{-1}Z_k$, is almost self-normalized since $V_{k-1}$ is the Gram matrix of $Z_p,\dots,Z_{k-1}$. It can be bounded using the following lemma, which is inspired by Lai and Wei (1982).

###### Lemma 1.

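Let $V_k=\lambda I+\sum_{t=p}^{k}Z_tZ_t^{\top}$ for some $\lambda>0$. Then, the following inequality holds:

$$\sum_{k=p}^{N}Z_k^{\top}V_k^{-1}Z_k\le\log\frac{\det(V_N)}{\det(\lambda I)}.$$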

The intuition is that $Z_k$ appears in $V_k$ and, hence, it cancels out the effect of $Z_k$. Unfortunately, we cannot directly use the above inequality for $Z_k^{\top}V_{k-1}^{-1}Z_k$, since $Z_k$ is not explicitly contained in $V_{k-1}$. However, we can exploit the fact that $Z_k$ does not change too fast compared to the most recent past $Z_{k-1}, Z_{k-2},\dots$.
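Lemma 1 is a deterministic inequality, so it can be sanity-checked on any data. The sketch below (assuming NumPy; the function name is hypothetical) uses regressors whose scale grows polynomially, mimicking non-explosive observations, and compares $\sum_kZ_k^{\top}V_k^{-1}Z_k$, where $V_k$ includes the current $Z_k$, with $\log\det V_N-\log\det(\lambda I)$. It also records the sum with $V_{k-1}$, which is what the predictor actually uses and which Lemma 1 does not cover directly.

```python
import numpy as np


def potential_sums(Z, lam=1.0):
    """Compare sum_k Z_k^T V_k^{-1} Z_k with log det V_N - log det(lam I).

    V_k = lam I + sum_{t<=k} Z_t Z_t^T includes the current Z_k (as in Lemma 1);
    the sum with V_{k-1}, used by the online predictor, is returned as well.
    """
    d = Z.shape[1]
    V = lam * np.eye(d)
    with_current, with_previous = 0.0, 0.0
    terms = []
    for z in Z:
        with_previous += z @ np.linalg.solve(V, z)   # uses V_{k-1}
        V = V + np.outer(z, z)                       # now V = V_k
        t = z @ np.linalg.solve(V, z)                # uses V_k, always below 1
        terms.append(t)
        with_current += t
    _, logdet = np.linalg.slogdet(V)
    return with_current, with_previous, logdet - d * np.log(lam), np.array(terms)


rng = np.random.default_rng(0)
scale = np.arange(1, 501)[:, None]
Z = rng.standard_normal((500, 4)) * scale         # polynomially growing regressors
lhs, lhs_prev, bound, terms = potential_sums(Z)
```

The bound grows only like $d\log N$ even though the regressors grow polynomially; the gap between `lhs_prev` and `lhs` is exactly what the ARMA-like representation below is used to control.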

###### Lemma 2 (ARMA-like representation).

Let $y_k$ be observations generated by system (1). Fix a past horizon $p$ and let $\phi$ be the minimal polynomial of $A$ with degree $d_\phi$. Then, the past observations satisfy the following recursion

(18) |

where with

(19) |

where

Intuitively, the unbounded components of $Z_k$ are captured by the recent history $Z_{k-1}, Z_{k-2},\dots$, and the residual is bounded. Replacing $Z_k$ with (18), we obtain by two applications of the Cauchy-Schwarz inequality:

The terms in the sum are now indeed normalized and can be bounded using Lemma 1. For we exploit a new persistency of excitation result.
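Before turning to persistency of excitation, here is a minimal sketch of the intuition behind Lemma 2 (not its exact statement), assuming NumPy. For a hypothetical double integrator ($A$ a $2\times 2$ Jordan block with eigenvalue $1$, so $\phi(s)=(s-1)^2$), the outputs of system (1) grow polynomially, yet weighting three consecutive outputs by the coefficients of $\phi$ cancels the unbounded modes. With $Q=I$ and $R=1$ a short calculation gives the residual $r_k=[-1\ \ 1]w_k+[1\ \ 0]w_{k+1}+v_k-2v_{k+1}+v_{k+2}$, whose variance is exactly $9$.

```python
import numpy as np

# Hypothetical double integrator: A is a 2x2 Jordan block with eigenvalue 1, so its
# minimal polynomial is phi(s) = (s - 1)^2 = s^2 - 2 s + 1 and phi(A) = A^2 - 2A + I = 0.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
c = np.array([1.0, 0.0])                    # single output row C
phi = np.array([1.0, -2.0, 1.0])            # weights on y_k, y_{k+1}, y_{k+2}


def simulate(N, seed=0):
    """Simulate system (1) with Q = I, R = 1 and x_0 = 0."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    y = np.empty(N)
    for k in range(N):
        y[k] = c @ x + rng.standard_normal()    # y_k = C x_k + v_k
        x = A @ x + rng.standard_normal(2)      # x_{k+1} = A x_k + w_k
    return y


def phi_residual(y):
    """r_k = y_k - 2 y_{k+1} + y_{k+2}: the minimal polynomial cancels the unbounded modes."""
    return phi[0] * y[:-2] + phi[1] * y[1:-1] + phi[2] * y[2:]


y = simulate(20000)
r = phi_residual(y)
```

The outputs reach magnitudes of order $k^{3/2}$, while the residual stays stationary; this is the sense in which the recent history absorbs the growth of $Z_k$.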

###### Lemma 3 (Uniform Persistency of Excitation).

Consider the conditions of Theorem 1. Select a failure probability . Let for some fixed epoch with the corresponding past horizon. There exists a such that if , then with probability at least :

(20) |

uniformly for all .

The above persistency of excitation condition holds uniformly over all times as long as . This is why the burn-in time appears in Theorem 1; if is very small, then matrix is not even invertible. A similar persistency of excitation result was proved in Tsiamis and Pappas (2019) for a fixed time . However, the result of Lemma 3 is more general since it holds for all .

Regularization and Truncation terms. For the regularization term we follow the same steps as for the regression one. Since matrix $A-KC$ is stable, the truncation term decreases exponentially fast with $p$. The system quantity $\kappa$ governs how fast the observations grow polynomially. Parameter $\beta$ should be large enough to cancel out this polynomial rate. This explains why $\kappa$ affects the choice of $\beta$ in (13).

Stable Systems. If $\rho(A)<1$, then we can exploit the fact that the state $\hat{x}_k$ converges exponentially fast to a stationary distribution. Hence the term $Z_k^{\top}V_{k-1}^{-1}Z_k$ will effectively be self-normalized, without the need to express $Z_k$ as a function of the past observations. In particular, for stable systems we prove a new and stronger persistency of excitation result. Denote:

Then, we have the following.

###### Lemma 4 (Uniform Persistency of Excitation: Stable case).

Consider the conditions of Theorem 2. Select a failure probability . Let for some fixed epoch with the corresponding past horizon. There exists a such that if , with probability at least :

(21) |

uniformly for all .

Hence, the term can be bounded by:

where now the normalized term behaves like a standard isotropic Gaussian variable. The term is due to the fact that it takes time for the state to approach the stationary distribution (mixing time).
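As an empirical illustration of persistency of excitation in the stable case (not a substitute for Lemma 4), the sketch below assumes NumPy and a hypothetical scalar innovation-form model $\hat{x}_{k+1}=0.5\hat{x}_k+0.3e_k$, $y_k=\hat{x}_k+e_k$, builds past vectors with $p=5$, and reports $\lambda_{\min}(V_N)/N$. For this model the power spectrum of $y_k$ is $|1-0.2e^{-i\omega}|^2/|1-0.5e^{-i\omega}|^2$, minimized at $\omega=\pi$ with value $0.64$, so the smallest eigenvalue of the population covariance of $Z_{k,p}$ is at least $0.64$ and $\lambda_{\min}(V_N)$ should grow linearly in $N$.

```python
import numpy as np


def gram_min_eig_rate(a=0.5, K=0.3, p=5, N=5000, lam=1.0, seed=0):
    """lambda_min(V_N) / N for a scalar stable model in innovation form (3) with C = 1."""
    rng = np.random.default_rng(seed)
    xhat, y = 0.0, np.empty(N)
    for k in range(N):
        e = rng.standard_normal()
        y[k] = xhat + e                          # y_k = xhat_k + e_k
        xhat = a * xhat + K * e                  # xhat_{k+1} = a xhat_k + K e_k
    Z = np.stack([y[k - p:k] for k in range(p, N)])   # rows are Z_{k,p}
    V = lam * np.eye(p) + Z.T @ Z
    return np.linalg.eigvalsh(V)[0] / N          # eigvalsh returns ascending eigenvalues


rates = [gram_min_eig_rate(N=n) for n in (2000, 8000)]
```

The ratio stays roughly constant as $N$ grows, which is the linear growth of the smallest eigenvalue that the stable-case analysis relies on.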

## 5 Extensions

In this section, we discuss generalizations of Algorithm 1 and the regret analysis.

#### Alternative regret definition

Denote the system responses by , for . Let be the sequence of system responses. Then, a parameterization for online prediction could be

Let be the set of system responses which decay exponentially for some and , which are larger than in Assumption 5. This set can include for example, stable IIR filters or FIR filters. Then, an alternative regret definition would be:

(22) |

The above definition captures the one in Kozdoba et al. (2019), where the unknown system lies in a finite set.

Since the observations increase at most polynomially fast, and due to the properties of the Kalman filter, we can show that the difference between the two regret notions depends only on logarithmic terms of $N$. Hence our definition (4), which does not require any model restriction, is general. The details can be found in the Appendix.

#### $f$-steps ahead predictor

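An immediate generalization of Algorithm 1 is to consider the $f$-steps ahead predictor, where $f$ is some future horizon. Instead of predicting only the next observation, we predict the sequence $y_k, y_{k+1},\dots,y_{k+f-1}$. Denote the future observations and noises by:

$$Y_k^{+}\triangleq\begin{bmatrix}y_k^{\top} & \cdots & y_{k+f-1}^{\top}\end{bmatrix}^{\top},\qquad E_k^{+}\triangleq\begin{bmatrix}e_k^{\top} & \cdots & e_{k+f-1}^{\top}\end{bmatrix}^{\top}.$$

Similar to (10), we can establish a regression:

$$Y_k^{+} = \mathcal{O}_fK_pZ_{k,p} + \mathcal{O}_f(A-KC)^p\hat{x}_{k-p} + \mathcal{T}_fE_k^{+},$$

where $K_p\triangleq\begin{bmatrix}(A-KC)^{p-1}K & \cdots & K\end{bmatrix}$, $\mathcal{O}_f$ is the observability matrix (5), and $\mathcal{T}_f$ is a lower triangular block Toeplitz matrix generated by $I, CK, CAK,\dots,CA^{f-2}K$. The optimal Kalman filter predictor in this case is

$$\hat{Y}_k^{+} = \mathcal{O}_f\hat{x}_k.$$

Hence, the regret can be defined as in (4), with the lowercase $y$ replaced with uppercase $Y$. The online predictor (12) can be adapted here:

$$\tilde{Y}_k^{+} = \hat{G}^{+}Z_{k,p},$$

where $\hat{G}^{+}$ is obtained similar to (11) by regressing future observations $Y_t^{+}$ on past observations $Z_{t,p}$ for $t$ from $p$ up to $k-f$. The logarithmic regret guarantees also hold, with the final bound depending polynomially on the future horizon $f$.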

#### State prediction

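If we have some knowledge about the state, e.g. the state-space basis $(A, C)$ and the state dimension $n$, then we can use the $f$-steps ahead predictor to predict the hidden state $x_k$. Notice that the Kalman filter prediction of the future outputs satisfies $\hat{Y}_k^{+}=\mathcal{O}_f\hat{x}_k$, where $\mathcal{O}_f$ is the observability matrix from (5), so if $f\ge n$ the Kalman filter state prediction can be rewritten as:

$$\hat{x}_k = \mathcal{O}_f^{\dagger}\hat{Y}_k^{+},$$

since $\mathcal{O}_f$ has full column rank by Assumption 2. If we know $\mathcal{O}_f$ and the future horizon is large enough, we can compute the state prediction:

$$\tilde{x}_k = \mathcal{O}_f^{\dagger}\tilde{Y}_k^{+},$$

where $\tilde{Y}_k^{+}$ is our $f$-steps ahead prediction and $\dagger$ denotes the pseudo-inverse. In this case the regret:

$$\mathcal{R}^{x}_N\triangleq\sum_{k=1}^{N}\|x_k-\tilde{x}_k\|^2-\sum_{k=1}^{N}\|x_k-\hat{x}_k\|^2 \tag{23}$$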

will enjoy the same logarithmic guarantees. Hence, our algorithm can be used to solve the adaptive Kalman filter problem posed in Mehra (1970) and Anderson and Moore (2005), where the dynamics are known but the noise statistics are unknown, with logarithmic regret.
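The state-prediction step reduces to a pseudo-inverse. A small NumPy sketch, assuming a hypothetical double integrator with $C=[1\ \ 0]$ and $f=4\ge n=2$: the function maps an $f$-step output prediction to a state estimate, recovers the state exactly from noise-free future outputs $\mathcal{O}_f\hat{x}_k$, and shows that an error $\Delta$ in the output prediction produces a state error of at most $\|\mathcal{O}_f^{\dagger}\|\,\|\Delta\|$. This is why output-prediction regret bounds transfer to state prediction.

```python
import numpy as np


def observability_matrix(A, C, f):
    """O_f = [C; CA; ...; CA^{f-1}] as in (5)."""
    return np.vstack([C @ np.linalg.matrix_power(A, j) for j in range(f)])


def state_from_future_outputs(A, C, Y_future, f):
    """Map an f-step output prediction to a state prediction via the pseudo-inverse of O_f."""
    return np.linalg.pinv(observability_matrix(A, C, f)) @ Y_future


# Hypothetical double integrator with f = 4 >= n = 2.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
C = np.array([[1.0, 0.0]])
f = 4
O_f = observability_matrix(A, C, f)
xhat = np.array([2.0, -1.0])                       # some Kalman state prediction
Y_kf = O_f @ xhat                                  # its f-step output prediction
rng = np.random.default_rng(0)
delta = 0.1 * rng.standard_normal(f)               # error of an online output prediction
x_exact = state_from_future_outputs(A, C, Y_kf, f)
x_perturbed = state_from_future_outputs(A, C, Y_kf + delta, f)
```

The sketch requires $\mathcal{O}_f$ to be known; with an estimated range space only, the state is recovered up to an unknown invertible transformation, which is the ill-posedness discussed next.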

If we do not know $\mathcal{O}_f$, then we could estimate the range space of $\mathcal{O}_f$ by performing a singular value decomposition of $\hat{G}^{+}$. However, there are infinitely many representations $\mathcal{O}_fS$, for any invertible $S$, all of which can explain the same observations. The definition (23) is ill-posed, since $\tilde{x}_k$ and $\hat{x}_k$ might be based on different transformations $S$. Finding an alternative regret definition is a subject of future work.

#### Logarithmic memory

It is possible to achieve the logarithmic regret guarantees with logarithmic memory, by modifying the initialization step in the beginning of every epoch in Algorithm 1. For stable systems, we could just reset and to zero and respectively. This might not work for non-explosive systems, since can be polynomially large in . In this case, based on the regret analysis, we could initialize with the recent history and , where is an upper bound for the degree of the minimal polynomial . This gives us control over –see Section 4, and requires only memory.

## 6 Conclusion and Future Work

In this paper, we provided the first logarithmic regret upper bounds for learning the classical Kalman filter of an unknown system with unknown stochastic noise. Our regret analysis holds for non-explosive systems and our bounds do not degrade with the system stability gap.

Going forward, our paper opens up several research directions. An open question is whether we can define an appropriate regret notion in the case of state prediction, when matrices $A, C$ are unknown, and prove logarithmic bounds. Another interesting direction is to study how the learning performance is affected by system theoretic properties, such as the exponential quantity in the case of systems with long chain structure, e.g. $n$th-order integrators. Analyzing the regret of other online algorithms, e.g. extended least squares, is also an open problem. Another challenging problem for both prediction and system identification is the case of explosive systems. Although this problem has been studied in the fully observed case (Faradonbeh et al., 2018; Sarkar and Rakhlin, 2018), it remains open in the case of partially observable systems. Finally, in this work we considered that the state is only driven by stochastic noise. A more general problem to study is when we also have exogenous inputs. One of the challenges is that it is harder to prove persistency of excitation in the case of closed-loop systems.

## Acknowledgments

The authors would like to thank Nikolai Matni for useful discussions.


###### Contents

- 1 Introduction
- 2 Problem Formulation
- 3 Online Prediction Algorithm
- 4 Regret Analysis
- 5 Extensions
- 6 Conclusion and Future Work
- Appendix
- A Notation and organization of the Appendix
- B Linear Systems Theory
- C Statistical Toolbox
- D PAC bounds and persistency of excitation for fixed-time and fixed-past
- E Proof of Lemma 3
- F Proof of Lemma 1
- G Analysis within one epoch
- H Proof of Theorem 1
- I Stable case
- J Alternative regret definition
- K Technical lemmas

## Appendix

## Appendix A Notation and organization of the Appendix

#### Structure.

In Sections B and C we review results from system theory and statistics. These include the main tools with which we will prove Theorems 1 and 2. In Section D, we provide PAC bounds and persistency of excitation (PE) results for a fixed time (pointwise) and fixed past horizon $p$. In Section E, we generalize those PAC bounds and PE results from pointwise to uniform over all times in one epoch. In Section F we prove Lemma 1. By combining the uniform bounds and Lemma 1, we prove in Section G that the square loss suffered within one epoch is logarithmic with respect to the length of the epoch. Hence, we can then prove Theorem 1; see Section H. In Section I, we analyze the case of stable systems and prove Theorem 2. Finally, in Section J, we show how the alternative regret definition (22) is equivalent to ours (4) up to logarithmic terms. Section K includes some technical results about logarithmic inequalities, which are used to show that the burn-in time depends polynomially on the various system parameters.

#### Notation.

Before we proceed with the regret analysis, let us introduce some notation. A summary can be found in Table 1. We will analyze the performance of Algorithm 1 based mainly on a fixed epoch . Since the past horizon is kept constant during an epoch, we will drop the index from , , , and write , , , instead. Similar to the past outputs , we also define the past noises:

(24) |

The batch past outputs, batch past noises, and batch past Kalman filter states are defined as:

(25) |

This notation will simplify the sums , etc.

Recall the definition of the correlations between the current innovation and the past outputs and the regularized autocorrelations of past outputs . The innovation sequence is i.i.d. zero mean Gaussian. Its covariance has a closed-form expression and is defined as:

$$\bar{R}\triangleq\mathbb{E}[e_ke_k^{\top}] = CPC^{\top}+R, \tag{26}$$

where $P$ is the solution to the Riccati equation (7). Next we define the Toeplitz matrix , for some :

(27) |

A useful property of system (3) is that the past outputs can be written as:

(28) |

The covariance of is denoted by:

(29) |

We define the covariance of the state predictions:

(30) |

and the covariance of the past outputs:

(31) |

Finally, let be the Jordan form of . With the big O notation we also hide parameters like , etc.

| Quantity | Notation |
|---|---|
| Past outputs at time $k$ | $Z_{k,p}$, see (8) |
| Past noises at time $k$ | see (24) |