A Tutorial on Concentration Bounds for System Identification
We provide a brief tutorial on the use of concentration inequalities as they apply to system identification of the state-space parameters of linear time-invariant systems, with a focus on the fully observed setting. We draw upon tools from the theories of large deviations and self-normalized martingales, and provide both data-dependent and data-independent bounds on the learning rate.
A key feature in modern reinforcement learning is the ability to provide high-probability guarantees on the finite-data/time behavior of an algorithm acting on a system. The enabling technical tools used in providing such guarantees are concentration of measure results, which should be interpreted as quantitative versions of the strong law of large numbers. This paper provides a brief introduction to such tools, as motivated by the identification of linear-time-invariant (LTI) systems.
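To make the "quantitative law of large numbers" interpretation concrete, the following sketch (a hypothetical numerical illustration, not taken from the text) empirically checks a standard Gaussian tail bound: the sample mean of $N$ i.i.d. standard Gaussians stays within $\sqrt{2\log(2/\delta)/N}$ of zero with probability at least $1-\delta$, a deviation that shrinks at the rate $1/\sqrt{N}$.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.05
coverages = []
for N in [100, 400, 1600]:
    # Sample means of N i.i.d. N(0,1) variables, over 2000 independent repetitions.
    sample_means = rng.standard_normal((2000, N)).mean(axis=1)
    # Gaussian tail bound: |mean| <= sqrt(2 log(2/delta) / N) w.p. >= 1 - delta.
    bound = np.sqrt(2.0 * np.log(2.0 / delta) / N)
    coverages.append(np.mean(np.abs(sample_means) <= bound))
```

For every $N$, the empirical coverage should exceed $1-\delta = 0.95$; note the bound itself halves each time $N$ quadruples, which is exactly the finite-data rate the tutorial quantifies.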
In particular, we focus on identifying the parameters $(A, B)$ of the LTI system
$$x_{t+1} = A x_t + B u_t + w_t, \qquad w_t \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma_w^2 I), \tag{1}$$
assuming perfect state measurements. This is in some sense the simplest possible system identification problem, making it the perfect case study for such a tutorial. Our companion paper [extended] shows how the results derived in this paper can then be integrated into self-tuning and adaptive control policies with finite-data guarantees. We also refer the reader to Section II of [extended] for an in-depth and comprehensive literature review of classical and contemporary results in system identification. Finally, we note that most of the results we present below are not the sharpest available in the literature, but are rather chosen for their pedagogical value.
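As a purely illustrative instance of this problem, the following sketch simulates $N$ independent rollouts of a system of the assumed form $x_{t+1} = A x_t + B u_t + w_t$ with Gaussian inputs and noise, and estimates $(A, B)$ by ordinary least squares on the final transition of each rollout (so the regression covariates are independent across rollouts). The matrices `A_true`, `B_true` and all dimensions are assumptions chosen for illustration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, N, T = 2, 1, 500, 5                    # state dim, input dim, rollouts, horizon
A_true = np.array([[0.9, 0.2], [0.0, 0.7]])  # illustrative stable dynamics
B_true = np.array([[0.0], [1.0]])

X, Z = [], []  # regression targets x_{T+1} and covariates [x_T; u_T]
for _ in range(N):
    x = np.zeros(n)
    for t in range(T + 1):
        u = rng.standard_normal(d)           # excitatory input u_t ~ N(0, I)
        w = rng.standard_normal(n)           # process noise w_t ~ N(0, I)
        x_next = A_true @ x + B_true @ u + w
        if t == T:                           # keep only the last transition
            Z.append(np.concatenate([x, u]))
            X.append(x_next)
        x = x_next

Z, X = np.array(Z), np.array(X)
# Solve min_Theta ||Z Theta - X||_F; the rows of Theta stack [A B]^T.
Theta_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
A_hat, B_hat = Theta_hat[:n].T, Theta_hat[n:].T
err = np.linalg.norm(np.hstack([A_hat - A_true, B_hat - B_true]), 2)
```

The operator-norm error `err` decays roughly as $1/\sqrt{N}$, which is the qualitative behavior the concentration bounds in this tutorial make precise.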
The paper is structured as follows: in Section II, we study the simplified setting in which system (1) is defined for a scalar state $x_t \in \mathbb{R}$, and data is drawn from independent experiments. Section III extends these ideas to the vector-valued setting. In Section IV, we study the performance of an estimator using all data from a single trajectory – this is significantly more challenging, as all covariates are strongly correlated. Finally, in Section V, we provide data-dependent bounds that can be used in practical algorithms.
II. Scalar Random Variables
Consider the scalar dynamical system
$$x_{t+1} = a x_t + u_t,$$
for $u_t \overset{\mathrm{i.i.d.}}{\sim} \mathcal{N}(0, \sigma_u^2)$, and $a \in \mathbb{R}$ an unknown parameter. Our goal is to estimate $a$, and to do so we inject excitatory Gaussian noise via $u_t$. We run $N$ experiments over a horizon of $T+1$ time-steps, and then solve for our estimate $\hat{a}$ via the least-squares problem
$$\hat{a} = \arg\min_{a} \sum_{i=1}^{N} \left( x_{T+1}^{(i)} - a\, x_T^{(i)} \right)^2 = \frac{\sum_{i=1}^{N} x_{T+1}^{(i)} x_T^{(i)}}{\sum_{i=1}^{N} \big( x_T^{(i)} \big)^2},$$
where $x_t^{(i)}$ denotes the state of the $i$-th experiment at time $t$.
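A minimal numerical sketch of this scalar experiment (all parameter values below are illustrative assumptions): simulate $N$ independent rollouts of $x_{t+1} = a x_t + u_t$, record the final transition of each, and form the closed-form least-squares estimate. Using only the last transition of each rollout keeps the $N$ covariates independent.

```python
import numpy as np

rng = np.random.default_rng(2)
a_true, sigma_u, N, T = 0.8, 1.0, 2000, 5    # illustrative values, not from the text

xT = np.zeros(N)   # x_T^{(i)}:   last covariate of each experiment
xT1 = np.zeros(N)  # x_{T+1}^{(i)}: last target of each experiment
for i in range(N):
    x = 0.0
    for t in range(T + 1):
        x_next = a_true * x + sigma_u * rng.standard_normal()  # u_t ~ N(0, sigma_u^2)
        if t == T:
            xT[i], xT1[i] = x, x_next
        x = x_next

# Closed-form least-squares solution: a_hat = sum(x_{T+1} x_T) / sum(x_T^2).
a_hat = np.sum(xT1 * xT) / np.sum(xT ** 2)
```

The estimation error $|\hat{a} - a|$ concentrates at the rate $1/\sqrt{N}$, which is what the scalar bounds developed in this section quantify.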