# Concurrent learning for parameter estimation using dynamic state-derivative estimators

Rushikesh Kamalapurkar, Ben Reish, Girish Chowdhary, and Warren E. Dixon

Rushikesh Kamalapurkar and Warren E. Dixon are with the Department of Mechanical and Aerospace Engineering, University of Florida, Gainesville, FL, USA. Email: {rkamalapurkar, wdixon}@ufl.edu. Benjamin Reish and Girish Chowdhary are with the Department of Mechanical and Aerospace Engineering, Oklahoma State University, Stillwater, OK, USA. Email: reish@ostatemail.okstate.edu and girish.chowdhary@okstate.edu

###### Abstract

A concurrent learning (CL)-based parameter estimator is developed to identify the unknown parameters in a linearly parameterized uncertain control-affine nonlinear system. Unlike state-of-the-art CL techniques that assume knowledge of the state-derivative or rely on numerical smoothing, CL is implemented using a dynamic state-derivative estimator. A novel purging algorithm is introduced to discard possibly erroneous data recorded during the transient phase for concurrent learning. Since purging results in a discontinuous parameter adaptation law, the closed-loop error system is modeled as a switched system. Asymptotic convergence of the error states to the origin is established under a persistent excitation condition, and the error states are shown to be ultimately bounded under a finite excitation condition. Simulation results are provided to demonstrate the effectiveness of the developed parameter estimator.

## I Introduction

Modeling and identification of input-output relationships of nonlinear dynamical systems has been a long-standing active area of research. A variety of offline techniques have been developed for system identification; however, when models are used for online feedback control, the ability to adapt to a changing environment and the ability to learn from input-output data are desirable. Motivated by applications in feedback control, online system identification techniques are investigated in results such as [1, 2, 3, 4] and the references therein.

Parametric methods such as linear parameterization, neural networks and fuzzy logic systems approximate the system identification problem by a finite-dimensional parameter estimation problem, and hence, are popular tools for online nonlinear system identification. Parametric models have been widely employed for adaptive control of nonlinear systems. In general, adaptive control methods do not require or guarantee convergence of the parameters to their true values. However, parameter convergence has been shown to improve robustness and transient performance of adaptive controllers (cf. [5, 6, 7, 8]). Parametric models have also been employed in optimal control techniques such as model-based predictive control (MPC) (cf. [9, 10, 11, 12]) and model-based reinforcement learning (MBRL) (cf. [13, 14, 15, 16]). In MPC and MBRL, the controller is developed based on the parameter estimates; hence, stability of the closed-loop system and the performance of the developed controller critically depend on convergence of the parameter estimates to their ideal values.

Data-driven concurrent learning (CL) techniques are developed in results such as [17, 8, 18], where recorded data is concurrently used with online data to achieve parameter convergence under a relaxed finite excitation condition as opposed to the persistent excitation (PE) condition required by traditional adaptive control methods. CL techniques employ the fact that a direct formulation of the parameter estimation error can be obtained provided the state-derivative is known or its estimate is otherwise available through techniques such as fixed-point smoothing [19]. The parameter estimation error can then be used in a gradient-based adaptation algorithm to drive the parameter estimates to their ideal values. If exact derivatives are not available, the parameter estimation error can be shown to decay to a neighborhood of the origin provided accurate estimates of the state-derivatives are available, where the size of the neighborhood depends on the derivative estimation error [19]. Experimental results such as [8] demonstrate that, since derivatives at past data points are required, noncausal numerical smoothing techniques can be used to generate satisfactory estimates of state-derivatives. Under Gaussian noise, smoothing is guaranteed to result in the best possible linear estimate corresponding to the available data [20, Section 5.3]; however, in general, the derivative estimation error resulting from numerical smoothing cannot be quantified a priori. Furthermore, numerical smoothing requires additional processing and storage of data over a time-window that contains the point of interest. Hence, the problem of achieving parameter convergence under relaxed excitation conditions without using numerical differentiation is also motivated.

In this paper, an observer is employed to estimate the state-derivative. The derivative estimate generated by the observer converges exponentially to a neighborhood of the actual state-derivative. However, in the transient phase, the derivative estimation errors can be large. Since CL relies on repeated use of recorded data, large transient errors present a challenge in the development of a CL-based parameter estimator. If the derivative estimation errors at the points recorded in the history stack are large, then the corresponding errors in the parameter estimates will be large. Motivated by the results in [21], the aforementioned challenge is addressed in this paper by designing a novel purging algorithm to purge possibly erroneous data from the history stack. The closed-loop error system along with the purging algorithm is modeled as a switched nonlinear dynamical system. Provided enough data can be recorded to populate the history stack after each purge, the developed method ensures asymptotic convergence of the error states to the origin.

The PE condition can be shown to be sufficient to ensure that enough data can be recorded to populate the history stack after each purge. Since PE can be an impractical requirement in many applications, this paper examines the behavior of the switched error system under a relaxed finite excitation condition. Specifically, provided the system states are exciting over a sufficiently long finite time-interval, the error states decay to an ultimate bound. Furthermore, the ultimate bound can be made arbitrarily small by increasing the learning gains. Simulation results are provided to demonstrate the effectiveness of the developed method under measurement noise.

## II System dynamics

The system dynamics are assumed to be nonlinear, uncertain, and control-affine, described by the differential equation

$$\dot{x} = f(x,u), \tag{1}$$

where the function $f$ is locally Lipschitz. It is assumed that the dynamics can be split into a known component and an uncertain component with parametric uncertainties that are linear in the parameters. That is, $f(x,u) = f_1(x) + g(x)u + Y(x)\theta$, where $f_1$, $g$, and $Y$ are known and locally Lipschitz continuous. The constant parameter vector $\theta$ is unknown, with a known bound $\bar{\theta}$ such that $\|\theta\| \le \bar{\theta}$. The objective is to design a parameter estimator to estimate the unknown parameters. The system input is assumed to be a stabilizing controller such that . (Footnote 1: The focus of this paper is adaptive estimation, and not control design. Even though most adaptive controllers are designed based on an estimate of the unknown parameters, parameter estimation can often be decoupled from control design. For example, the adaptive controller in [22] can guarantee for a wide class of adaptive update laws.) Under the additional assumption that , the developed technique can be extended to include linearly parameterized nonaffine systems, that is, . The system state $x$ is assumed to be available for feedback, and the state-derivative $\dot{x}$ is assumed to be unknown.

## III CL-based adaptive derivative estimation

Let $\hat{x}$ and $\dot{\hat{x}}$ denote estimates of the state $x$ and the state-derivative $\dot{x}$, respectively. Let $\hat{\theta}$ denote an estimate of the unknown vector $\theta$. To achieve convergence of the estimate $\hat{\theta}$ to the ideal parameter vector $\theta$, a CL-based parameter estimator is designed. The motivation behind CL is to adjust the parameter estimates based on an estimate of the parameter estimation error, defined as $\tilde{\theta} \triangleq \theta - \hat{\theta}$, in addition to the state estimation error $\tilde{x} \triangleq x - \hat{x}$. Since $\tilde{\theta}$ is not directly measurable, the subsequent development exploits the fact that the term $Y(x)\tilde{\theta}$ can be computed as $Y(x)\tilde{\theta} = \dot{x} - f_1(x) - g(x)u - Y(x)\hat{\theta}$, provided measurements of the state-derivative $\dot{x}$ are available. In CL results such as [17, 8, 18, 23], it is assumed that the state-derivatives can be computed with sufficient accuracy at a past time instance by numerically differentiating the recorded data. An approximation of the parameter estimation errors is then computed as , where denotes the system state at a past time instance , denotes the numerically computed state-derivative at , and is a constant of the order of the error between and . While the results in [19] establish that, provided is bounded, the parameter estimation error can be shown to decay to a ball around the origin, the focus is on the analysis of the effects of the differentiation error, and not on development of algorithms to reduce the parameter estimation error.

In this paper, a dynamically generated estimate of the state-derivative is used instead of numerical smoothing. The parameter estimation error is computed at a past recorded data point as , where . To facilitate the design, let be a history stack containing recorded values of the state, the control, and the state-derivative estimate. Each tuple is referred to as a data-point in . A history stack is called “full rank” if the state vectors recorded in satisfy . Based on the subsequent Lyapunov-based stability analysis, the history stack is used to update the estimate using the following update law:

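$$\dot{\hat{\theta}} = k\Gamma\sum_{j=1}^{M} Y^{T}(x_j)\left(\dot{\hat{x}}_j - f_1(x_j) - g(x_j)u_j - Y(x_j)\hat{\theta}\right) + \Gamma Y^{T}(x)\tilde{x}, \tag{2}$$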

where $k$ and $\Gamma$ are positive and constant learning gains.
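As a rough illustration of how the right-hand side of (2) can be evaluated from a recorded history stack, the following Python sketch treats a scalar state with two unknown parameters. The function name `cl_update`, the scalar gain `gamma` (standing in for $\Gamma = \gamma I$), the example dynamics, and the use of exact recorded derivatives are all assumptions made for illustration and are not part of the original development.

```python
def cl_update(theta_hat, x, x_tilde, stack, k, gamma, f1, g, Y):
    """Evaluate the right-hand side of (2) for a scalar state (n = 1).

    theta_hat : list of p parameter estimates
    stack     : list of recorded tuples (x_j, u_j, xdot_hat_j)
    gamma     : scalar gain, i.e. Gamma = gamma * I (assumption of this sketch)
    Y(x)      : regressor row, returned as a list of length p
    """
    p = len(theta_hat)
    rate = [0.0] * p
    # Concurrent-learning term: sum over the recorded data points.
    for x_j, u_j, xdot_hat_j in stack:
        Y_j = Y(x_j)
        prediction = f1(x_j) + g(x_j) * u_j + sum(Y_j[i] * theta_hat[i] for i in range(p))
        residual = xdot_hat_j - prediction
        for i in range(p):
            rate[i] += k * gamma * Y_j[i] * residual
    # Term driven by the current state estimation error.
    Y_now = Y(x)
    for i in range(p):
        rate[i] += gamma * Y_now[i] * x_tilde
    return rate


# Hypothetical example system: xdot = -x + u + theta_1 * x + theta_2 * x**2.
f1 = lambda x: -x
g = lambda x: 1.0
Y = lambda x: [x, x * x]
theta_true = [1.0, -2.0]


def true_xdot(x, u):
    return f1(x) + g(x) * u + sum(y * th for y, th in zip(Y(x), theta_true))


# History stack recorded with exact derivatives (an idealization).
stack = [(1.0, 0.0, true_xdot(1.0, 0.0)), (2.0, 1.0, true_xdot(2.0, 1.0))]
rate_at_truth = cl_update(theta_true, 0.5, 0.0, stack, 1.0, 1.0, f1, g, Y)
rate_at_zero = cl_update([0.0, 0.0], 0.5, 0.0, stack, 1.0, 1.0, f1, g, Y)
```

In this idealized setting the update vanishes when the estimate equals the true parameter vector, and for a zero estimate it reduces to the recorded regressor Gram matrix applied to the parameter error. With estimated rather than exact derivatives, the residuals would carry the derivative estimation error as well.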

The update law in (2) drives the parameter estimation error to a ball around the origin, the size of which is of the order of Hence, to achieve a lower parameter estimation error, it is desirable to drive to the origin. Based on the subsequent Lyapunov-based stability analysis, the following adaptive estimator is designed to generate the state-derivative estimates:

 ˙^x ˙μ =(k1α1+1)~x, (3)

where is an auxiliary signal and and are positive constant learning gains.

## IV Algorithm to record the history stack

### IV-A Purging of history stacks

The state-derivative estimator in (3) relies on feedback of the state estimation error In general, feedback results in large transient estimation errors. Hence, the state-derivative estimation errors associated with the tuples recorded in the transient phase can be large. The results in [19] imply that the parameter estimation errors can be of the order of Hence, if a history stack containing data-points with large derivative estimation errors is used for CL, then the parameter estimates will converge but the resulting parameter estimation errors can be large. To mitigate the aforementioned problem, this paper introduces a new algorithm that purges the erroneous data in the history stack as soon as more data is available. Since the estimator in (3) results in exponential convergence of to a neighborhood of newer data is guaranteed to represent the system better than older data, resulting in a lower steady-state parameter estimation error. The following section details the proposed algorithm.

### IV-B Algorithm to record the history stack

The history stack is initialized arbitrarily to be full rank. An arbitrary full rank initialization of the history stack results in a $\sigma$-modification-like (cf. [24]) adaptive update law that keeps the parameter estimation errors bounded. However, since the history stack may contain erroneous data, the parameters may not converge to their ideal values.

In the following, a novel algorithm is developed to keep the history stack current by purging the existing (and possibly erroneous) data and replacing it with current data. The data collected from the system is recorded in an auxiliary history stack . The auxiliary history stack is initialized such that and is populated using a singular value maximization algorithm [17]. Once the auxiliary history stack becomes full rank with a minimum singular value that is above a (static or dynamic) threshold, the history stack is replaced with the auxiliary history stack, and the auxiliary history stack is purged. (Footnote 2: Techniques such as probabilistic confidence checks (cf. [21]) can also be utilized to initiate purging. The following analysis is agnostic with respect to the trigger used for purging provided the auxiliary history stack is full rank at the time of purging and the dwell time is maintained between two successive purges.) In this paper, a dynamic threshold is used which is set to be a fraction of the highest encountered minimum singular value corresponding to the history stack up to the current time.

In the subsequent Algorithm 1, a piece-wise constant function , initialized to zero, stores the last time instance when was updated and a piecewise constant function stores the highest encountered value of up to time , where denotes the minimum singular value. The constant denotes the threshold fraction used to purge the history stack, and is an adjustable positive constant.
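The recording and purging logic described above can be sketched in Python as follows. This is a reading of the prose, not a reproduction of Algorithm 1: the class and attribute names (`HistoryStackRecorder`, `H`, `G`, `fraction`, `dwell`), the restriction to a scalar state with two parameters (so that the minimum singular value has a closed form), and the rule that points are appended unconditionally until the auxiliary stack is full are all assumptions of this sketch.

```python
import math


def min_singular_value(stack, Y):
    """Minimum singular value of sum_j Y(x_j)^T Y(x_j) for n = 1, p = 2.

    For a symmetric positive semidefinite 2x2 matrix this equals its
    smallest eigenvalue, which has a closed form.
    """
    a = b = d = 0.0
    for x_j, _u_j, _xdot_hat_j in stack:
        y1, y2 = Y(x_j)
        a += y1 * y1
        b += y1 * y2
        d += y2 * y2
    trace, det = a + d, a * d - b * b
    return 0.5 * (trace - math.sqrt(max(trace * trace - 4.0 * det, 0.0)))


class HistoryStackRecorder:
    def __init__(self, initial_stack, size, fraction, dwell, Y):
        self.H = list(initial_stack)  # stack used by the update law
        self.G = []                   # auxiliary stack (assumed to start empty)
        self.size = size
        self.fraction = fraction      # threshold fraction for purging
        self.dwell = dwell            # minimum time between two purges
        self.Y = Y
        self.last_update = 0.0        # last time H was replaced
        self.best_sv = min_singular_value(self.H, Y)

    def record(self, t, point):
        """Offer a new data point; return True if H was replaced."""
        if len(self.G) < self.size:
            self.G.append(point)
        else:
            # Singular value maximization: swap the new point in only if
            # doing so increases the minimum singular value of G.
            current = min_singular_value(self.G, self.Y)
            best_trial = None
            for i in range(self.size):
                trial = self.G[:i] + [point] + self.G[i + 1:]
                sv = min_singular_value(trial, self.Y)
                if sv > current:
                    current, best_trial = sv, trial
            if best_trial is not None:
                self.G = best_trial
        sv_G = min_singular_value(self.G, self.Y)
        full_rank = len(self.G) == self.size and sv_G > 0.0
        dwell_ok = t - self.last_update >= self.dwell
        if full_rank and dwell_ok and sv_G >= self.fraction * self.best_sv:
            self.H, self.G = self.G, []  # replace H, purge G
            self.last_update = t
            self.best_sv = max(self.best_sv, sv_G)
            return True
        return False


# Hypothetical usage with the regressor Y(x) = (x, x^2).
Y = lambda x: (x, x * x)
initial = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
rec = HistoryStackRecorder(initial, size=2, fraction=0.5, dwell=0.0, Y=Y)
first = rec.record(0.1, (1.0, 0.0, -2.0))
second = rec.record(0.2, (3.0, 0.0, -18.0))
slow = HistoryStackRecorder(initial, size=2, fraction=0.5, dwell=1.0, Y=Y)
slow_flags = [slow.record(0.1, (1.0, 0.0, -2.0)), slow.record(0.2, (3.0, 0.0, -18.0))]
```

With a zero dwell time the stack is replaced as soon as the auxiliary stack is full and sufficiently well conditioned. With a positive dwell time the same data is held back until the dwell time has elapsed.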

The following analysis establishes that if the system states are persistently exciting (in a sense that will be made clear in Theorem 1) then the parameter estimation error asymptotically decays to zero. Furthermore, it is also established that if the system states are exciting over a finite period of time, then the parameter estimation error can be made as small as desired provided and the learning gains are selected based on the sufficient conditions introduced in Theorem 2.

## V Analysis

### V-A Asymptotic convergence with persistent excitation

Purging of the history stack implies that the resulting closed-loop system is a switched system, where each subsystem corresponds to a history stack, and each purge indicates a switching event. (Footnote 3: Since a switching event in Algorithm 1 occurs only when the auxiliary history stack is full, Zeno behavior is avoided by design.) To facilitate the analysis, let denote a switching signal such that , and , where denotes the number of times the update was carried out over the time interval . In the following, the subscript $s$ denotes the switching index, and denotes the history stack corresponding to the $s$th subsystem (i.e., the history stack active during the time interval ), containing the elements . To simplify the notation, let

$$A_s = \sum_{j=1}^{M} Y^{T}(x_{sj})Y(x_{sj}),\qquad Q_s = \sum_{j=1}^{M} Y^{T}(x_{sj})\dot{\tilde{x}}_{sj}. \tag{4}$$

Note that and are piece-wise constant functions of time. For ease of exposition, the constant introduced in Algorithm 1 is set to zero for the case where persistent excitation is available.

Algorithm 1 ensures that there exists a constant such that where denotes the minimum eigenvalue. Since the state remains bounded by assumption, there exists a constant such that

Using (2) and (4), the dynamics of the parameter estimation error can be written as

$$\dot{\tilde{\theta}} = -\Gamma Y^{T}(x)\tilde{x} - k\Gamma A_s\tilde{\theta} + k\Gamma Q_s. \tag{5}$$
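The step from (2) and (4) to (5) can be spelled out as follows. This is a sketch that assumes the sign conventions $\tilde{\theta} = \theta - \hat{\theta}$ and $\tilde{x} = x - \hat{x}$ (the conventions under which the signs in (5) are reproduced), and that uses $\dot{x}_{sj} = f_1(x_{sj}) + g(x_{sj})u_{sj} + Y(x_{sj})\theta$ at each recorded point together with the fact that $\theta$ is constant:

$$\begin{aligned}
\dot{\hat{x}}_{sj} - f_1(x_{sj}) - g(x_{sj})u_{sj} - Y(x_{sj})\hat{\theta} &= \left(\dot{\hat{x}}_{sj} - \dot{x}_{sj}\right) + Y(x_{sj})\left(\theta - \hat{\theta}\right) = Y(x_{sj})\tilde{\theta} - \dot{\tilde{x}}_{sj},\\
\dot{\hat{\theta}} &= k\Gamma\left(A_s\tilde{\theta} - Q_s\right) + \Gamma Y^{T}(x)\tilde{x},\\
\dot{\tilde{\theta}} = -\dot{\hat{\theta}} &= -\Gamma Y^{T}(x)\tilde{x} - k\Gamma A_s\tilde{\theta} + k\Gamma Q_s.
\end{aligned}$$

The term $k\Gamma Q_s$ makes explicit how the derivative estimation errors stored in the history stack act as a disturbance on the parameter error dynamics.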

To establish convergence of the state-derivative estimates, a filtered tracking error is defined as Using (1), (3), and (5), the time derivative of the filtered tracking error can be expressed as,

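$$\dot{r} = F(x,u)\tilde{\theta} - k\gamma_1 Y(x)\Gamma A_s\tilde{\theta} - \gamma_1 Y(x)\Gamma Y^{T}(x)\tilde{x} - \tilde{x} + k\gamma_1 Y(x)\Gamma Q_s - k_1 r, \tag{6}$$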

where and .

To facilitate the stability analysis, let , and be constants such that

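$$\begin{gathered}
\|F(x(t),u(t))\| \le \bar{F},\quad \|Y(x(t))\| \le \bar{Y},\quad \|\Gamma\| = \bar{\Gamma},\\
\|f_1(x(t)) + Y(x(t))\theta + g(x(t))u(t)\| \le \bar{F}_1,\quad \|x(t)\| \le \bar{x},
\end{gathered} \tag{7}$$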

for all . The following stability analysis is split into three parts. Under the temporary assumptions that the error states , , and are bounded at a switching instance and that the norms of the state-derivative estimates stored in the history stack are bounded, it is established in Part 1 that the error states , , and decay to a bound before the next switching instance, where the bound depends on the derivative estimation errors. Under the temporary assumption that the error states and are bounded at a switching instance, it is established in Part 2 that the derivative estimation error can be made arbitrarily small before the next switching instance by increasing the learning gains. In Part 3, the temporary assumptions in Parts 1 and 2 are relaxed through an inductive argument where the results from Part 1 and Part 2 are used to conclude asymptotic convergence of the error states , and to the origin.

#### Part 1: Boundedness of the error signals

Let and let denote a candidate Lyapunov function defined as

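$$V(Z) \triangleq \frac{1}{2}r^{T}r + \frac{1}{2}\tilde{x}^{T}\tilde{x} + \frac{1}{2}\tilde{\theta}^{T}\Gamma^{-1}\tilde{\theta}. \tag{8}$$

Using the Rayleigh-Ritz theorem, the Lyapunov function can be bounded as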

$$\underline{v}\|Z\|^{2} \le V(Z) \le \bar{v}\|Z\|^{2}, \tag{9}$$

where The subsequent stability analysis assumes that the learning gains , and and the matrices satisfy the following sufficient gain conditions (Footnote 4: The sufficient conditions can be satisfied provided the gains and are selected large enough.):

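$$\underline{a} > \frac{3\bar{Y}^{2}}{k\alpha_1} + \frac{4\bar{F}^{2}}{kk_1} + \frac{4k\bar{Y}^{2}\bar{\Gamma}^{2}\bar{A}^{2}}{k_1},\qquad k_1 > \frac{6\bar{Y}^{4}\bar{\Gamma}^{2}}{\alpha_1}. \tag{10}$$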
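A quick numerical check of the gain conditions can be sketched as below. The function reads the two inequalities in (10) with each term as a fraction, takes `a_low` to be the lower bound on the minimum eigenvalue of $A_s$ and `A_bar` to be an upper bound on its norm, and uses hypothetical numerical values. These readings and all names are assumptions of the sketch.

```python
def gains_satisfy_conditions(a_low, A_bar, Y_bar, F_bar, Gamma_bar, k, k1, alpha1):
    """Return True if both inequalities in (10) hold for the given constants."""
    required_a = (
        3.0 * Y_bar**2 / (k * alpha1)
        + 4.0 * F_bar**2 / (k * k1)
        + 4.0 * k * Y_bar**2 * Gamma_bar**2 * A_bar**2 / k1
    )
    required_k1 = 6.0 * Y_bar**4 * Gamma_bar**2 / alpha1
    return a_low > required_a and k1 > required_k1


# Hypothetical bounds: a_low = 1, A_bar = 2, Y_bar = F_bar = Gamma_bar = 1.
large_k1 = gains_satisfy_conditions(1.0, 2.0, 1.0, 1.0, 1.0, k=1.0, k1=100.0, alpha1=10.0)
small_k1 = gains_satisfy_conditions(1.0, 2.0, 1.0, 1.0, 1.0, k=1.0, k1=10.0, alpha1=10.0)
```

In this example the conditions hold for the larger value of `k1` and fail for the smaller one, which is consistent with the remark that the conditions can be met by selecting the gains large enough.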

The following Lemma establishes boundedness of the error state .

###### Lemma 1.

Let be a constant such that , for all . Assume temporarily that there exist constants such that the elements of satisfy , for all , and that the candidate Lyapunov function satisfies . Then, the candidate Lyapunov function is bounded as

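$$V(Z(\tau)) \le \left(\bar{V}_s - \frac{\bar{v}}{\underline{v}}\iota_s\right)e^{-\frac{\underline{v}}{\bar{v}}(\tau - t)} + \frac{\bar{v}}{\underline{v}}\iota_s,\quad \forall \tau \in [t, t+T), \tag{11}$$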

where Furthermore, the parameter estimation error can be bounded as

$$\|\tilde{\theta}(\tau)\| \le \theta_s,\quad \forall \tau \in [t, t+T), \tag{12}$$

where and .

###### Proof:

Using (5)-(6), the time derivative of the candidate Lyapunov function can be written as

Provided the sufficient conditions in (10) are satisfied, the Lyapunov derivative can be bounded as

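$$\dot{V} \le -\frac{k\underline{c}}{4}\|\tilde{\theta}\|^{2} - \frac{\alpha_1}{3}\|\tilde{x}\|^{2} - \frac{k_1}{8}\|r\|^{2} + \left(\frac{k}{2\underline{a}} + \frac{k^{2}\bar{Y}^{2}\bar{\Gamma}^{2}}{k_1}\right)\|Q_s\|^{2}.$$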

Using the hypothesis that the elements of the history stack are bounded and the fact that is stabilizing, the Lyapunov derivative can be bounded as

$$\dot{V} \le -\frac{\underline{v}}{\bar{v}}V(Z) + \iota_s.$$

Using the comparison lemma [25, Lemma 3.4],

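$$V(Z(\tau)) \le \left(\bar{V}_s - \frac{\bar{v}}{\underline{v}}\iota_s\right)e^{-\frac{\underline{v}}{\bar{v}}(\tau - t)} + \frac{\bar{v}}{\underline{v}}\iota_s,\quad \forall \tau \in [t, t+T).$$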

If then . If then Hence, using the definition of and the bounds in (9), the bound in (12) is obtained. ∎
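The comparison-lemma step used in the proof can be illustrated numerically. The sketch below evaluates the closed-form envelope for a differential inequality of the form $\dot{V} \le -cV + \iota$ and compares it with a forward-Euler solution of the equality case. The function names, the symbol `rate` (standing in for $\underline{v}/\bar{v}$), and the numerical values are assumptions chosen for illustration.

```python
import math


def lyapunov_envelope(V0, rate, iota, elapsed):
    """Closed-form bound implied by dV/dt <= -rate * V + iota after `elapsed` time."""
    level = iota / rate  # steady-state level of the envelope
    return (V0 - level) * math.exp(-rate * elapsed) + level


def euler_worst_case(V0, rate, iota, elapsed, steps=10000):
    """Forward-Euler solution of dV/dt = -rate * V + iota (the equality case)."""
    dt = elapsed / steps
    V = V0
    for _ in range(steps):
        V += dt * (-rate * V + iota)
    return V


# Hypothetical values: V(t) = 5, rate = 0.25, iota = 0.1, interval length 4.
bound = lyapunov_envelope(5.0, 0.25, 0.1, 4.0)
simulated = euler_worst_case(5.0, 0.25, 0.1, 4.0)
long_run = lyapunov_envelope(5.0, 0.25, 0.1, 400.0)
```

The envelope starts at the initial value, decays exponentially, and settles at `iota / rate`, so a smaller disturbance term yields a smaller residual level at the end of each interval between switches.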

#### Part 2: Exponential decay of ˙~x

Let and let be a candidate Lyapunov function defined as The following lemma establishes exponential convergence of the derivative estimation error to a neighborhood of the origin using Lemma 1.

###### Lemma 2.

Let all the hypotheses of Lemma 1 be satisfied. Furthermore, assume temporarily that there exists a constant such that the candidate Lyapunov function satisfies . Then, the Lyapunov function is bounded as

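$$V_r(Z_r(\tau)) \le \left(\bar{V}_{rs} - \frac{\iota_{rs}}{v_r}\right)e^{-v_r(\tau - t)} + \frac{\iota_{rs}}{v_r},\quad \forall \tau \in [t, t+T), \tag{13}$$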

where and Furthermore, given a constant the gain can be selected large enough such that .

###### Proof:

Using (6), the time derivative of the candidate Lyapunov function can be written as

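$$\begin{aligned}
\dot{V}_r ={}& -2k r^{T}Y(x)\Gamma A_s\tilde{\theta} - 2r^{T}Y(x)\Gamma Y^{T}(x)\tilde{x} + 2r^{T}\left(\nabla Y(x)Y(x)\theta + \nabla Y(x)f_1(x) + \nabla Y(x)g(x)u\right)\tilde{\theta}\\
& - 2r^{T}\tilde{x} + 2k r^{T}Y(x)\Gamma Q_s - 2k_1 r^{T}r + 2\tilde{x}^{T}\left(r - \alpha_1\tilde{x}\right).
\end{aligned}$$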

Completing the squares and using Lemma 1, the Lyapunov derivative can be bounded as

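$$\dot{V}_r \le -v_r V_r(Z_r) + \iota_{rs}.$$

Using the comparison lemma [25, Lemma 3.4],

$$V_r(Z_r(\tau)) \le \left(\bar{V}_{rs} - \frac{\iota_{rs}}{v_r}\right)e^{-v_r(\tau - t)} + \frac{\iota_{rs}}{v_r},\quad \forall \tau \in [t, t+T).$$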

Using the fact that, the state-derivative estimation error can be bounded as

$$\|\dot{\tilde{x}}\|^{2} \le \|r\|^{2} + \alpha_1\|\tilde{x}\|^{2} \le (1 + \alpha_1)V_r(Z_r).$$

Based on (13), given , , the gain can be selected large enough so that . Hence, given the gain can be selected to be large enough so that

#### Part 3: Asymptotic convergence to the origin

Lemmas 1 and 2 employ the temporary hypothesis that the state-derivative estimates stored in the history stack remain bounded. However, since the estimates are generated dynamically using (3), they cannot be guaranteed to be bounded a priori. In the following, the results of Lemmas 1 and 2 are used in an inductive argument to show that all the states of the dynamical system defined by (5)-(6) remain bounded and decay to the origin asymptotically provided enough data can be recorded to repopulate the history stack after each purge. (Footnote 5: The case where the history stack cannot be purged and repopulated indefinitely is addressed in Section V-B.)

###### Theorem 1.

Provided the history stacks and are populated using Algorithm 1, the learning gains are selected to satisfy and the sufficient gain conditions in (10), a bound is known such that , and provided the system states are exciting such that the history stack can be persistently purged and replenished, i.e.,

$$s \to \infty \quad \text{as} \quad t \to \infty, \tag{14}$$

then, , , and as .

###### Proof:

Let be a set of switching time instances defined as That is, for a given switching index denotes the time instance when the th subsystem is switched on. To facilitate proof by mathematical induction, assume temporarily that the hypotheses of Lemmas 1 and 2 are satisfied for for some . Furthermore, assume temporarily that the following sufficient condition is satisfied:

$$\iota_{rj} > \iota_{r(j+1)},\qquad \iota_j > \iota_{j+1},\qquad \forall j \in \{1, 2, \cdots, s-1\}. \tag{15}$$

Then, using (11) and (13), the Lyapunov functions and can be bounded as and where the constants and were introduced in (11) and (13), respectively, and , and denote the envelopes that bound the Lyapunov functions and , respectively. Using the bounds on the Lyapunov functions, the bounding envelopes at consecutive switching instances can be related as

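$$\begin{aligned}
W(T_s) - W(T_{s-1}) ={}& \frac{\bar{v}}{\underline{v}}\left(\iota_{s-1} - \iota_s\right)\left(e^{\frac{\underline{v}}{\bar{v}}(T_{s-1} - T_s)} - 1\right)\\
& + \sum_{j=1}^{s-2}\frac{\bar{v}}{\underline{v}}\left(\iota_j - \iota_{j+1}\right)e^{-\frac{\underline{v}}{\bar{v}}(T_s - T_j)}\left(1 - e^{\frac{\underline{v}}{\bar{v}}(T_s - T_{s-1})}\right)\\
& + \left(\bar{V}_1 - \frac{\bar{v}}{\underline{v}}\iota_1\right)\left(e^{-\frac{\underline{v}}{\bar{v}}T_s} - e^{-\frac{\underline{v}}{\bar{v}}T_{s-1}}\right),
\end{aligned}$$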

and

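$$\begin{aligned}
W_r(T_s) - W_r(T_{s-1}) ={}& \left(\frac{\iota_{r(s-1)}}{v_r} - \frac{\iota_{rs}}{v_r}\right)\left(e^{v_r(T_{s-1} - T_s)} - 1\right)\\
& + \sum_{j=1}^{s-2}\left(\frac{\iota_{rj}}{v_r} - \frac{\iota_{r(j+1)}}{v_r}\right)e^{-v_r(T_s - T_j)}\left(1 - e^{v_r(T_s - T_{s-1})}\right)\\
& + \left(\bar{V}_{r1} - \frac{\iota_{r1}}{v_r}\right)\left(e^{-v_r T_s} - e^{-v_r T_{s-1}}\right).
\end{aligned}$$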

Since the terms , , , , and are always negative. By selecting as

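$$\bar{V}_1 > \max\left(\frac{\bar{v}}{\underline{v}}\iota_1,\ \bar{v}\left(\bar{F}_1^{2} + \bar{\theta}^{2} + \|\dot{\hat{x}}(0)\|^{2} + (1 + \alpha_1)^{2}\|\tilde{x}(0)\|^{2} + \|\hat{\theta}(0)\|^{2}\right)\right) \tag{16}$$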

and using (15) and the hypotheses of Theorem 1, then and .

Since the history stack is selected at random to include bounded elements, all the hypotheses of Lemmas 1 and 2 are satisfied over the time interval . Hence, , where . Using the bounds in (7), can be computed as , where is an adjustable parameter. Furthermore, , where and Using the sufficient conditions and stated in Theorem 1, it can be concluded that and . Selecting and , it can be concluded that and .

Since the Lyapunov function does not grow beyond its initial condition over the time interval , . Moreover, since the history stack , which is active over the time interval , is recorded over the time interval all the hypotheses of Lemmas 1 and 2 are also satisfied over the time interval , and the constant can be computed as . Provided is selected such that then Hence, . Since and , then and hence,

Hence, by mathematical induction, the hypotheses of Lemmas 1 and 2 are satisfied for all , and for all . Hence, as Since the Lyapunov function is common among all the subsystems, as

Theorem 1 implies that provided the system states are persistently excited such that the history stack can always be replaced with a new full rank history stack, the state-derivative estimates, and the parameter estimate vector asymptotically converge to the state-derivative and the ideal parameter vector, respectively. However, from a practical perspective, it may be undesirable for a system to be in a persistently exciting state, or excitation beyond a certain finite time-interval may not be available. If excitation is available only over a finite time-interval, then the parameter estimation errors can be shown to be uniformly ultimately bounded, provided the history stacks are updated so that the time interval between two consecutive updates, i.e., the dwell time, is large enough. Algorithm 1 guarantees a minimum dwell time between two consecutive updates provided the constant is selected large enough.

### V-B Ultimate boundedness under finite excitation

For notational brevity, let and .