
# Sign-Error Adaptive Filtering Algorithms for Markovian Parameters††thanks: This research was supported in part by the Army Research Office under grant W911NF-12-1-0223.

Araz Hashemi, Department of Mathematics, Wayne State University, Detroit, MI 48202, araz.hashemi@wayne.edu.    G. Yin, Department of Mathematics, Wayne State University, Detroit, MI 48202, gyin@math.wayne.edu.    Le Yi Wang, Department of Electrical and Computer Engineering, Wayne State University, Detroit, MI 48202, lywang@wayne.edu.
December 20, 2012
###### Abstract

Motivated by reduction of computational complexity, this work develops sign-error adaptive filtering algorithms for estimating time-varying system parameters. Different from the previous work on sign-error algorithms, the parameters are time-varying and their dynamics are modeled by a discrete-time Markov chain. A distinctive feature of the algorithms is the multi-time-scale framework for characterizing parameter variations and algorithm updating speeds. This is realized by considering the stepsize of the estimation algorithms and a scaling parameter that defines the transition rates of the Markov jump process. Depending on the relative time scales of these two processes, suitably scaled sequences of the estimates are shown to converge to either an ordinary differential equation, or a set of ordinary differential equations modulated by random switching, or a stochastic differential equation, or stochastic differential equations with random switching. Using weak convergence methods, convergence and rates of convergence of the algorithms are obtained for all these cases.

Key Words. Sign-error algorithms, regime-switching models, stochastic approximation, mean squares errors, convergence, tracking properties.

EDICS. ASP-ANAL

Brief Title. Sign-Error Algorithms for Markovian Parameters

## 1 Introduction

Adaptive filtering algorithms have been studied extensively, thanks to their simple recursive forms and wide applicability for diversified practical problems arising in estimation, identification, adaptive control, and signal processing [26].

Recent rapid advancement in science and technology has introduced many emerging applications in which adaptive filtering is of substantial utility, including consensus controls, networked systems, and wireless communications; see [1, 2, 4, 5, 8, 7, 12, 13, 14, 16, 17, 18, 19, 20, 23, 24, 27]. One typical scenario in such new domains of application is that the underlying systems are inherently time varying and their parameter variations are stochastic [29, 30, 31]. One important class of such stochastic systems involves systems whose randomly time-varying parameters can be described by Markov chains. For example, networked systems include communication channels as part of the system topology. Channel connections, interruptions, data transmission queuing and routing, packet delays and losses, are always random. Markov chain models become a natural choice for such systems. For control strategy adaptation and performance optimization, it is essential to capture time-varying system parameters during operation, which leads to the problem of identifying Markovian regime-switching systems pursued in this paper.

When data acquisition, signal processing, and algorithm implementation are subject to resource limitations, it is highly desirable to reduce data complexity. This is especially important when data shuffling involves communication networks. This understanding motivates the main theme of this paper: the use of sign-error updating schemes, which carry much reduced data complexity, in adaptive filtering algorithms, without detrimental effects on parameter estimation accuracy or convergence rates.

In our recent work, we developed a sign-regressor algorithm for adaptive filters [28]. The current paper further develops sign-error adaptive filtering algorithms. It is well-known that sign algorithms have the advantage of reduced computational complexity. The sign operator reduces the implementation of the algorithms to bits in data communications and simple bit shifts in multiplications. As such, sign algorithms are highly appealing for practical applications. The work [11] introduced sign algorithms and has inspired much of the subsequent developments in the field. On the other hand, employing sign operators in adaptive algorithms has introduced substantial challenges in establishing convergence properties and error bounds.

A distinctive feature of the algorithms introduced in this paper is the multi-time-scale framework for characterizing parameter variations and algorithm updating speeds. This is realized by considering the stepsize of the estimation algorithms and a scaling parameter that defines the transition rates of the Markov jump process. Depending on the relative time scales of these two processes, suitably scaled sequences of the estimates are shown to converge to either an ordinary differential equation, or a set of ordinary differential equations modulated by random switching, or a stochastic differential equation, or stochastic differential equations with random switching. Using weak convergence methods, convergence and rates of convergence of the algorithms are obtained for all these cases.

The rest of the paper is arranged as follows. Section 2 formulates the problems and introduces the two-time-scale framework. The main algorithms are presented in Section 3. Mean-squares errors on parameter estimators are derived. By taking appropriate continuous-time interpolations, Section 4 establishes convergence properties of interpolated sequences of estimates from the adaptive filtering algorithms. Our analysis is based on weak convergence methods. The convergence properties are obtained by using martingale averaging techniques. Section 5 further investigates the rates of convergence. Suitably interpolated sequences are shown to converge to either stochastic differential equations or randomly-switched stochastic differential equations, depending on relations between the two time scales. Numerical results by simulation are presented to demonstrate the performance of our algorithms in Section 6.

## 2 Problem Formulation

Let

$$y_n = \varphi_n'\alpha_n + e_n, \qquad n = 0,1,\ldots, \tag{1}$$

where $\{\varphi_n\}$ is the sequence of regression vectors, $\{e_n\}$ is a sequence of zero-mean random variables representing the error or noise, $\{\alpha_n\}$ is the time-varying true parameter process, and $y_n$ is the observed signal at time $n$.

Estimates of $\alpha_n$ are denoted by $\theta_n$ and are given by the following adaptive filtering algorithm, which applies a sign operator to the prediction error:

$$\theta_{n+1} = \theta_n + \mu\varphi_n\,\mathrm{sgn}(y_n - \varphi_n'\theta_n), \tag{2}$$

where $\mathrm{sgn}(x)$ is defined as $1$ for $x \ge 0$ and $-1$ for $x < 0$. We impose the following assumptions.
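As a concrete illustration (not part of the paper's development), the update (2) can be sketched in a few lines of Python; the regressor distribution, noise level, and true parameter below are hypothetical choices for a fixed, non-switching parameter:

```python
import numpy as np

def sign_error_step(theta, phi, y, mu):
    """One iterate of (2): theta <- theta + mu * phi * sgn(y - phi' theta),
    with sgn(x) = 1 for x >= 0 and -1 for x < 0."""
    s = 1.0 if (y - phi @ theta) >= 0 else -1.0
    return theta + mu * phi * s

# Toy run with a fixed true parameter (illustrative values).
rng = np.random.default_rng(0)
alpha = np.array([1.0, -0.5])                 # true parameter
theta = np.zeros(2)
for _ in range(5000):
    phi = rng.uniform(-1.0, 1.0, size=2)      # bounded regressor, cf. (A2)
    y = phi @ alpha + 0.1 * rng.standard_normal()
    theta = sign_error_step(theta, phi, y, mu=0.01)
```

Note that only the sign of the prediction error enters the update, which is the source of the reduced data complexity discussed above: communicating the innovation requires a single bit.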

• (A1) $\{\alpha_n\}$ is a discrete-time homogeneous Markov chain with state space

$$\mathcal{M} = \{a_1,\ldots,a_{m_0}\}, \qquad a_i \in \mathbb{R}^r,\; i = 1,\ldots,m_0, \tag{3}$$

and whose transition probability matrix is given by

$$P^\varepsilon = I + \varepsilon Q, \tag{4}$$

where $\varepsilon > 0$ is a small parameter, $I$ is the identity matrix, and $Q = (q_{ij})$ is an irreducible generator (i.e., $q_{ij} \ge 0$ for $i \ne j$ and $\sum_{j=1}^{m_0} q_{ij} = 0$ for each $i$) of a continuous-time Markov chain. For simplicity, assume that the initial distribution of the Markov chain is given by $P(\alpha_0 = a_i) = p_{0,i}$, which is independent of $\varepsilon$ for each $i$, where $p_{0,i} \ge 0$ and $\sum_{i=1}^{m_0} p_{0,i} = 1$.

• (A2) The sequence of signals $\{(\varphi_n, e_n)\}$ is uniformly bounded, stationary, and independent of the parameter process $\{\alpha_n\}$. Let $\mathcal{F}_n$ be the $\sigma$-algebra generated by $\{\varphi_j, e_j : j < n;\ \alpha_j : j \le n\}$, and denote the conditional expectation with respect to $\mathcal{F}_n$ by $E_n$.

• (A3) For each $n$, define

$$\begin{aligned}
g_n &:= \varphi_n\,\mathrm{sgn}(\varphi_n'[\alpha_n - \theta_n] + e_n),\\
g_n(\theta, i) &:= \varphi_n\,\mathrm{sgn}(\varphi_n'[a_i - \theta] + e_n) I_{\{\alpha_n = a_i\}},\\
\tilde g_n(\theta, i) &:= E_n g_n(\theta, i).
\end{aligned} \tag{5}$$

For each $\theta$ and each $i$, there is a matrix $A_n^{(i)}$ such that, given $\{\alpha_n = a_i\}$,

$$\tilde g_n(\theta, i) = A_n^{(i)}(a_i - \theta) I_{\{\alpha_n = a_i\}} + o\big(|a_i - \theta| I_{\{\alpha_n = a_i\}}\big), \qquad E A_n^{(i)} = A^{(i)}. \tag{6}$$
• (A4) There is a sequence of non-negative real numbers $\{\phi(k)\}$ with $\sum_k \phi^{1/2}(k) < \infty$ such that, for each $i$, each $j \ge n$, and some $K > 0$,

$$\big|E_n A_j^{(i)} - A^{(i)}\big| \le K\phi^{1/2}(j - n) \tag{7}$$

uniformly in $\omega$.
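To make the Markov-chain assumption (A1) concrete, here is a hedged simulation sketch; the generator $Q$ and the value of $\varepsilon$ below are hypothetical choices, not values from the paper:

```python
import numpy as np

def simulate_chain(Q, eps, n_steps, rng, init=0):
    """Simulate a discrete-time Markov chain with transition matrix
    P = I + eps*Q, where Q is a generator: q_ij >= 0 for i != j and
    each row of Q sums to zero, so each row of P is a distribution."""
    P = np.eye(Q.shape[0]) + eps * Q
    assert (P >= 0).all() and np.allclose(P.sum(axis=1), 1.0)
    path = np.empty(n_steps, dtype=int)
    state = init
    for n in range(n_steps):
        path[n] = state
        state = int(rng.choice(Q.shape[0], p=P[state]))
    return path

rng = np.random.default_rng(1)
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])        # irreducible generator (hypothetical)
path = simulate_chain(Q, eps=0.01, n_steps=10_000, rng=rng)
```

For small $\varepsilon$ the chain holds each state for $O(1/\varepsilon)$ steps on average; this slow parameter variation is exactly what the two-time-scale analysis below exploits.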

###### Remark 2.1

Let us take a moment to justify the practicality of the assumptions. The boundedness assumption in (A2) is fairly mild. For example, we may use a truncated Gaussian process. In addition, it is possible to accommodate unbounded signals by treating martingale difference sequences (which in fact makes the proofs slightly simpler).

In (A3), we consider that while $g_n$ is not smooth w.r.t. $\theta$, its conditional expectation $\tilde g_n$ can be a smooth function of $\theta$. The condition (6) indicates that $\tilde g_n(\cdot, i)$ is locally (near $a_i$) linearizable. For example, this is satisfied if the conditional joint density of $(\varphi_n, e_n)$ is differentiable with bounded derivatives; see [6] for more discussion. Finally, (A4) is essentially a mixing condition, which indicates that the remote past and the distant future are asymptotically independent. Hence we may work with correlated signals as long as the correlation decays sufficiently quickly between iterates.
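The truncated-Gaussian example mentioned above can be sketched as follows (rejection sampling; the bound of $3$ is an arbitrary illustrative choice):

```python
import numpy as np

def truncated_gaussian(rng, size, bound=3.0):
    """Draw N(0,1) samples conditioned on |x| <= bound via rejection
    sampling, yielding a uniformly bounded signal as required by (A2)."""
    out = rng.standard_normal(size)
    mask = np.abs(out) > bound
    while mask.any():                       # resample only the rejected draws
        out[mask] = rng.standard_normal(int(mask.sum()))
        mask = np.abs(out) > bound
    return out

rng = np.random.default_rng(2)
phi = truncated_gaussian(rng, size=10_000)
```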

## 3 Mean Squares Error Bounds

Denote the sequence of estimation errors by $\tilde\theta_n := \alpha_n - \theta_n$. We proceed to obtain bounds for the mean squares error in terms of the transition rate $\varepsilon$ of the parameter process and the adaptation rate $\mu$ of the algorithm.

###### Theorem 3.1

Assume (A1)–(A4). Then there is an $N_0$ such that for all $n \ge N_0$,

$$E|\tilde\theta_n|^2 = E|\alpha_n - \theta_n|^2 = O(\mu + \varepsilon + \varepsilon^2/\mu). \tag{8}$$

Proof. Define the Liapunov function $V(\theta) := |\theta|^2/2$. Observe that

$$\tilde\theta_{n+1} = \alpha_{n+1} - \theta_{n+1} = \tilde\theta_n - \mu\varphi_n\,\mathrm{sgn}(\varphi_n'\tilde\theta_n + e_n) + (\alpha_{n+1} - \alpha_n), \tag{9}$$

so

$$E_n V(\tilde\theta_{n+1}) - V(\tilde\theta_n) = -\mu E_n\tilde\theta_n'\varphi_n\,\mathrm{sgn}(\varphi_n'\tilde\theta_n + e_n) + E_n\tilde\theta_n'(\alpha_{n+1} - \alpha_n) + \frac{1}{2}E_n\big|(\alpha_{n+1} - \alpha_n) - \mu\varphi_n\,\mathrm{sgn}(\varphi_n'\tilde\theta_n + e_n)\big|^2. \tag{10}$$

By (A2), the Markov chain is independent of $\{(\varphi_n, e_n)\}$, and $\alpha_n$ is $\mathcal{F}_n$-measurable. Since the transition matrix is of the form (4), we obtain

$$E_n(\alpha_{n+1} - \alpha_n) = \sum_{i=1}^{m_0} E\big(\alpha_{n+1} - a_i \mid \alpha_n = a_i\big) I_{\{\alpha_n = a_i\}} = \sum_{i=1}^{m_0}\Big[\sum_{j=1}^{m_0} a_j(\delta_{ij} + \varepsilon q_{ij}) - a_i\Big] I_{\{\alpha_n = a_i\}} = O(\varepsilon). \tag{11}$$

Similarly,

$$E_n|\alpha_{n+1} - \alpha_n|^2 = \sum_{j=1}^{m_0}\sum_{i=1}^{m_0}|a_j - a_i|^2 I_{\{\alpha_n = a_i\}} P(\alpha_{n+1} = a_j \mid \alpha_n = a_i) = \sum_{j=1}^{m_0}\sum_{i=1}^{m_0}|a_j - a_i|^2 I_{\{\alpha_n = a_i\}}(\delta_{ij} + \varepsilon q_{ij}) = O(\varepsilon). \tag{12}$$
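Though not needed for the proof, the one-step moment estimates (11)–(12) are easy to check numerically. The sketch below uses a hypothetical two-state chain and compares Monte Carlo estimates of the conditional increments against their exact $O(\varepsilon)$ values:

```python
import numpy as np

# Two-state example: states a_1 = -1, a_2 = 2, generator Q, small eps
# (all values illustrative, not from the paper).
rng = np.random.default_rng(3)
states = np.array([-1.0, 2.0])
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
eps = 0.01
P = np.eye(2) + eps * Q

i = 0                                     # condition on alpha_n = a_1
nxt = rng.choice(2, size=200_000, p=P[i])
incr = states[nxt] - states[i]
m1 = float(incr.mean())                   # exact: eps*q_12*(a_2 - a_1) = 0.03
m2 = float((incr ** 2).mean())            # exact: eps*q_12*(a_2 - a_1)^2 = 0.09
```

Both moments scale linearly in $\varepsilon$, matching the $O(\varepsilon)$ rates in (11) and (12).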

Note that $|\tilde\theta_n| \le V(\tilde\theta_n) + 1$, so

$$O(\varepsilon)|\tilde\theta_n| \le O(\varepsilon)\big(V(\tilde\theta_n) + 1\big). \tag{13}$$

Since the signals are bounded, we have

$$E_n\big|(\alpha_{n+1} - \alpha_n) - \mu\varphi_n\,\mathrm{sgn}(\varphi_n'\tilde\theta_n + e_n)\big|^2 = E_n|\alpha_{n+1} - \alpha_n|^2 + O(\mu^2 + \mu\varepsilon)\big[V(\tilde\theta_n) + 1\big]. \tag{14}$$

Applying (14) to (10), we arrive at

$$E_n V(\tilde\theta_{n+1}) - V(\tilde\theta_n) = -\mu E_n\tilde\theta_n'\varphi_n\,\mathrm{sgn}(\varphi_n'\tilde\theta_n + e_n) + E_n\tilde\theta_n'(\alpha_{n+1} - \alpha_n) + E_n|\alpha_{n+1} - \alpha_n|^2 + O(\mu^2 + \mu\varepsilon)\big[V(\tilde\theta_n) + 1\big]. \tag{15}$$

Note also that by (A3),

$$\begin{aligned}
\mu E_n\tilde\theta_n'\varphi_n\,\mathrm{sgn}(\varphi_n'\tilde\theta_n + e_n) &= \mu\sum_{i=1}^{m_0} E_n\tilde\theta_n'\varphi_n\,\mathrm{sgn}(\varphi_n'\tilde\theta_n + e_n) I_{\{\alpha_n = a_i\}}\\
&= \mu\sum_{i=1}^{m_0} E_n\tilde\theta_n' A_n^{(i)}\tilde\theta_n I_{\{\alpha_n = a_i\}} + \mu\, o(|\tilde\theta_n|)\\
&= \mu\sum_{i=1}^{m_0}\tilde\theta_n'\big[A_n^{(i)} - A^{(i)}\big]\tilde\theta_n I_{\{\alpha_n = a_i\}} + \mu\sum_{i=1}^{m_0}\tilde\theta_n' A^{(i)}\tilde\theta_n I_{\{\alpha_n = a_i\}} + \mu\, o(|\tilde\theta_n|).
\end{aligned} \tag{16}$$

To treat the first three terms on the right-hand side of (15), we define the following perturbed Liapunov functions:

$$\begin{aligned}
V_1^\mu(\tilde\theta, n) &:= -\mu\sum_{j=n}^{\infty}\sum_{i=1}^{m_0} E_n\tilde\theta'\big[A_j^{(i)} - A^{(i)}\big]\tilde\theta\, I_{\{\alpha_j = a_i\}},\\
V_2^\mu(\tilde\theta, n) &:= \sum_{j=n}^{\infty}\tilde\theta' E_n(\alpha_{j+1} - \alpha_j),\\
V_3^\mu(n) &:= \sum_{j=n}^{\infty} E_n(\alpha_{n+1} - \alpha_n)'(\alpha_{j+1} - \alpha_j).
\end{aligned} \tag{17}$$

By virtue of (A4), we have

$$|V_1^\mu(\tilde\theta, n)| \le \mu\sum_{i=1}^{m_0} K|\tilde\theta|^2\sum_{j=n}^{\infty}\phi^{1/2}(j - n) \le O(\mu)\big[V(\tilde\theta) + 1\big]. \tag{18}$$

Note also that the irreducibility of $Q$ implies that of $I + \varepsilon Q$ for sufficiently small $\varepsilon$. Thus there is a $\lambda_c$ with $0 < \lambda_c < 1$ such that $|(I + \varepsilon Q)^{j-n} - \mathbb{1}\nu^\varepsilon| \le O(\lambda_c^{j-n})$ for all $j \ge n$, where $\nu^\varepsilon$ denotes the stationary distribution associated with the transition matrix $I + \varepsilon Q$ and $\mathbb{1} := (1,\ldots,1)'$. Note that the difference of the $(j+1-n)$- and $(j-n)$-step transition matrices is given by

$$(I + \varepsilon Q)^{j+1-n} - (I + \varepsilon Q)^{j-n} = \varepsilon Q(I + \varepsilon Q)^{j-n} = \varepsilon Q\big[(I + \varepsilon Q)^{j-n} - \mathbb{1}\nu^\varepsilon\big] = O(\varepsilon)\lambda_c^{j-n}.$$

The last line above follows from the fact that $Q\mathbb{1} = 0$, hence $Q\mathbb{1}\nu^\varepsilon = 0$. Thus

$$\sum_{j=n}^{\infty}\big|(I + \varepsilon Q)^{j+1-n} - (I + \varepsilon Q)^{j-n}\big| \le O(\varepsilon)\sum_{j=n}^{\infty}\lambda_c^{j-n} = O(\varepsilon). \tag{19}$$

The foregoing estimates lead to $\sum_{j=n}^{\infty}|E_n(\alpha_{j+1} - \alpha_j)| = O(\varepsilon)$, and as a result

$$|V_2^\mu(\tilde\theta, n)| \le O(\varepsilon)\big(V(\tilde\theta) + 1\big), \tag{20}$$

and similarly

$$|V_3^\mu(n)| = O(\varepsilon), \tag{21}$$

so all the perturbations can be made small.

Now, we note that

$$E_n V_1^\mu(\tilde\theta_{n+1}, n+1) - V_1^\mu(\tilde\theta_n, n) = E_n V_1^\mu(\tilde\theta_{n+1}, n+1) - E_n V_1^\mu(\tilde\theta_n, n+1) + E_n V_1^\mu(\tilde\theta_n, n+1) - V_1^\mu(\tilde\theta_n, n), \tag{22}$$

where

$$E_n V_1^\mu(\tilde\theta_n, n+1) - V_1^\mu(\tilde\theta_n, n) = \mu\sum_{i=1}^{m_0}\tilde\theta_n'\big[A_n^{(i)} - A^{(i)}\big]\tilde\theta_n I_{\{\alpha_n = a_i\}} \tag{23}$$

and

$$\begin{aligned}
E_n V_1^\mu(\tilde\theta_{n+1}, n+1) - E_n V_1^\mu(\tilde\theta_n, n+1) = {}& -\mu\sum_{j=n+1}^{\infty}\sum_{i=1}^{m_0} E_n(\tilde\theta_{n+1} - \tilde\theta_n)'\big[A_j^{(i)} - A^{(i)}\big]\tilde\theta_{n+1} I_{\{\alpha_j = a_i\}}\\
& -\mu\sum_{j=n+1}^{\infty}\sum_{i=1}^{m_0} E_n\tilde\theta_n'\big[A_j^{(i)} - A^{(i)}\big](\tilde\theta_{n+1} - \tilde\theta_n) I_{\{\alpha_j = a_i\}}.
\end{aligned} \tag{24}$$

Using (11), we have

$$E_n|\tilde\theta_{n+1} - \tilde\theta_n| \le E_n|\alpha_{n+1} - \alpha_n| + \mu E_n\big|\varphi_n\,\mathrm{sgn}(\varphi_n'\tilde\theta_n + e_n)\big| = O(\varepsilon + \mu). \tag{25}$$

Thus, in view of (A4)

$$\Big|\mu\sum_{j=n+1}^{\infty}\sum_{i=1}^{m_0} E_n\tilde\theta_n'\big[A_j^{(i)} - A^{(i)}\big](\tilde\theta_{n+1} - \tilde\theta_n) I_{\{\alpha_j = a_i\}}\Big| \le O(\mu^2 + \mu\varepsilon)\big[V(\tilde\theta_n) + 1\big], \tag{26}$$

and

$$\Big|\mu\sum_{j=n+1}^{\infty}\sum_{i=1}^{m_0} E_n(\tilde\theta_{n+1} - \tilde\theta_n)' E_{n+1}\big[A_j^{(i)} - A^{(i)}\big]\tilde\theta_{n+1} I_{\{\alpha_j = a_i\}}\Big| \le O(\mu^2 + \mu\varepsilon)\big[V(\tilde\theta_n) + 1\big]. \tag{27}$$

Putting together (22)–(27), we establish that

$$E_n V_1^\mu(\tilde\theta_{n+1}, n+1) - V_1^\mu(\tilde\theta_n, n) = \mu\sum_{i=1}^{m_0}\tilde\theta_n'\big[A_n^{(i)} - A^{(i)}\big]\tilde\theta_n I_{\{\alpha_n = a_i\}} + O(\mu^2 + \mu\varepsilon)\big[V(\tilde\theta_n) + 1\big]. \tag{28}$$

Likewise, we can obtain

$$E_n V_2^\mu(\tilde\theta_{n+1}, n+1) - V_2^\mu(\tilde\theta_n, n) = -E_n\tilde\theta_n'(\alpha_{n+1} - \alpha_n) + O(\varepsilon^2 + \mu^2), \tag{29}$$

and

$$E_n V_3^\mu(n+1) - V_3^\mu(n) = -E_n|\alpha_{n+1} - \alpha_n|^2 + O(\varepsilon^2). \tag{30}$$

Now we define

$$W(\tilde\theta, n) = V(\tilde\theta) + V_1^\mu(\tilde\theta, n) + V_2^\mu(\tilde\theta, n) + V_3^\mu(n).$$

Since each $A^{(i)}$ is a stable matrix, there is a $\lambda_i > 0$ such that $\theta' A^{(i)}\theta \ge \lambda_i|\theta|^2$ for each $i$. Thus we may take $\lambda > 0$ such that $\lambda \le \min_i \lambda_i$. Using this along with (10), (16), (28)–(30), and the inequality $|\tilde\theta_n| \le V(\tilde\theta_n) + 1$, we arrive at

$$\begin{aligned}
E_n W(\tilde\theta_{n+1}, n+1) - W(\tilde\theta_n, n) &= -\mu\sum_{i=1}^{m_0}\tilde\theta_n' A^{(i)}\tilde\theta_n I_{\{\alpha_n = a_i\}} - \mu\, o(|\tilde\theta_n|) + O(\mu^2 + \varepsilon^2)\big[V(\tilde\theta_n) + 1\big]\\
&\le -\lambda\mu V(\tilde\theta_n) + O(\mu^2 + \varepsilon^2)\big[V(\tilde\theta_n) + 1\big]\\
&\le -\lambda\mu W(\tilde\theta_n, n) + O(\mu^2 + \varepsilon^2)\big[W(\tilde\theta_n, n) + 1\big].
\end{aligned} \tag{31}$$

Choose $\mu$ and $\varepsilon$ small enough so that there is a $\lambda_0$ satisfying $0 < \lambda_0 < \lambda$ and

$$-\lambda\mu + O(\mu^2) + O(\varepsilon^2) \le -\lambda_0\mu.$$

Then we obtain

$$E_n W(\tilde\theta_{n+1}, n+1) \le (1 - \lambda_0\mu)W(\tilde\theta_n, n) + O(\mu^2 + \varepsilon^2).$$

Note that there is an $N_0$ such that $(1 - \lambda_0\mu)^n \le O(\mu)$ for $n \ge N_0$. Taking expectation in the iteration for $W$ and iterating on the resulting inequality yield

$$E W(\tilde\theta_{n+1}, n+1) \le (1 - \lambda_0\mu)^n W(\tilde\theta_0, 0) + O(\mu + \varepsilon^2/\mu).$$

Thus

$$E W(\tilde\theta_{n+1}, n+1) \le O(\mu + \varepsilon^2/\mu).$$

Finally, applying (18)–(21) again, we also obtain

$$E V(\tilde\theta_{n+1}) \le O(\mu + \varepsilon + \varepsilon^2/\mu).$$

Thus the desired result follows.
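As an end-to-end sanity check of Theorem 3.1 (not part of the paper's proof), the following hedged sketch tracks a Markov-switching parameter with the sign-error algorithm; the states, generator, and stepsizes are illustrative choices with $\varepsilon = O(\mu)$:

```python
import numpy as np

rng = np.random.default_rng(4)
states = np.array([[1.0, -1.0],
                   [0.5,  2.0]])          # a_1, a_2 in R^2 (hypothetical)
Q = np.array([[-1.0, 1.0],
              [1.0, -1.0]])
mu, eps = 0.05, 0.001                      # eps = O(mu)
P = np.eye(2) + eps * Q

i, theta = 0, np.zeros(2)
sq_err = np.empty(100_000)
for n in range(100_000):
    phi = rng.uniform(-1.0, 1.0, size=2)
    y = phi @ states[i] + 0.1 * rng.standard_normal()
    s = 1.0 if (y - phi @ theta) >= 0 else -1.0
    theta = theta + mu * phi * s           # sign-error update (2)
    i = int(rng.choice(2, p=P[i]))         # parameter may switch
    sq_err[n] = float(np.sum((states[i] - theta) ** 2))

mse_tail = float(sq_err[-30_000:].mean()) # time-averaged tracking error
```

Shrinking $\mu$ and $\varepsilon$ together (keeping $\varepsilon = O(\mu)$) should shrink the tail-averaged error, in line with the $O(\mu + \varepsilon + \varepsilon^2/\mu)$ bound of (8).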

## 4 Convergence Properties

### 4.1 Switching ODE Limit: μ=O(ε)

We assume the adaptation rate and the transition frequency are of the same order, that is, $\mu = O(\varepsilon)$ and $\varepsilon = O(\mu)$. For simplicity, we take $\varepsilon = \mu$. To study the asymptotic properties of the sequence $\{\theta_n\}$, we take a continuous-time interpolation of the process. Define

$$\theta^\mu(t) = \theta_n, \quad \alpha^\mu(t) = \alpha_n, \qquad \text{for } t \in [n\mu, n\mu + \mu).$$
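The piecewise-constant interpolation above amounts to the following small helper (with illustrative scalar iterates):

```python
import numpy as np

def interpolate(iterates, mu, t):
    """Evaluate theta^mu(t) = theta_n for t in [n*mu, n*mu + mu):
    piecewise-constant interpolation of discrete iterates on the
    mu time scale (clamped at the last available iterate)."""
    n = min(int(np.floor(t / mu)), len(iterates) - 1)
    return iterates[n]

thetas = [0.0, 0.5, 0.8, 0.95]   # toy iterates theta_0, ..., theta_3
mu = 0.1
```

On the $t$ axis, one unit of interpolated time corresponds to $1/\mu$ iterations, which is what lets the discrete recursion be compared with an ODE limit as $\mu \to 0$.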

We proceed to prove that $(\theta^\mu(\cdot), \alpha^\mu(\cdot))$ converges weakly to a system of randomly switching ordinary differential equations.

###### Theorem 4.1

Assume (A1)–(A4) hold and $\varepsilon = \mu$. Then the process $(\theta^\mu(\cdot), \alpha^\mu(\cdot))$ converges weakly to $(\theta(\cdot), \alpha(\cdot))$ such that $\alpha(\cdot)$ is a continuous-time Markov chain generated by $Q$ and the limit process $\theta(\cdot)$ satisfies the Markov switched ordinary differential equation

$$\dot\theta(t) = A(\alpha(t))\big(\alpha(t) - \theta(t)\big), \qquad \theta(0) = \theta_0. \tag{32}$$

The theorem is established through a series of lemmas. We begin by using a truncation device to bound the estimates. Define $S_N$ to be the ball with radius $N$, and $q^N(\theta)$ a truncation function that is equal to 1 for $\theta \in S_N$, 0 for $\theta \notin S_{N+1}$, and sufficiently smooth in between. Then we modify algorithm (2) so that

$$\theta_{n+1}^N := \theta_n^N + \mu\varphi_n\,\mathrm{sgn}(y_n - \varphi_n'\theta_n^N)\, q^N(\theta_n^N), \qquad n = 0,1,\ldots, \tag{33}$$

is now a bounded sequence of estimates. As before, define

$$\theta^{N,\mu}(t) := \theta_n^N \qquad \text{for } t \in [\mu n, \mu n + \mu).$$

We shall first show that the sequence $\{\theta^{N,\mu}(\cdot)\}$ is tight, and thus by Prohorov's theorem we may extract a convergent subsequence. We will then show that the limit satisfies a switched differential equation. Lastly, we let the truncation bound $N$ grow and show that the untruncated sequence given by (2) is also weakly convergent.

###### Lemma 4.2

The sequence $\{(\theta^{N,\mu}(\cdot), \alpha^\mu(\cdot))\}$ is tight in $D([0,\infty); \mathbb{R}^r \times \mathcal{M})$.

Proof of Lemma 4.2. Note that the sequence $\{\alpha^\mu(\cdot)\}$ is tight by virtue of [33, Theorem 4.3]. In addition, $\alpha^\mu(\cdot)$ converges weakly to a Markov chain generated by $Q$. To proceed, we examine the asymptotics of the sequence $\{\theta^{N,\mu}(\cdot)\}$. We have that for any $t, s > 0$ and $\delta > 0$ satisfying $s \le \delta$,

$$\begin{aligned}
E_t^\mu\big|\theta^{N,\mu}(t+s) - \theta^{N,\mu}(t)\big|^2 &\le E_t^\mu\Big|\mu\sum_{k=t/\mu}^{(t+s)/\mu - 1}\varphi_k\,\mathrm{sgn}(y_k - \varphi_k'\theta_k^N)\, q^N(\theta_k^N)\Big|^2\\
&\le \mu^2 E_t^\mu\sum_{j=t/\mu}^{(t+s)/\mu - 1}\sum_{k=t/\mu}^{(t+s)/\mu - 1}\varphi_j'\varphi_k\,\mathrm{sgn}(y_j - \varphi_j'\theta_j^N)\,\mathrm{sgn}(y_k - \varphi_k'\theta_k^N)\, q^N(\theta_j^N)\, q^N(\theta_k^N)\\
&\le \mu^2\sum_{j=t/\mu}^{(t+s)/\mu - 1}\sum_{k=t/\mu}^{(t+s)/\mu - 1} E_t^\mu|\varphi_j|\,|\varphi_k| \le O(s^2) \le O(\delta^2).
\end{aligned} \tag{34}$$

Here, for any $t \ge 0$, $E_t^\mu$ denotes the conditional expectation w.r.t. the $\sigma$-algebra $\mathcal{F}_t^\mu$ of the data up to time $t$ on the $\mu$ time scale. Thus, for any $\delta > 0$, we have

$$\lim_{\delta\to 0}\limsup_{\mu\to 0}\Big\{\sup_{0\le s\le\delta} E\big[E_t^\mu\big|\theta^{N,\mu}(t+s) - \theta^{N,\mu}(t)\big|^2\big]\Big\} = 0.$$

Applying the criterion [15, p.47], the tightness is proved.

Since $\{\theta^{N,\mu}(\cdot)\}$ is tight, it is sequentially compact. By virtue of Prohorov's theorem, we can extract a weakly convergent subsequence. Select such a subsequence and still denote it by $\{\theta^{N,\mu}(\cdot)\}$ for notational simplicity. Denote the limit by $\theta^N(\cdot)$. We proceed to characterize the limit process.

###### Lemma 4.3

The sequence $(\theta^{N,\mu}(\cdot), \alpha^\mu(\cdot))$ converges weakly to $(\theta^N(\cdot), \alpha(\cdot))$, which is a solution of the martingale problem with operator

$$L_1^N f(\theta^N, a_i) := \nabla f'(\theta^N, a_i) A^{(i)}\big[a_i - \theta^N\big] q^N(\theta^N) + \sum_{j=1}^{m_0} q_{ij} f(\theta^N, a_j), \tag{35}$$

where, for each $i$, $f(\cdot, a_i)$ belongs to the class of twice continuously differentiable functions with compact support.

Proof. To derive the martingale limit, we need only show that for each function $f(\cdot, a_i)$ that is twice continuously differentiable with compact support, each bounded and continuous function $h(\cdot)$, each $t, s > 0$, each positive integer $\kappa$, and each $t_i \le t$ for $i \le \kappa$,

$$E h\big(\theta^N(t_i), \alpha(t_i) : i \le \kappa\big)\Big[f\big(\theta^N(t+s), \alpha(t+s)\big) - f\big(\theta^N(t), \alpha(t)\big) - \int_t^{t+s} L_1^N f\big(\theta^N(\tau), \alpha(\tau)\big)\,d\tau\Big] = 0. \tag{36}$$

To verify (36), we work with the processes indexed by $\mu$. As before, note that

$$\theta^{N,\mu}(t+s) - \theta^{N,\mu}(t) = \sum_{k=t/\mu}^{(t+s)/\mu - 1}\mu\varphi_k\,\mathrm{sgn}\big(\varphi_k'[\alpha_k - \theta_k^N] + e_k\big)\, q^N(\theta_k^N). \tag{37}$$

Subdivide the interval $[t, t+s]$ into subintervals of length $\delta_\mu := m_\mu\mu$ with end points $l\delta_\mu$, by choosing a sequence of integers $m_\mu$ such that $\delta_\mu \to 0$ as $\mu \to 0$ but $m_\mu \to \infty$. By the smoothness of $f$, it is readily seen that as $\mu \to 0$,

$$E h\big(\theta^{N,\mu}(t_i), \alpha^\mu(t_i) : i \le \kappa\big)\big[f\big(\theta^{N,\mu}(t+s), \alpha^\mu(t+s)\big) - f\big(\theta^{N,\mu}(t), \alpha^\mu(t)\big)\big] \to E h\big(\theta^N(t_i), \alpha(t_i) : i \le \kappa\big)\big[f\big(\theta^N(t+s), \alpha(t+s)\big) - f\big(\theta^N(t), \alpha(t)\big)\big]. \tag{38}$$

Next, we insert a term to examine the change in the parameter and the estimate separately

$$\begin{aligned}
&\lim_{\mu\to 0} E h\big(\theta^{N,\mu}(t_i), \alpha^\mu(t_i) : i \le \kappa\big)\big[f\big(\theta^{N,\mu}(t+s), \alpha^\mu(t+s)\big) - f\big(\theta^{N,\mu}(t), \alpha^\mu(t)\big)\big]\\
&\quad = \lim_{\mu\to 0} E h\big(\theta^{N,\mu}(t_i), \alpha^\mu(t_i) : i \le \kappa\big)\Big[\sum_{l\delta_\mu = t}^{t+s}\big[f(\theta^N_{lm_\mu + m_\mu}, \alpha_{lm_\mu + m_\mu}) - f(\theta^N_{lm_\mu}, \alpha_{lm_\mu})\big]\Big]\\
&\quad = \lim_{\mu\to 0} E h\big(\theta^{N,\mu}(t_i), \alpha^\mu(t_i) : i \le \kappa\big)\Big[\sum_{l\delta_\mu = t}^{t+s}\big[f(\theta^N_{lm_\mu + m_\mu}, \alpha_{lm_\mu + m_\mu}) - f(\theta^N_{lm_\mu + m_\mu}, \alpha_{lm_\mu})\big] + \sum_{l\delta_\mu = t}^{t+s}\big[f(\theta^N_{lm_\mu + m_\mu}, \alpha_{lm_\mu}) - f(\theta^N_{lm_\mu}, \alpha_{lm_\mu})\big]\Big].
\end{aligned} \tag{39}$$

First, we work with the last term in (39). By using a Taylor expansion on each subinterval indexed by $l$, we have

$$\begin{aligned}
&\lim_{\mu\to 0} E h\big(\theta^{N,\mu}(t_i), \alpha^\mu(t_i) : i \le \kappa\big)\Big[\sum_{l\delta_\mu = t}^{t+s}\big[f(\theta^N_{lm_\mu + m_\mu}, \alpha_{lm_\mu}) - f(\theta^N_{lm_\mu}, \alpha_{lm_\mu})\big]\Big]\\
&\quad = \lim_{\mu\to 0} E h\big(\theta^{N,\mu}(t_i), \alpha^\mu(t_i) : i \le \kappa\big)\sum_{l\delta_\mu = t}^{t+s}\Big[\delta_\mu\frac{1}{m_\mu}\sum_{k=lm_\mu}^{lm_\mu + m_\mu - 1}\nabla f'(\theta^N_{lm_\mu}, \alpha_{lm_\mu})\varphi_k\,\mathrm{sgn}\big(\varphi_k'(\alpha_k - \theta_k^N) + e_k\big)\, q^N(\theta_k^N)\\
&\qquad + \sum_{k=lm_\mu}^{lm_\mu + m_\mu - 1}\big[\nabla f'(\theta^{N,+}_{lm_\mu}, \alpha_{lm_\mu}) - \nabla f'(\theta^N_{lm_\mu}, \alpha_{lm_\mu})\big](\theta^N_{k+1} - \theta^N_k)\, q^N(\theta^N_k)\Big], \tag{40}
\end{aligned}$$

where $\theta^{N,+}_{lm_\mu}$ is a point on the line segment joining $\theta^N_{lm_\mu}$ and $\theta^N_{lm_\mu + m_\mu}$. Since

$$\big|\theta^N_{lm_\mu + m_\mu} - \theta^N_{lm_\mu}\big| = O(\delta_\mu)$$

and $\nabla f$ is smooth, the last term in (40) is $o(1)$ in the sense of convergence in probability as $\mu \to 0$. To work with the first term, we insert the conditional expectation and apply (6) to obtain

 limμ→0Eh(θN,μ(ti),αμ(ti):i≤κ)t+s∑lδμ=tδμ1mμ