Sign-Error Adaptive Filtering Algorithms for Markovian Parameters
This research was supported in part by the Army Research Office under grant W911NF-12-1-0223.
Abstract
Motivated by reduction of computational complexity, this work develops sign-error adaptive filtering algorithms for estimating time-varying system parameters. Different from the previous work on sign-error algorithms, the parameters are time-varying and their dynamics are modeled by a discrete-time Markov chain. A distinctive feature of the algorithms is the multi-time-scale framework for characterizing parameter variations and algorithm updating speeds. This is realized by considering the stepsize of the estimation algorithms and a scaling parameter that defines the transition rates of the Markov jump process. Depending on the relative time scales of these two processes, suitably scaled sequences of the estimates are shown to converge to either an ordinary differential equation, or a set of ordinary differential equations modulated by random switching, or a stochastic differential equation, or stochastic differential equations with random switching. Using weak convergence methods, convergence and rates of convergence of the algorithms are obtained for all these cases.
Key Words. Sign-error algorithms, regime-switching models, stochastic approximation, mean squares errors, convergence, tracking properties.
EDICS. ASP-ANAL
Brief Title. Sign-Error Algorithms for Markovian Parameters
1 Introduction
Adaptive filtering algorithms have been studied extensively, thanks to their simple recursive forms and wide applicability for diversified practical problems arising in estimation, identification, adaptive control, and signal processing [26].
Recent rapid advancement in science and technology has introduced many emerging applications in which adaptive filtering is of substantial utility, including consensus controls, networked systems, and wireless communications; see [1, 2, 4, 5, 7, 8, 12, 13, 14, 16, 17, 18, 19, 20, 23, 24, 27]. One typical scenario in such new application domains is that the underlying systems are inherently time varying and their parameter variations are stochastic [29, 30, 31]. One important class of such stochastic systems involves systems whose randomly time-varying parameters can be described by Markov chains. For example, networked systems include communication channels as part of the system topology. Channel connections, interruptions, data transmission queuing and routing, and packet delays and losses are always random, so Markov chain models become a natural choice for such systems. For control strategy adaptation and performance optimization, it is essential to capture time-varying system parameters during operation, which leads to the problem of identifying Markovian regime-switching systems pursued in this paper.
When data acquisition, signal processing, and algorithm implementation are subject to resource limitations, it is highly desirable to reduce data complexity. This is especially important when data shuffling involves communication networks. This understanding has motivated the main theme of this paper: using sign-error updating schemes, which carry much reduced data complexity, in adaptive filtering algorithms without detrimental effects on parameter estimation accuracy and convergence rates.
In our recent work, we developed a sign-regressor algorithm for adaptive filters [28]. The current paper further develops sign-error adaptive filtering algorithms. It is well known that sign algorithms have the advantage of reduced computational complexity. The sign operator reduces the implementation of the algorithms to bits in data communications and simple bit shifts in multiplications. As such, sign algorithms are highly appealing for practical applications. The work [11] introduced sign algorithms and has inspired much of the subsequent development in the field. On the other hand, employing sign operators in adaptive algorithms introduces substantial challenges in establishing convergence properties and error bounds.
A distinctive feature of the algorithms introduced in this paper is the multi-time-scale framework for characterizing parameter variations and algorithm updating speeds. This is realized by considering the stepsize of the estimation algorithms and a scaling parameter that defines the transition rates of the Markov jump process. Depending on the relative time scales of these two processes, suitably scaled sequences of the estimates are shown to converge to either an ordinary differential equation, or a set of ordinary differential equations modulated by random switching, or a stochastic differential equation, or stochastic differential equations with random switching. Using weak convergence methods, convergence and rates of convergence of the algorithms are obtained for all these cases.
The rest of the paper is arranged as follows. Section 2 formulates the problems and introduces the two-time-scale framework. The main algorithms are presented in Section 3, where mean-squares errors of the parameter estimators are derived. By taking appropriate continuous-time interpolations, Section 4 establishes convergence properties of interpolated sequences of estimates from the adaptive filtering algorithms. Our analysis is based on weak convergence methods. The convergence properties are obtained by using martingale averaging techniques. Section 5 further investigates the rates of convergence. Suitably interpolated sequences are shown to converge to either stochastic differential equations or randomly switched stochastic differential equations, depending on relations between the two time scales. Numerical results by simulation are presented in Section 6 to demonstrate the performance of our algorithms.
2 Problem Formulation
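Let
(1) $y_n = \varphi_n' \alpha_n + e_n, \quad n = 0, 1, \ldots,$
where $\varphi_n \in \mathbb{R}^r$ is the sequence of regression vectors, $\{e_n\}$ is a sequence of zero mean random variables representing the error or noise, $\alpha_n \in \mathbb{R}^r$ is the time-varying true parameter process, and $y_n \in \mathbb{R}$ is the sequence of observation signals at time $n$.
Estimates of $\alpha_n$ are denoted by $\theta_n$ and are given by the following adaptive filtering algorithm using a sign operator on the prediction error
(2) $\theta_{n+1} = \theta_n + \mu \varphi_n \operatorname{sgn}(y_n - \varphi_n' \theta_n),$
where $\operatorname{sgn}(y)$ is defined as $\operatorname{sgn}(y) = 1_{\{y > 0\}} - 1_{\{y < 0\}}$ for $y \in \mathbb{R}$. We impose the following assumptions.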

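(A1) $\alpha_n$ is a discrete-time homogeneous Markov chain with state space
(3) $\mathcal{M} = \{a_1, \ldots, a_{m_0}\}, \quad a_i \in \mathbb{R}^r, \ i = 1, \ldots, m_0,$ and whose transition probability matrix is given by
(4) $P^\varepsilon = I + \varepsilon Q,$ where $\varepsilon > 0$ is a small parameter, $I$ is the $m_0 \times m_0$ identity matrix, and $Q = (q_{ij})$ is an irreducible generator (i.e., $Q$ satisfies $q_{ij} \ge 0$ for $i \ne j$ and $\sum_{j} q_{ij} = 0$ for each $i$) of a continuous-time Markov chain. For simplicity, assume that the initial distribution of the Markov chain $\alpha_n$ is given by $P(\alpha_0 = a_i) = p_{0,i}$, which is independent of $\varepsilon$ for each $i = 1, \ldots, m_0$, where $p_{0,i} \ge 0$ and $\sum_{i=1}^{m_0} p_{0,i} = 1$.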

(A2) The sequence of signals is uniformly bounded, stationary, and independent of the parameter process . Let be the $\sigma$-algebra generated by , and denote the conditional expectation with respect to by .

(A3) For each , define
(5) For each and , there is an such that given ,
(6) 
(A4) There is a sequence of nonnegative real numbers with such that for each and each , and for some ,
(7) uniformly in .
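To make the setup concrete, the following is a minimal simulation sketch, not the authors' code. It assumes the standard regression model (observation equals regressor times parameter plus noise), a parameter chain whose transition matrix is the identity plus a small multiple of a generator, and a sign-error update that moves the estimate by a step size times the regressor times the sign of the prediction error. The names `transition_matrix`, `sign_error_step`, and `track`, the uniform signal distributions, and all numerical values are illustrative assumptions.

```python
import numpy as np


def transition_matrix(Q, eps):
    """Return P = I + eps * Q for a generator Q (rows sum to zero)."""
    Q = np.asarray(Q, dtype=float)
    P = np.eye(Q.shape[0]) + eps * Q
    if (P < 0).any():
        raise ValueError("eps is too large: I + eps*Q has a negative entry")
    return P


def sign_error_step(theta, phi, y, mu):
    """One sign-error update: theta + mu * phi * sgn(y - phi' theta)."""
    # np.sign returns 0 at 0, matching sgn(y) = 1{y>0} - 1{y<0}.
    return theta + mu * phi * np.sign(y - phi @ theta)


def track(states, Q, eps, mu, n_steps, seed=0):
    """Simulate a Markov parameter and track it with the sign-error update.

    states : (m, r) array; row i is the parameter value in regime i.
    Returns the regime path, the estimates, and the squared tracking errors.
    """
    rng = np.random.default_rng(seed)
    states = np.asarray(states, dtype=float)
    m, r = states.shape
    P = transition_matrix(Q, eps)
    regime = 0
    theta = np.zeros(r)
    regimes = np.empty(n_steps, dtype=int)
    thetas = np.empty((n_steps, r))
    sq_err = np.empty(n_steps)
    for n in range(n_steps):
        alpha = states[regime]
        regimes[n] = regime
        thetas[n] = theta
        sq_err[n] = np.sum((theta - alpha) ** 2)
        # Bounded signals (illustrative choice): uniform regressors and noise.
        phi = rng.uniform(-1.0, 1.0, size=r)
        e = rng.uniform(-0.1, 0.1)
        y = phi @ alpha + e
        theta = sign_error_step(theta, phi, y, mu)
        # Draw the next regime from the current row of P.
        regime = rng.choice(m, p=P[regime])
    return regimes, thetas, sq_err


if __name__ == "__main__":
    Q = [[-1.0, 1.0], [1.0, -1.0]]
    states = [[1.0, -0.5], [-1.0, 0.5]]
    regimes, thetas, sq_err = track(states, Q, eps=1e-4, mu=1e-2, n_steps=20000)
    print("regime switches:", int(np.sum(np.diff(regimes) != 0)))
    print("mean squared error over last half:", sq_err[10000:].mean())
```

Varying `eps` relative to `mu` in this sketch gives a rough feel for the two time scales: a small `eps` means rare regime switches that the recursion has time to follow, while an `eps` comparable to `mu` means the parameter jumps about as fast as the estimate can adapt.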
Remark 2.1
Let us take a moment to justify the practicality of the assumptions. The boundedness assumption in (A2) is fairly mild. For example, we may use a truncated Gaussian process. In addition, it is possible to accommodate unbounded signals by treating martingale difference sequences (which make the proofs slightly simpler).
In (A3), we consider that while is not smooth w.r.t. , its conditional expectation can be a smooth function of . The condition (6) indicates that is locally (near ) linearizable. For example, this is satisfied if the conditional joint density of with respect to is differentiable with bounded derivatives; see [6] for more discussion. Finally, (A4) is essentially a mixing condition which indicates that the remote past and distant future are asymptotically independent. Hence we may work with correlated signals as long as the correlation decays sufficiently quickly between iterates.
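As a worked illustration of how the conditional expectation of the sign term can be smooth and locally linear, consider the following special case. It is an assumption made here for illustration and is not taken from the paper: the noise $e$ is independent of the bounded regressor $\varphi$ and Gaussian with mean zero and variance $\sigma^2$ (ignoring the truncation needed for boundedness), $\Phi$ is the standard normal distribution function, and $d$ denotes the difference between the true parameter and the estimate.

```latex
\begin{align*}
E\bigl[\operatorname{sgn}(x + e)\bigr]
  &= P(e > -x) - P(e < -x)
   = 2\Phi\!\left(\frac{x}{\sigma}\right) - 1
   = \sqrt{\frac{2}{\pi}}\,\frac{x}{\sigma} + O(|x|^3), \\
E\bigl[\varphi\,\operatorname{sgn}(\varphi' d + e)\bigr]
  &= E\Bigl[\varphi\,\Bigl(2\Phi\!\Bigl(\frac{\varphi' d}{\sigma}\Bigr) - 1\Bigr)\Bigr]
   = \sqrt{\frac{2}{\pi}}\,\frac{1}{\sigma}\,E[\varphi\varphi']\,d + o(|d|).
\end{align*}
```

In this special case the sign nonlinearity averages to a linear map with matrix $\sqrt{2/\pi}\,E[\varphi\varphi']/\sigma$ near $d = 0$, which is the kind of local linearization that condition (6) asks for.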
3 Mean Squares Error Bounds
Denote the sequence of estimation errors by . We proceed to obtain bounds for the mean squares error in terms of the transition rate of the parameter and the adaptation rate of the algorithm .
Theorem 3.1
Assume (A1)–(A4). Then there is an such that for all ,
(8) 
Proof. Define a function by . Observe that
(9) 
so
(10) 
By (A2), the Markov chain is independent of and is measurable. Since the transition matrix is of the form , we obtain
(11) 
Similarly,
(12) 
Note that , so
(13) 
Since the signals are bounded, we have
(14) 
Applying (14) to (10), we arrive at
(15) 
Note also that by (A3),
(16) 
To treat the first three terms in (15), we define the following perturbed Liapunov functions by
(17) 
By virtue of (A4), we have
(18) 
Note also that the irreducibility of implies that of for sufficiently small . Thus there is an such that for all , for some , where denotes the stationary distribution associated with the transition matrix . Note that the difference of the and step transition matrices is given by
The last line above follows from the fact , hence . Thus
(19) 
The foregoing estimates lead to and as a result
(20) 
and similarly
(21) 
so all the perturbations can be made small.
Now, we note that
(22) 
where
(23) 
and
(24) 
Using (11), we have
(25) 
Thus, in view of (A4)
(26) 
and
(27) 
Putting together (22)–(27), we establish that
(28) 
Likewise, we can obtain
(29) 
and
(30) 
4 Convergence Properties
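4.1 Switching ODE Limit: $\mu = O(\varepsilon)$
We assume the adaptation rate $\mu$ and the transition frequency $\varepsilon$ are of the same order, that is, $\mu = O(\varepsilon)$. For simplicity, we take $\mu = \varepsilon$. To study the asymptotic properties of the sequence $\{(\theta_n, \alpha_n)\}$ of estimates and parameters, we take a continuous-time interpolation of the process. Define
(31) $\theta^\mu(t) = \theta_n, \quad \alpha^\mu(t) = \alpha_n, \quad \text{for } t \in [n\mu, n\mu + \mu).$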
We proceed to prove that converges weakly to a system of randomly switching ordinary differential equations.
Theorem 4.1
Assume (A1)–(A4) hold and . Then the process converges weakly to such that is a continuous-time Markov chain generated by and the limit process satisfies the Markov switched ordinary differential equation
(32) 
The theorem is established through a series of lemmas. We begin by using a truncation device to bound the estimates. Define to be the ball with radius , and as a truncation function that is equal to 1 for , 0 for , and sufficiently smooth between. Then we modify algorithm (2) so that
(33) 
is now a bounded sequence of estimates. As before, define
We shall first show that the sequence is tight, and thus by Prohorov’s theorem we may extract a convergent subsequence. We will then show the limit satisfies a switched differential equation. Lastly, we let the truncation bound grow and show the untruncated sequence given by (2) is also weakly convergent.
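The truncation device can be pictured with a small sketch. The code below is an illustration under assumptions that are not from the paper: the inner ball has radius `N`, the function vanishes outside radius `N + 1`, and a quintic "smootherstep" polynomial (twice continuously differentiable) stands in for the unspecified smooth transition. The name `truncation` is hypothetical.

```python
import numpy as np


def truncation(theta, N):
    """Smooth cutoff: 1 for |theta| <= N, 0 for |theta| >= N + 1."""
    rho = np.linalg.norm(theta)
    # s runs from 0 at radius N to 1 at radius N + 1, clipped outside.
    s = np.clip(rho - N, 0.0, 1.0)
    # Quintic smootherstep: zero first and second derivatives at s = 0 and 1.
    return 1.0 - (6 * s**5 - 15 * s**4 + 10 * s**3)
```

One common way to use such a factor is to multiply the update increment by it, so that the iterates stop moving once they leave the larger ball and therefore remain bounded.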
Lemma 4.2
The sequence is tight in .
Proof of Lemma 4.2. Note that the sequence is tight by virtue of [33, Theorem 4.3]. In addition, converges weakly to a Markov chain generated by . To proceed, we examine the asymptotics of the sequence . We have that for any , and satisfying ,
(34) 
For any and any , using to denote the conditional expectation w.r.t. the $\sigma$-algebra , we have
Applying the criterion in [15, p. 47], we conclude that the sequence is tight.
Since is tight, it is sequentially compact. By virtue of Prohorov’s theorem, we can extract a weakly convergent subsequence. Select such a subsequence and still denote it by for notational simplicity. Denote the limit by . We proceed to characterize the limit process.
Lemma 4.3
The sequence converges weakly to that is a solution of the martingale problem with operator
(35) 
where for each , functions with compact support.
Proof. To derive the martingale limit, we need only show that for the function with compact support , for each bounded and continuous function , each , each positive integer , and each for ,
(36) 
To verify (36), we use the processes indexed by . As before, note that
(37) 
Subdivide the interval with the end points and by choosing such that as but . By the smoothness of , it is readily seen that as ,
(38) 
Next, we insert a term to examine the change in the parameter and the estimate separately
(39) 
First, we work with the last term in (39). By using a Taylor expansion on each interval indexed by we have
(40) 
where is a point on the line segment joining and . Since
and is smooth, we have the last term in (40) is in the sense of in probability as . To work with the first term we insert the conditional expectation and apply (6) to obtain