Adaptive Convex Combination of APA and ZA-APA algorithms for Sparse System Identification

# Adaptive Convex Combination of APA and ZA-APA algorithms for Sparse System Identification

Vinay Chakravarthi Gogineni, Mrityunjoy Chakraborty, Senior Member, IEEE
Department of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur, INDIA
Phone: Fax:
E.Mail : vinaychakravarthi@ece.iitkgp.ernet.in, mrityun@ece.iitkgp.ernet.in
###### Abstract

In general, one often encounters the systems that have sparse impulse response, with time varying system sparsity. Conventional adaptive filters which perform well for identification of non-sparse systems fail to exploit the system sparsity for improving the performance as the sparsity level increases. This paper presents a new approach that uses an adaptive convex combination of Affine Projection Algorithm (APA) and Zero-attracting Affine Projection Algorithm (ZA-APA)algorithms for identifying the sparse systems, which adapts dynamically to the sparsity of the system. Thus works well in both sparse and non-sparse environments and also the usage of affine projection makes it robust against colored input. It is shown that, for non-sparse systems, the proposed combination always converges to the APA algorithm, while for semi-sparse systems, it converges to a solution that produces lesser steady state EMSE than produced by either of the component filters. For highly sparse systems, depending on the value of the proportionality constant () in ZA-APA algorithm, the proposed combined filter may either converge to the ZA-APA based filter or produce a solution similar to the semi-sparse case i.e., outerperforms both the constituent filters.

Index terms-Sparse Systems, Norm, Compressive Sensing, Excess Mean Square Error.

## I Introduction

Usually, many real-life systems exhibit sparse representation i.e., their system impulse response is characterized by small number of non zero taps in the presence of large number of inactive taps. Sparse systems are encountered in many important practical applications such as network and acoustic echo cancelers [1]-[2], HDTV channels [3], wireless multipath channels [4], underwater acoustic communications [5]. The conventional system identification algorithms such as LMS and NLMS are sparsity agnostic i.e., they are unaware of underlying sparsity of the system impulse response. Recent studies have shown that the a priori knowledge about the system sparsity, if utilized properly by the identification algorithm, can result in substantial improvement in its estimation performance. This resulted in a flurry of research activities in the last decade or so towards developing sparsity aware adaptive filter algorithms, notable amongst them being the Proportionate Normalized LMS (PNLMS) algorithm [6] and its variants [7]-[8]. These current regressor based algorithms exhibit slower convergence rate for the correlated input. As a solution, these proportionate-type concepts were extended to data reuse case to provide the uniform convergence rate for both white and correlated input [9].

On the other hand, drawing from the ideas of compressed Sensing (CS) [10]-[12], several new sparse adaptive filters have been proposed in the last few years, notably, the zero attracting LMS (ZALMS) [13] derived by incorporating the norm penalty into the LMS cost function which was later extended to ZA-APA [14]. The simplicity of using norm penalty makes their implementation extremely simple. These algorithms perform well for the systems that are highly sparse, but struggle as the system sparsity decreases. That means these algorithms cannot perform well as the system sparsity varies widely over time.

In [15], an improved PNLMS (IPNLMS) algorithm, which is a controlled mixture of the PNLMS and the NLMS is proposed to handle the variable system sparsity. In [16], an adaptive convex combination of two IPNLMS filters is proposed that can adapt to situations where the system sparsity is time varying and unknown. However, it provides the steady state MSD which is same as that of conventional sparse agnostic filters. Reweighted ZALMS (RZALMS) [13] addresses this issue by selecting the shrinkage parameter , in a tricky fashion at the cost of increased complexity. As a solution to this problem, in [17], a convex combination of LMS and ZALMS algorithm has been proposed that switches between ZALMS and LMS adaptive filters depending on the sparsity level, thus enjoying the robustness against time-varying system sparsity with reduced complexity compared to the RZALMS. However, the aforementioned mechanisms that support time varying sparsity could not perform well in colored (correlated) input signal condition.

In this paper, we propose an alternative method that enjoys the robustness against time-varying system sparseness as well as coloredness of (correlated) input signal by using an adaptive convex combination of the Affine Projection Algorithm (APA) and Zero Attracting Affine Projection Algorithm (ZA-APA). The proposed algorithm uses the general framework of convex combination of adaptive filters and requires less complexity. The performance of the proposed scheme is first evaluated analytically by carrying out the convergence analysis. This requires evaluation of the steady state cross correlation between the output a priori errors generated by the APA and the ZA-APA based filters and then relating it to the respective steady state EMSE of the two filters. In our analysis, we have carried out this exercise for three different kinds of systems, namely, non-sparse, semi-sparse and sparse. The analysis shows that the proposed combined filter always converges to the APA based filter for a non-sparse system, while for semi-sparse systems, it converges to a solution that can outperform both the constituents. For sparse systems, the proposed scheme usually converges to the ZA-APA based filter. However, by adjusting the regularized parameter , a solution like the semi sparse case can be achieved i.e., outperforming both the filters. Finally, the robustness of the proposed methods against variable sparsity and coloredness is verified by the detailed simulation studies.

## Ii Problem Formulation, Proposed Algorithm and Performance Analysis

We consider the problem of identifying an unknown system (supposed to be sparse), modeled by the tap coefficient vector which takes a signal with variance as the input and produces the observable output , where is the input data vector at time , and is the observation noise with zero mean and variance . In order to identify the system, the approach mentioned in [18] is followed to deploy an adaptive convex combination of two adaptive filters as shown in the Fig. , where filter uses the APA/ZA-APA algorithm to adapt a filter coefficient vector as follows,

 (1)

where is the step size (common for both the filters), controller constant , when and , for , also is the respective filter output error vectors with denoting the respective filter output vectors, and is the input data matrix. The convex combination generates a combined output vector . The variable , is a mixing parameter, which is to be adapted by the following gradient descent method to minimize the quadratic error function of the overall filter, namely , where . However, such adaptation does not guarantee that will lie between 0 and 1. Therefore, instead of , an equivalent variable is updated which expresses as a sigmoidal function, i.e., . The update equation of is given by [18],

 a(n+1)=a(n)−μ2∂e2(n)∂a(n)=a(n)+μae(n)[y1(n)−y2(n)]λ(n)[1−λ(n)] (2)

In practice, for and conversely, for . Therefore, instead of updating the up to , it is sufficient to restrict it to a range ( a large finite number) which limits the permissible range of to , where .

### Ii-a Important Assumptions and Definitions

In [19], [20] convergence analysis of APA was presented using an equivalent form of APA known as Normalized LMS with Orthogonal Correction Factors (NLMS-OCF), along with the assumptions proposed by Slock in [21] for modeling the system input. The analysis in [19] provides the basis for the work presented in this paper. Some introduction about the NLMS-OCF algorithm is presented below. For the given input signal , observation noise and the reference signal , the NLMS-OCF algorithm uses orthogonal correction factors based on the past input vectors (each of length ), to update the weights , in every iteration. The weight update equation is

 w(n+1)=w(n)+μ0u(n)+μ1u1(n)+...+μM−1uM−1(n) (3)

where the notation refers to the component of vector that is orthogonal to the vectors: , where is the previous input signal vector index.

And also for is chosen as,

 μk=⎧⎪ ⎪⎨⎪ ⎪⎩μe(n)uT(n)u(n),fork=0,if% ∥u(n)∥≠0μek(n)(uk(n))Tuk(n),fork=1,2,…,M−1,if∥uk(n)∥≠0 (4)

where

 e(n)=d(n)−wT(n)u(n)ek(n)=d(n−k)−(wk(n))Tu(n−k)fork=1,2,…,M−1wk(n)=w(n)+μ0u(n)+μ1u1(n)+...+μk−1uk−1(n) (5)

The algorithm can be seen as the process of computing weight updates, using NLMS, based on the current input data vector, , as well as the orthogonal components from each of the previous input data vectors.

In addition to NLMS-OCF, as described in [19] the performance analysis is done based on the following assumptions on the signal and the underlying system

1. The input signal vectors are assumed to be zero mean with covariance matrix

 R=E[u(n)uT(n)]=VΛVT (6)

where , and . Here, are the eigenvalues of input covariance matrix R and are the corresponding orthonormal eigenvectors i.e., V is a unitary matrix.

2. Observation noise (i.e., zero mean white noise, with variance ) is independent of and, the initial conditions and are also independent of , .

3. The random signal vector is the product of three i. i. d. random variables, that is

 u(n)=s(n)r(n)\emph{v}(n) (7a) where P{s(n)=±1}=12r(n)∼∥u(n)∥p{v(n)=\emph{v}i}=pi=λitr(R),i=1,2,...,L (7b)

here i.e., has the same distribution as the norm of the true input signal vectors.

Assumption A3 was first introduced by Slock in [19], which leads to a simple distribution for the vectors while following the actual first and second order statistics of the input signal in A1 consistently. Assumption A3 was used in [19], [20] as well as here, to To simplify the convergence analysis.

As mentioned in [19], using Assumption A3, the weight update in - can be simplified, since the computation of orthogonal components becomes unnecessary i.e., each is already chosen from the orthogonal set. Hence, NLMS-OCF update equation can be rewritten as follows

 (8)

where

 μk=⎧⎪ ⎪⎨⎪ ⎪⎩μe(n)uT(n)u(n),fork=0,if% ∥u(n)∥≠0μek(n)uT(n−k)u(n−k),for% k=1,2,…,M−1,if∥uk(n)∥≠0 (9)

Therefore, using the above NLMS-OCF approximation, weight update equations of Filter and Filter are as follows
APA weight update Equation:

 (10)

and ZA-APA weight update equation:

 (11)

where can be calculated using .

Next, as in [18], certain definitions that are useful in the analysis are presented below. For , we thus define,

1. Weight Error Vectors: ;

2. Equivalent Weight Vector for the combined filter: ;

3. Equivalent Weight Error Vector for the combined filter: ;

4. A Priori Errors: and . Clearly, and ;

5. Excess Mean Square Error (EMSE): , for and ;

6. Cross EMSE: [from Cauchy-Schwartz inequality]. This means cannot be greater than both and simultaneously.

From , one can write,

 (12)

As shown in [18], the convergence of in depends on the steady state values of the individual filter EMSE and the cross EMSE, namely, and respectively. In practice, both and , however, take only finite number of steps to reach their steady state values as both the APA and the ZA-APA algorithms converge in finite number of iterations. Substituting in , where is defined above, noting that and also that , and assuming like [18] that in steady state, is independent of the a priori errors , it is easy to verify that for large (theoretically for ),

 (13)

where and . By assuming and are constant ( i.e., APA and ZA-APA algorithms have converged) Eq. , can be used to yield the dynamics of the evolution of . To analyze the convergence of , we need to evaluate and of the proposed combination.

## Iii Performance Analysis of the Combination

In this section, we examine the convergence behavior of the proposed convex combination for various levels of sparsity i.e., how the filter coefficients vector of the proposed combination adapts dynamically to an optimum value as dictated by the sparsity of the system. For this, first we evaluate and .

### Iii-a Mean Convergence Analysis of the ZA-APA Algorithm

From Eq., the recursion for the weight error vector of the ZA-APA algorithm can be written as follows:

 (14)

where is the set of or fewer indices for which the input regressor vectors are orthogonal to each other. The orthogonalization process determines the indices forming the set . The equation forms the basis for the performance analysis of the ZA-APA algorithm. Using the statistical independence between and (i.e., “independence assumption”), and recalling that is zero-mean i. i. d random variable which is independent of and thus of , one can write

 E[˜w2(n+1)]=[IL−μE[∑j∈Jnu(n−j)uT(n−j)uT(n−j)u(n−j)]]E[˜w2(n)]+ρE[sgn(w2(n))] (15)

Using A3), we can write the outer to inner product as,

 u(n−j)uT(n−j)uT(n−j)u(n−j)=s(n−j)r(n−j)v(n−j)vT(n−j)s(n−j)r(n−j)s2(n−j)r2(n−j)∥v(n−j)∥2=v(n−j)vT(n−j) (16)

where . Note that the above result is independent of the norm of .
Using the result presented in , the Eq. can be rewritten as

 (17)

where

 Kn={k:∃j∈Jn∋u(n−j)uT(n−j)uT(n−j)u(n−j)=\emph{v}k\emph{v}Tk}⊆{1,2,⋯,L} (18)

By defining the vector as the representation of in terms of the orthonormal vectors . That is and .
Using this notation, pre-multiplying by results in

 (19)

From the orthonormality of ’s we have

 \emph{v}Ti∑k∈Kn\emph{v}k\emph{v}Tk={\emph{v}Ti,ifi∈Kn0,ifi∉Kn (20)

Using the above result and substituting , Eq.(19) becomes

 α2,i(n+1)=[1−μβi]α2,i(n)+ρbi(n) (21)

where and the term in is the probability of drawing an eigen vector from the eigen vectors set at most in trials. If we assume is the probability of drawing an eigen vector , then .
In steady state i.e., as , from (21), it is possible to write

 limn→∞α2,i(n)=ρμβilimn→∞bi(n) (22)

To evaluate , we classify the unknown parameters (i.e., filter taps) into two disjoint subsets same as in [13], denoting, and for active and inactive filter taps respectively. If the tap of the optimum filter is active, then , else . For sufficiently small controller constant , for every , we have . On the other hand for every , we have . Since for every , so for zero mean Gaussian input, in steady state we can assume to be Gaussian with zero mean.

Therefore, from the aforementioned remarks and for the sufficiently small controller constant , we can approximate,

 limn→∞sgn(w2,i(n))=sgn(wopt,i)fori∈NZlimn→∞E[sgn(w2,i(n))]=0fori∈Z (23)

These approximations are helpful to evaluate the expectations of non linear function , involved in for both white and correlated inputs. To derive the closed form expressions for the steady state mean of the weight deviation () of active and inactive coefficients, at this stage, we assume that the ZA-APA algorithm is operating on white input. This assumption simplifies the approach and the eigen vectors set becomes the trivial basis . Therefore, in steady state for the ZA-APA algorithm, we have,

 limn→∞α2,i(n)=limn→∞E[˜w2,i(n)]={ρμβisgn(wopt,i)fori∈NZ0fori∈Z (24)

### Iii-B Excess Mean Square Error Analysis of the ZA-APA Algorithm

The EMSE of ZA-APA algorithm can be written in the following form,

 Jex,2(n)=E[e2a,2(n)]=E[˜wT2(n)R˜w2(n)]=Tr(RK2(n))=Tr(ΛVTK2(n)V% ) (25)

where .

Let us define the diagonal elements of the transformed covariance matrix as for . That is

 [VTK2(n)V]ii=\emph{v}TiK2(n)\emph{v}i=˜λ2,i(n) (26)

with the above notation EMSE of the ZA-APA algorithm can be written as follows:

 Jex,2(n)=L∑i=1λi˜λ2,i(n) (27)

Using the statistical independence between and (i.e., “independence assumption”), and recalling that is of zero-mean and also independent of and thus of , one can write the recursion for the mean square deviation of ZA-APA algorithm as follows:

 (28)

Using to replace the outer-to-inner product ratios with , one can have

 (29)

With the notation presented in , pre and post multiplication of (29) by and respectively, results in

 (30)

From the orthonormality of ’s, using the result presented in and substituting , Eq.(30) becomes

 (31)

where and

In steady state i.e., as , from (31), it is possible to write

 ˜λ2,i(∞)=μ2−μξ0E[1r2]˜λ1,i(∞) of APA \@@cite[cite][\@@bibref18]+ρ(1−μβi)μ(2−μ)βi[\emph{v}HiΦ(∞)\emph{v}i]+ρ(1−μβi)μ(2−μ)βi[%\emphvTiΦT(∞)%\emphvi]+ρ2μ(2−μ)βi[\emph{v}TiΨ(∞)\emph{v}i]Δ2,i(∞)% (32)

where and

To evaluate and , as mentioned in the convergence in mean section the unknown parameters (i.e., filter taps) are classified into two disjoint subsets denoting, and for active and inactive filter taps respectively. Under small misadjustment condition ( i.e., the steady state standard deviation of ( ) is very small), also following the lines of [22], we can approximate and for all . These approximations are helpful to evaluate the expectations of non linear functions involved in for both white and correlated input. To derive the closed form expressions for the steady state mean square deviation () of active and inactive coefficients, we assume that the ZA-APA algorithm is operating on white input. This assumption simplifies the approach and the eigen vectors set becomes the trivial basis . Therefore, in steady state for the ZA-APA algorithm, we have,

 (33)

and

 (34)

Using these results for every we can write,

 ˜λ2,i(∞)=μ2−μξ0E[1r2]˜λ1,i(∞) +ρ2(2−μβi)μ2β2i[2−μ]Δ2,i(∞) (35)

On the other hand for , . Assuming and to be jointly Gaussian (having mean zero in the steady state, as ), using Price’s theorem we can write, , where .

With these results for , the equation can be written as,

 ˜λ2,i(∞)=μ2βiξ0E[1r2]+ρ2μ(2−μ)βi+2ρ(1−μβi)√2π√1˜λ2,i(∞) (36)

the above equation is a quadratic of the form,

 a˜λ1,i(∞)+b√˜λ1,i(∞)+c=0 (37a) where the coefficients are, a=1b=√8πρ(1−μβi)μ(2−μ)βic=−(μ2−μξ0E[1r2]+ρ2μ(2−μ)βi) (37b)

Then, for , we have ( since is positive, note that only real, positive root has to be considered.)

 (38)

Squaring both the sides of the above equation, the steady state mean square deviation of single inactive tap can be written as,

 (39)

From the above equation, it can be observed that for having , the controller constant has to be chosen very carefully. By using and EMSE of ZA-APA can be calculated.

#### Iii-B1 ρ Range

For a system with given length and projection order , , if and only if

 ρ2≤8π(L−K)2(1−μβi)2μ3β2iξ0E[1r2]C (40)

where

This bound shows the dependence of regularization parameter value on the projection order .

### Iii-C Cross Excess Mean Square Error Analysis of ZA-APA and APA Algorithms

The cross EMSE of the proposed combination can be written in the following form,

 Jex,12(n)=E[˜wT2(n)R˜w1(n)]=Tr(RK12(n))=Tr(ΛVTK12(n)V) (41)

where .

By defining the diagonal elements of the transformed covariance matrix as for . That is

 [VTK12(n)V]ii=% \emph{v}TiK12(n)\emph{v}i=˜λ12,i(n) (42)

With the above notation cross EMSE of the combination can be written as follows:

 Jex,12(n)=L∑i=1λi˜λ12,i(n) (43)

Post multiplying by , taking expectation, and using assumptions A2) and A3) we get

 (44)

With the above notation, the pre and post multiplication of (44) by and respectively, results in

 ˜λ12,i(n+1)=˜λ12,i(n)−μE[\emph{v}Ti(∑k∈Kn\emph{v}k\emph{v}Tk)]K12(n)\emph{v}% i−μ\emph{v}TiK12(n)E[(∑k∈Kn\emph{v}k\emph{v}Tk)\emph{v}i]+μ2E[\emph{v}Ti(∑k∈Kn\emph{v}k\emph{v}Tk)K12(n)(∑m∈Kn\emph{v}m\emph{v}Tm)\emph{v}i]+μ2ξ0E[1r2]E[\emph{v}Ti(∑k∈Kn\emph{v}k\emph{v}Tk)\emph{v}i]+ρE[\emph{v}Ti(IL−μ∑k∈Kn\emph{v}k\emph{v}Tk)]E[˜w1(n)sgn(wT2(n))]\emph{v}i (45)

From the orthonormality of ’s and substituting , Eq. can be written as,

 ˜λ12,i(n+1)=˜λ12,i(n)[1−μ(2−μ)βi]+μ2ξ0E[1r2]βi˜λ1,i(n+1)+ρ[1−μβi]E[\emph{v}Ti(˜w1(n)sgn(wT2(n)))\emph{v}i]Δ12,i(n) (46)

In steady state i.e., as , the term becomes time invariant. Therefore, we have

 ˜λ12,i(∞)=μ2−μξ0E[1r2]˜λ1,i(∞)+ρ(1−μβi)μ(2−μ)βiE[\emph{v}Ti(˜w1(∞)sgn(wT2(∞)))\emph{v}i]Δ12,i(∞) (47)

Using the same approximations used in EMSE analysis of ZA-APA, we can solve the term in the above equation for white and correlated input. However, to derive the closed form expressions for both active and inactive coefficients, we are assuming that the algorithms are operating on white input.

Using the fact that , for every one can write,

 Δ12,i(∞)=ρ(1−μβi)μ(2−μ)βiE[˜w1,i(∞)sgn(w2,i(∞))]=ρ(1−μβi)μ(2−μ)βiE[˜w1,i(∞)sgn(wopt,i(∞))]=0 (48)

and using Price’s theorem for every we can write,

 (49)

From , for every the mean square deviation can be written as,

 ˜λ12,i(∞)=μ2−μξ0E[1r2]˜λ1,i(∞) (50)

and for ,

 [μ(2−μ)βi]˜λ12,i(∞)=μ2ξ0βiE[1r2]−ρ(1−μ