Statistical Analysis of a GSC-based Jointly Optimized Beamformer-Assisted Acoustic Echo Canceler

Abstract

This work presents a statistical analysis of a class of jointly optimized beamformer-assisted acoustic echo cancelers (AEC) with the beamformer (BF) implemented in the Generalized Sidelobe Canceler (GSC) form and using the least-mean square (LMS) algorithm. The analysis considers the possibility of independent convergence control for the BF and the AEC. The resulting models permit the study of system performance under typical handling of double-talk and channel changes. We show that the joint optimization of the BF-AEC is equivalent to a linearly-constrained minimum variance problem. Hence, the derived analytical model can be used to predict the transient performance of general adaptive wideband beamformers. We study the transient and steady-state behaviors of the residual mean echo power for stationary Gaussian inputs. A convergence analysis leads to stability bounds for the step-size matrix and design guidelines are derived from the analytical models. Monte Carlo simulations illustrate the accuracy of the theoretical models and the applicability of the proposed design guidelines. Examples include operation under mild degrees of nonstationarity. Finally, we show how a high convergence rate can be achieved using a quasi-Newton adaptation scheme in which the step-size matrix is designed to whiten the combined input vector.

1 Introduction

Acoustic echoes arise in hands-free communications when a microphone picks up both the signal radiated in a direct path by a loudspeaker and its reflections at the borders of a reverberant environment. Acoustic echoes tend to degrade intelligibility and listening comfort [1]. Modern solutions incorporate adaptive echo cancelers. However, typical room reverberation times require adaptive acoustic echo cancelers with very long responses [2]. Also, signal contamination by speech from other talkers, noise and their reflections in the acoustic environment makes it difficult to obtain fast convergence and satisfactory echo cancellation with such long cancelers [3]. Moreover, conventional acoustic echo cancellation also requires complex control logic to avoid divergence during double-talk periods [6]. Very few studies consider adaptation during those periods. A recent work [8] proposes the use of blind source separation techniques. Though promising, such techniques still lack computationally efficient solutions.

Assuming it is possible to estimate the direction of arrival (DOA) of the desired speaker, spatial filtering (beamforming) can help attenuate interfering signals arriving from directions other than the desired one. Beamformers (BFs) have limited echo suppression capacity due to limits in the array directivity [9] and the large number of microphones necessary to suppress all reflections outside the desired DOA [10].

Acoustic echo cancellation solutions in which BFs and acoustic echo cancelers (AECs) have complementary functions have attracted considerable interest recently [11]. BFs and AECs contribute by different means to reduce the residual echo. Hence, using both techniques in a synergistic way can improve the acoustic echo cancellation performance [22]. BFs and AECs are usually combined by means of two basic structures [23]. The AEC-first structure (AEC-BF) employs one AEC per microphone [19]. The BF then processes the AEC outputs for spatial filtering. It requires several long AECs, leading to very high computational costs [19]. Moreover, signals outside the desired DOA must be treated as double-talk, complicating the design. The BF-first (BF-AEC) structure performs the spatial filtering first, leaving basically the echo in the desired DOA to be canceled by a single AEC [16]. This structure has a significantly lower computational complexity than the AEC-BF structure, even considering that the BF impulse response adds to the length of the response to be identified by the AEC [11]. However, as a single AEC has to cancel echoes arriving at many microphones and its desired signal is affected by the BF state, the plant identification model is not valid. Therefore, previous theoretical work has to be used carefully when this structure is studied. In addition, since the AEC solution depends on the BF state, an abrupt change in the desired DOA can lead to degraded performance until the AEC tracks the new solution.

Alternative structures that have been proposed include the use of polynomial approximations in delay-and-sum beamformers [24], the Transfer-Function Generalized Sidelobe Canceler (TF-GSC) [26], AEC sub-modeling [30], mutually exclusive adaptation of the BF and AEC [31] and wave-domain filtering [32].

Optimization of BF-assisted acoustic echo cancellation systems can be based on different performance surfaces, depending on how the BF and the AEC are optimized. One may define the beamformer performance surface from its own local error [12] or use a joint optimization scheme [14] in which the global cancellation error is used to jointly optimize the BF and the AEC. The joint optimization scheme was first proposed in [14]. It was later applied to a robot speech recognition system [15]. Joint BF-AEC optimization leads to an optimal solution with better echo cancellation performance than separate BF and AEC optimizations [22].

Despite the possibilities of combined BF and AEC acoustic echo cancellation systems, we find only a few analyses of their transient behavior in the literature. The AEC-BF structure has been studied for the acoustic echo cancellation problem in [19] and for acoustic feedback cancellation in [34]. A stochastic model has been derived using the power transfer function method for the case of a fixed BF, where just the AEC is adapted. More recently, the transient behavior of a system where a direct-form BF and an AEC are jointly adapted using equal and fixed step-sizes was analyzed in [17]. The derived analytical model was shown to accurately predict the adaptive system behavior and corroborated previous experimental findings that the same cancellation performance of a single-microphone AEC can be achieved with a shorter AEC when the possibility of spatial filtering is available [35]. The model, based on the equivalence to a conventional Linearly Constrained Minimum Variance (LCMV) optimization, allows the use of previous analytical results [36].

Adaptive LCMV beamforming may be implemented in many different forms and by using different algorithms [36]. The direct and GSC forms are equivalent in that both lead to the same optimal solution [44]. For some algorithms and under specific conditions they are equivalent even in their transient behavior [38]. Both forms tend to have comparable computational complexities for a small number of constraints. However, the GSC form offers greater design flexibility due to the possibility of choosing the blocking matrix. Good choices may lead to reduced computational complexity [38]. Also, robust GSC implementations with an adaptive blocking matrix have been proposed to account for small changes in the desired signal DOA [46]. Therefore, it is of interest to study the behavior of the GSC form of the BF-AEC structure.

This work extends the analysis in [17] to the study of the transient behavior of the jointly optimized BF-AEC structure in the GSC form. We formulate the joint optimization as a single constrained optimization problem, which simplifies the statistical analysis. Moreover, the analysis incorporates the case of a positive-definite step-size matrix [50]. The incorporation of this extra flexibility into the model is particularly interesting for BF-assisted echo cancelers, as their AEC adaptation control logic stops AEC adaptation during double-talk periods [6], while the BF continues adapting using a Reference Signal Based (RSB) structure [55] with the AEC output as the reference signal. The problem of designing an adaptive filter with step-size matrices was studied in [50]. An exponential model for the echo channel and information on the room reverberation time were exploited in [56] to design a step-size optimized algorithm. In [54], it was shown that the LMS algorithm with a step-size matrix is equivalent to the classical LMS algorithm in a transformed space. The same idea is used in our convergence analysis. The analytical model derived in this paper allows the study of the echo canceler behavior including echo-only periods, when AEC adaptation is slower, double-talk periods, when only the BF is adapted, and periods after channel changes, when fast AEC adaptation is required [6].

The main contributions of this paper are:

1. The formulation of the jointly optimized BF-AEC implemented in the GSC form as an LCMV-based GSC. This signal model can be used to design the conventional LCMV-based GSC without loss of generality. Previous theoretical results show that the behavior of the GSC can be studied from the direct form when adaptation uses a single step-size, feasible quiescent solutions and blocking matrices have orthonormal columns [38]. Hence, the analysis can also be used to design the BF-AEC and the conventional LCMV implemented in the direct form using a scalar step-size, generalizing the analysis in [18];

2. Incorporation of a step-size matrix. AEC adaptation control logic demands the adaptation of the AEC and BF with different step-sizes during different adaptation scenarios (double-talk, channel changes, tracking, etc.) [6]. Hence, a novel analysis capable of predicting the transient behavior during different control logic states (different step-sizes) is of indisputable practical relevance. The analysis model uses a positive-definite step-size matrix.

Using the proposed formulation, we derive a statistical model of the behavior of the BF-AEC system implemented in the GSC form with a positive-definite matrix step-size. The model also allows the derivation of a high convergence rate algorithm based on a quasi-Newton adaptation scheme in which the step-size matrices are designed to whiten the combined input vector to accelerate convergence.

This paper is organized as follows. Section II formulates the problem addressed. Up to Section II-C the material is basically the same as in [18] and is necessary to establish the notation used in the rest of the paper. Section II-D introduces the GSC formulation for the problem studied. Section III describes the analysis structure that allows the analysis of the adaptation using different step-sizes and the quasi-Newton algorithm using the same mathematical framework. Section IV describes the adaptive solution. Section V derives the statistical model for the adaptive solution. The statistical model convergence is analyzed in Section VI. Based on the results in Section VI, the new quasi-Newton adaptation is derived in Section VII. Section VIII validates the proposed model using simulation examples. Finally, conclusions are presented in Section IX. In this paper, plain lowercase or uppercase letters denote scalars, lowercase boldface letters denote column vectors and uppercase boldface letters denote matrices.

2 Problem Formulation

Figure 1 shows the BF-AEC structure with echo impulse response vectors of length , microphone signals , one adaptive wideband beamformer composed of filters of length and an adaptive AEC filter of length . We assume constant impulse responses and stationary signals for mathematical tractability [5]. The analysis for a time-variant echo path becomes especially challenging in this case, even for the simple random-walk model of system nonstationarity [5]. This is because a time-variant loudspeaker-enclosure-microphone (LEM) model would lead to a nonstationary beamformer input signal. Moreover, the statistically independent increments to the channel response vectors due to the random-walk model would be time-correlated by the BF filters. This would render the analysis too complex even for such a simple nonstationarity model, making it very hard to study fundamental properties of the algorithm behavior. The study for nonstationary input signals requires a specific model for the input nonstationarity. To the best of our knowledge, there is no generally accepted model for signal nonstationarity. On the other hand, model predictions derived under stationarity assumptions can still show tendencies of the algorithm behavior for reasonably small degrees of nonstationarity [4]. Simulation results in Section 8.3 will illustrate that this is the case for the present study.

It has been conjectured that the spatial filtering realized by the BF reduces the required AEC length, as compared to the conventional finite impulse response (FIR) AEC structure [35]. Hence, our analysis considers the possibility of an AEC shorter than the LEM impulse responses by admitting .

2.1 The Beamformer Input Vector

Each of the LEM impulse responses , , models the transmission of the far-end signal from the loudspeaker to one of the microphones. The adaptive wideband beamformer is composed of FIR filters with impulse responses , , each of length [57]. The echo signal at the th microphone is given by [1]

where

is the LEM plant input vector.

Grouping the LEM responses as columns of the matrix

and defining the echo snapshot vector as

The th microphone signal is the sum of a near-end signal and an echo :

Each signal is composed of local speech, local interferences and random noise. We define the microphone array snapshot as the vector composed of all :

Then, combining , and yields

where is the near-end signal component snapshot.
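The microphone model described above can be checked numerically. The following is a minimal sketch under stated assumptions: the names `M`, `Nh`, `H`, `u` and `r` (number of microphones, LEM response length, matrix with the LEM responses as columns, far-end signal, near-end signals) are hypothetical and are not taken from the paper's notation. It forms each array snapshot as an echo snapshot plus a near-end snapshot and cross-checks the echo part against a direct convolution.

```python
import numpy as np

rng = np.random.default_rng(0)
M, Nh, T = 3, 8, 50                      # hypothetical sizes
H = rng.standard_normal((Nh, M))         # LEM responses stored as columns
u = rng.standard_normal(T)               # far-end signal
r = 0.1 * rng.standard_normal((T, M))    # near-end signals (speech, noise, ...)


def lem_input_vector(u, n, Nh):
    """Return [u[n], u[n-1], ..., u[n-Nh+1]], with zeros before time 0."""
    idx = n - np.arange(Nh)
    return np.where(idx >= 0, u[np.clip(idx, 0, None)], 0.0)


def mic_snapshot(n):
    """Array snapshot at time n: echo snapshot plus near-end snapshot."""
    echo = H.T @ lem_input_vector(u, n, Nh)
    return echo + r[n]


# Snapshots for all time instants, one row per time index.
x = np.stack([mic_snapshot(n) for n in range(T)])

# Cross-check: the echo at each microphone is the far-end signal
# convolved with the corresponding LEM impulse response.
echo_conv = np.stack([np.convolve(u, H[:, i])[:T] for i in range(M)], axis=1)
```

Here `x - r` recovers the echo component, which should match `echo_conv` up to rounding.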

We now define the extended far-end sample vector as

where the dimension of is the length of the convolution of and . Then to express the microphone array input signals (the echo signals) as functions of we rewrite as

where denotes the null matrix with rows and columns. Then, defining the stacked echo vector

we can write

where

is the modified echo channel matrix. Note that contains the echo signals for the time window corresponding to the length of the BF impulse response.

Using , , and , and defining the near-end vector component (without echo) as

and the combined beamformer input regressor as [36]

we write the beamformer input vector as

2.2 The Residual Echo

Define the vector of the th components of all vectors , , at time as

We then write the beamformer output as

Now, defining the stacked beamformer weight vector

we can write as the inner product

Next, defining the AEC weight vector

and the AEC input vector

we can write the AEC output as

Using and we write the residual echo as the inner product

3 The Analysis Structure

With the problem formulation presented in Section 2, we can define an analysis problem that corresponds to the study of a single GSC structure that combines the beamformer and the AEC adaptations. To this end, we define the stacked input vector

and, from and , the stacked coefficient vector

Then, we can write the residual echo as the inner product

This simple model will allow us to relate the study of the BF-AEC structure to that of the LCMV problem.
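The stacking step can be illustrated with a few lines of code. This is only a sketch: the identifiers and sizes are hypothetical, and the sign convention (the minus sign of the AEC output absorbed into the stacked input vector) is an assumption made for the example, not necessarily the convention adopted in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bf, n_aec = 12, 20                 # hypothetical stacked BF length and AEC length

w_bf = rng.standard_normal(n_bf)     # stacked beamformer weights
w_aec = rng.standard_normal(n_aec)   # AEC weights
x_bf = rng.standard_normal(n_bf)     # beamformer input vector at time n
u_aec = rng.standard_normal(n_aec)   # AEC input vector at time n

# Residual echo written as "beamformer output minus AEC output".
e_two_filters = w_bf @ x_bf - w_aec @ u_aec

# The same quantity as a single inner product of stacked vectors.
# Assumption: the minus sign is carried by the stacked input vector.
w_stacked = np.concatenate([w_bf, w_aec])
x_stacked = np.concatenate([x_bf, -u_aec])
e_stacked = w_stacked @ x_stacked
```

Writing the residual echo as one inner product is what lets the two adaptive filters be treated as a single constrained filter.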

Interestingly, input vectors in and in are related by

where we use the notation to denote the identity matrix. Hence, and allow us to write in as a function of the input vectors and . Equation also allows the study of the algorithm performance for , and thus verifies the possibility of reducing by increasing the number of microphones.

3.1 Performance Surface

The mean output power (MOP) performance surface is defined as the mean value of conditioned on . From ,

where is the input autocorrelation matrix. A set of linear constraints on the beamformer coefficients implements the spatial filtering. Usually, an constraint matrix and an response vector jointly define the frequency response in the desired DOA [36].

To formulate the linear constraints as a function of the combined coefficient vector, we define the extended constraint matrix [14]

Finally, the joint optimization problem can be formulated as

and the optimal solution is given by [36] .
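For readers who want to experiment with the constrained problem, the sketch below evaluates the standard closed-form LCMV minimizer, w = R^{-1} C (C^T R^{-1} C)^{-1} f, which we assume is the expression referred to here. All names and sizes (`n_bf`, `n_aec`, `n_con`, `C_bf`, `C_ext`, `f`) are hypothetical, and the autocorrelation matrix is a random positive-definite stand-in. The extended constraint matrix is built by padding the beamformer constraint matrix with zero rows, since the constraints act only on the beamformer coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bf, n_aec, n_con = 6, 4, 2         # hypothetical sizes
n = n_bf + n_aec

# Random symmetric positive-definite stand-in for the input autocorrelation matrix.
A = rng.standard_normal((n, 3 * n))
R = A @ A.T / (3 * n)

# Constraints act only on the beamformer part of the stacked weight vector.
C_bf = rng.standard_normal((n_bf, n_con))
C_ext = np.vstack([C_bf, np.zeros((n_aec, n_con))])
f = np.array([1.0, 0.0])


def lcmv_solution(R, C, f):
    """Closed-form minimizer of w^T R w subject to C^T w = f."""
    Ri_C = np.linalg.solve(R, C)
    return Ri_C @ np.linalg.solve(C.T @ Ri_C, f)


w_opt = lcmv_solution(R, C_ext, f)
mop_opt = w_opt @ R @ w_opt

# Another feasible vector: perturb w_opt along the null space of C_ext^T.
d = rng.standard_normal(n)
d -= C_ext @ np.linalg.solve(C_ext.T @ C_ext, C_ext.T @ d)
w_other = w_opt + d
mop_other = w_other @ R @ w_other
```

Both `w_opt` and `w_other` satisfy the constraints, and `mop_other` exceeds `mop_opt`, as expected for a strictly convex quadratic cost.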

3.2 Implementation using the GSC Form

In the GSC form [38], the dashed square of Figure 1 is replaced by the dashed square of Figure 2.

Feasible solutions to are decomposed as [38]

where is any feasible solution to , is a full column-rank -dimensional blocking matrix orthogonal to (), is an -dimensional vector and . The minimum norm solution to is

Optimal Solution

As , in satisfies for any , and becomes an unconstrained optimization problem in with solution [41]

where denotes the blocked input autocorrelation matrix, and from

Defining the cost function of

its gradient with respect to is

Setting equal to the null vector yields [41]

To obtain a model flexible enough to allow the study of the system performance with independent BF and AEC adaptations we choose the following block diagonal form for the blocking matrix

and split

where , and , . The same blocking matrix structure has been used in [14] for the implementation of the GSC-based BF-AEC acoustic echo canceler.
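The GSC decomposition with a block-diagonal blocking matrix can be sketched numerically as follows. The code assumes the common sign convention w = w_q - B w_a, builds the beamformer block of the blocking matrix from an SVD (one possible choice with orthonormal columns), and uses an identity block for the AEC coefficients. All identifiers are hypothetical. It then checks that the GSC optimum coincides with the direct-form LCMV solution.

```python
import numpy as np

rng = np.random.default_rng(3)
n_bf, n_aec, n_con = 6, 4, 2         # hypothetical sizes
n = n_bf + n_aec

A = rng.standard_normal((n, 3 * n))
R = A @ A.T / (3 * n)                # stand-in autocorrelation matrix
C_bf = rng.standard_normal((n_bf, n_con))
C_ext = np.vstack([C_bf, np.zeros((n_aec, n_con))])
f = np.array([1.0, 0.0])

# BF blocking block: orthonormal basis of the null space of C_bf^T.
U, _, _ = np.linalg.svd(C_bf, full_matrices=True)
B_bf = U[:, n_con:]

# Block-diagonal blocking matrix: BF block and an identity block for the AEC.
B = np.block([
    [B_bf, np.zeros((n_bf, n_aec))],
    [np.zeros((n_aec, n_bf - n_con)), np.eye(n_aec)],
])

# Minimum-norm feasible (quiescent) solution.
w_q = C_ext @ np.linalg.solve(C_ext.T @ C_ext, f)

# Unconstrained optimum in the blocked space (assumed convention w = w_q - B w_a).
R_b = B.T @ R @ B
w_a = np.linalg.solve(R_b, B.T @ R @ w_q)
w_gsc = w_q - B @ w_a

# Direct-form LCMV solution for comparison.
Ri_C = np.linalg.solve(R, C_ext)
w_lcmv = Ri_C @ np.linalg.solve(C_ext.T @ Ri_C, f)
```

Because the AEC block is an identity, the AEC coefficients are entirely contained in `w_a`, which is what makes independent BF and AEC step-sizes easy to express.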

Using in and noting from that we have

Splitting the gradient vector according to yields

where, from , and

Additionally, setting and using and in yields

where .

Comparing and we conclude that and . Hence, the steepest-descent algorithms for and with the gradients in and respectively are

where and are the step-size parameters. Note that is different from the steepest descent algorithm for unless . This also makes this analysis different from [18] by using the equivalence derived in [38]. However, this extra degree of flexibility is necessary to analyze the behavior of the BF-AEC system under different control logic states that usually act on to avoid divergence. The stochastic approximations of and yield

Implementation of has almost the same computational cost as separate LMS implementations of an AEC and a BF, requiring only one extra subtraction in the computation of . It also requires only one extra memory location to store the second scalar step-size. Despite its simplicity, can model the BF-AEC system behavior under most control logic states. The implementation of is shown in Figure 3.

Finally, the recursive weight update equation is obtained defining the diagonal step-size matrix

then can be written as

Note that has exactly the same behavior as , which can be used to study the performance of the practical implementation.
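A compact simulation can make the role of the diagonal step-size matrix concrete. The sketch below is an illustration under assumptions, not the paper's implementation: it uses the sign convention w = w_q - B w_a, synthetic correlated Gaussian inputs, and hypothetical names (`gsc_lms`, `mu_bf`, `mu_aec`). The update is a stochastic-gradient step in the blocked space with one scalar step-size for the beamformer part and another for the AEC part. Setting `mu_aec=0.0` mimics a control logic that freezes the AEC while the BF keeps adapting.

```python
import numpy as np

rng = np.random.default_rng(4)
n_bf, n_aec, n_con = 6, 4, 2         # hypothetical sizes
n = n_bf + n_aec

C_bf = rng.standard_normal((n_bf, n_con))
C_ext = np.vstack([C_bf, np.zeros((n_aec, n_con))])
f = np.array([1.0, 0.0])
U, _, _ = np.linalg.svd(C_bf, full_matrices=True)
B_bf = U[:, n_con:]
B = np.block([
    [B_bf, np.zeros((n_bf, n_aec))],
    [np.zeros((n_aec, n_bf - n_con)), np.eye(n_aec)],
])
w_q = C_ext @ np.linalg.solve(C_ext.T @ C_ext, f)


def gsc_lms(X, mu_bf, mu_aec):
    """Run GSC-form LMS over the rows of X; return final w_a and the output samples."""
    # Diagonal of the step-size matrix: mu_bf for the BF part, mu_aec for the AEC part.
    step = np.concatenate([np.full(n_bf - n_con, mu_bf), np.full(n_aec, mu_aec)])
    w_a = np.zeros(n - n_con)
    e = np.empty(len(X))
    for k, x in enumerate(X):
        x_b = B.T @ x                     # blocked input vector
        e[k] = w_q @ x - w_a @ x_b        # output of the constrained filter
        w_a = w_a + step * x_b * e[k]     # element-wise product = diagonal matrix step
    return w_a, e


# Synthetic correlated Gaussian input vectors, one per row.
L = rng.standard_normal((n, n)) / np.sqrt(n)
X = rng.standard_normal((4000, n)) @ L.T

w_a, e = gsc_lms(X, mu_bf=0.01, mu_aec=0.02)
w_frozen, _ = gsc_lms(X, mu_bf=0.01, mu_aec=0.0)   # AEC adaptation frozen
```

The constraints hold at every iteration by construction, because the update only moves the weights inside the range of `B`.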

In the following we perform the analysis of an even more general form of , in which the only requirements on and are that is symmetric positive-definite and is a full column-rank matrix that satisfies . The typical implementation described above will correspond to a particular case of the more general analysis.

4.1 Weight Error Vector

Define the weight error vector . From ,

where

denotes the weight error vector of the unconstrained filter conditioned on and . From , is in the range of . Hence, is completely determined by conditioned on . We then study the behavior of .

Subtracting from both sides of , using with and we obtain a recursive update equation for :

5 Statistical Analysis

5.1 Simplifying Assumptions

We now study the behavior of the BF-assisted GSC-form echo canceler using under the following typical simplifying assumptions required for mathematical tractability [5]:

A1: is a zero-mean Gaussian vector;

A2: and are statistically independent;

A3: is positive-definite and both and have full column rank;

A4: The statistical dependence between and can be neglected;

A5: The DOA does not change during adaptation.

Though not always valid in practice, these assumptions make the analysis viable and frequently lead to results that retain sufficient information to serve as reliable design guidelines [5], [12]. Simulation results will confirm their reasonableness for this analysis. A1 simplifies the evaluation of fourth-order moments of . These moments are dependent on the distribution of , and the Gaussian distribution combines the advantages of being a good model for several physical processes and simplifying the required mathematical derivations. A2 is physically reasonable, as and are generated at different sides of the communications channel by independent speakers. A3 is reasonable in practice, as always has some uncorrelated noise component and both and are under reasonable control of the designer. A4 is required to estimate moments involving the input signal and the weight vector, as the statistical distribution of the latter is unknown. This assumption is in fact less restrictive than the usually employed independence assumption, which requires and to be independent, as discussed in detail in [58]. A5 is employed for mathematical tractability and because the main goal of the present analysis is to determine fundamental properties of the adaptive system.

5.2 Mean Weight Error Vector Behavior

Taking the expected value of under A4 and using and leads to

since

Hence, the mean weights converge asymptotically to the optimal solution if all eigenvalues of are inside the unit circle. In this case, results in asymptotically unbiased solutions in the mean.

5.3 Mean Output Power (MOP)

To determine the MOP we use with and . Defining we obtain

where we have used and A4 to obtain the second line. A recursive expression for the matrix is derived in the next section to complete the model .

5.4 Correlation Matrix of ϑ[n]

Post-multiplying by its transpose, taking the expected value, using A1–A5 and yields

Using A1, A4 and the Gaussian moment factoring theorem, the expectation in is given by

Finally, substituting into yields

Equation completes the MOP model in .
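The fourth-order expectation handled by the Gaussian moment factoring theorem can be checked by simulation. The sketch below uses the standard identity for a zero-mean Gaussian vector x with covariance R and a symmetric matrix K, E[x x^T K x x^T] = R tr(K R) + 2 R K R, which we assume is the form needed in this derivation. The matrices `R` and `K` are arbitrary stand-ins chosen for the example, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n, n_samples = 3, 200_000

A = rng.standard_normal((n, n))
R = A @ A.T / n + np.eye(n)              # covariance of the Gaussian vector
S = rng.standard_normal((n, n))
K = np.eye(n) + 0.1 * (S + S.T)          # arbitrary symmetric matrix

# Zero-mean Gaussian samples, one per row.
X = rng.multivariate_normal(np.zeros(n), R, size=n_samples)

# Monte Carlo estimate of E[x x^T K x x^T] = E[(x^T K x) x x^T].
q = np.einsum("ni,ij,nj->n", X, K, X)
lhs = (X * q[:, None]).T @ X / n_samples

# Closed form given by the Gaussian moment factoring theorem.
rhs = R * np.trace(K @ R) + 2.0 * R @ K @ R

rel_err = np.linalg.norm(lhs - rhs) / np.linalg.norm(rhs)
```

The relative error shrinks as `n_samples` grows; for non-Gaussian inputs the identity generally does not hold, which is one reason the Gaussian assumption matters for the model.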

6 Convergence Analysis

Classical convergence analysis of would project into the eigenspace of and study the convergence of the diagonal entries of the transformed matrix [4]. The presence of , however, requires a different approach. As , is not entirely diagonalizable by the same projection [59]. Nevertheless, it is still possible to diagonalize both and through contragradient diagonalization [59],[60]. As is positive definite, Cholesky decomposition yields with non-singular. Then, we can transform the vector space into , and

Hence, pre-multiplying by , post-multiplying by and using yields

where is symmetric and positive definite. Hence, it is diagonalizable as with and

Pre-multiplying by and post-multiplying by yields

where .
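The contragradient diagonalization described above can be reproduced in a few lines. This is a sketch with hypothetical names (`step`, `R_b`, `T`) and random stand-in matrices: a Cholesky factor of the step-size matrix maps the problem to a space where the step-size matrix becomes the identity, and an eigendecomposition of the transformed autocorrelation matrix then diagonalizes both matrices at once. The last lines check the usual condition for convergence in the mean, namely that the eigenvalues of the product of the step-size matrix and the blocked autocorrelation matrix lie in (0, 2).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5

A = rng.standard_normal((n, 3 * n))
R_b = A @ A.T / (3 * n)                   # stand-in blocked autocorrelation matrix
# Symmetric positive-definite step-size matrix (diagonal here, two step-sizes).
step = np.diag(np.concatenate([np.full(3, 0.05), np.full(2, 0.2)]))

L = np.linalg.cholesky(step)              # step = L @ L.T
R_t = L.T @ R_b @ L                       # symmetric positive definite
lam, Q = np.linalg.eigh(R_t)              # R_t = Q @ diag(lam) @ Q.T

# A single transformation diagonalizes both matrices.
T = L @ Q
T_inv = np.linalg.inv(T)
step_diag = T_inv @ step @ T_inv.T        # should equal the identity
R_diag = T.T @ R_b @ T                    # should equal diag(lam)

# step @ R_b is similar to R_t, so both share the eigenvalues lam.
eig_prod = np.sort(np.linalg.eigvals(step @ R_b).real)
mean_stable = bool(np.all((lam > 0) & (lam < 2)))
```

With a scalar step-size the same procedure reduces to the classical eigendecomposition of the autocorrelation matrix scaled by that step-size.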

is an autocorrelation matrix. Then , [61], [62], and convergence of can be studied observing only the diagonal elements of . Let be the vector of diagonal entries of and be the vector of the eigenvalues of . Then, from

and

where ,

and .

The matrix is symmetric and positive definite, as for any nonzero vector we have

The solution to is [63]

Using we now study the stability conditions and the steady-state behavior of .

6.1 Mean Weight Error Revisited

Pre-multiplying by the mean weight error recursion is transformed to [51]

Hence, analysis can be restricted to the eigenspace of defined in .

6.2 Stability Conditions

Recursion is a state-space equation whose stability is determined exclusively by the eigenvalues , , of [63]. From Gershgorin’s theorem [64],

and is stable if for all . Then, leads to the sufficient condition

which implies that and

In most practical cases, a reliable estimate of the eigenvalues of is not available a priori and the upper bound in cannot be used. However, using the inequality

it is possible to derive a tighter upper bound [65]

In the particularly important implementation using , is given by and by . Hence, if we write with the matrices in the partitioned form using , , and yields

where and . Hence, becomes

6.3 Excess MOP

Using in we write the MOP as a function of and

Thus, the excess MOP is given by [5]

When holds, and from and we have

which solved for yields

Using this result in as and solving for yields

From , and A3, is symmetric and positive definite. Hence, its largest eigenvalue is related to its largest singular value through where denotes the vector of singular values of . The largest singular value of a matrix is equal to its -norm [67]. Then, using the Cauchy-Schwarz inequality [60]

where both and are symmetric positive definite matrices. Hence, for , where and are vectors containing the eigenvalues of and , respectively, we conclude that and reduces to

For implementations using , substituting in yields

Further assuming we have