Technical Report: A New Decision-Theory-Based Framework for Echo Canceler Control

Tales Imbiriba, José Carlos M. Bermudez,
Jean-Yves Tourneret, and Neil J. Bershad
Jean-Yves Tourneret is with the University of Toulouse, IRIT - ENSEEIHT - TéSA, 2 rue Camichel, BP 7122, 31071 Toulouse cedex 7, France, e-mail:Jean-Yves.Tourneret@enseeiht.fr.
José Carlos Bermudez and Tales Imbiriba are with the Dept. of Electrical Engineering, Federal University of Santa Catarina, Florianópolis, SC, Brazil, e-mail:j.bermudez@ieee.org, and talesim@gmail.com.
Neil J. Bershad is with the Dept. of Electrical Engineering and Computer Science, University of California Irvine, Irvine, 1621 Santiago Drive, Newport Beach, CA 92660, U.S.A., e-mail: bershad@ece.uci.edu.
Corresponding author: J.-Y. Tourneret.
Abstract

A control logic has a central role in many echo cancellation systems for optimizing the performance of adaptive filters while estimating the echo path. For reliable control, accurate double-talk (DT) and channel change (CC) detectors are usually incorporated into the echo canceler. This work expands the usual detection strategy to define a classification problem characterizing four possible states of the echo canceler operation. The new formulation allows the use of decision theory to continuously control the transitions among the different modes of operation. The classification rule reduces to a low-cost statistic for which it is possible to determine the probability of error under all hypotheses, allowing the classification performance to be assessed analytically. Monte Carlo simulations using synthetic and real data illustrate the reliability of the proposed method.

Keywords

Adaptive filters, adaptive signal processing, adaptive systems, echo cancellation, channel change, double-talk, classification, multivariate gamma distribution.

I Introduction

Echo cancellation is a requirement in modern voice communication systems. Speech echo cancelers (ECs) are employed in telephone networks (line echo cancelers) or in hands-free communications (acoustic echo cancelers). Most EC designs include two main blocks: a channel identification block and a control logic block. The channel identification block tries to estimate the echo path, often employing adaptive filtering. However, the adaptive algorithm tends to diverge in the presence of near-end signals (double-talk – DT). Hence, adaptation must be stopped during DT. On the other hand, abrupt echo channel changes (CC) require a faster adaptation to improve tracking. Finally, in the absence of both DT and CC, a slow adaptation rate tends to improve channel estimation accuracy. The control logic is then required to control the transitions among these distinct modes of adaptive operation.

Numerous approaches have been proposed to deal with DT or CC in echo cancelers. Some approaches have been proposed which do not require a DT detector, aiming at a continuous adaptation of the EC. These methods may employ signal correlations [benesty2000new], independent component analysis (ICA) [Gunther2009, Gunther2015], a continuous step size adjustment (variable step size) [Mader2000, GilCacho2014_2, Yang2017], or a frequency domain approach [GilCacho2014]. However, the eventual benefit of avoiding DT detection usually comes at the expense of decreased convergence rates [GilCacho2014_2], and of the need for additional information about the loudspeaker-microphone coupling [Yang2017] and about the near-end signal statistics [GilCacho2014]. Moreover, all such methods require extra estimation strategies, which result in increased memory usage and computational complexity or must be significantly simplified for practical applications.

Several works have proposed methods for DT detection in ECs without considerations regarding CC, such as [Ochiai1977, benesty2000new, schuldt2012delay, ikram2015double]. However, DT detection strategies that assume a static channel response may yield unpredictable performance in the presence of CC [ahgren2006study].

The vast majority of the techniques available for DT or CC detection rely on ad hoc statistics to make the decision, leading to cumbersome design processes. A few works employ a statistical framework to formulate the detection problem. For instance, [Carlemalm1996] proposes a maximum a posteriori (MAP) decision rule based on channel output observations and assuming Bernoulli distributed priors for the different hypotheses. A similar approach is used in [Carlemalm1997], but employing a Markov channel model. In [Fozunbal2005], a generalized likelihood ratio test (GLRT) is proposed using observations from both the channel input and output signals. DT and CC detection are considered. In [jung2002new] and [jung2005], a first test distinguishes single-talk from DT or CC, and a second test based on the echo path estimate detects DT. Though the latter two studies consider DT and CC in a single formulation, all these aforementioned statistical formulations have been proposed for the conventional adaptive EC structure [Mader2000].

An alternative EC structure has been proposed in [Ochiai1977], which uses a shadow adaptive filter that operates in parallel with the actual echo cancellation filter. The shadow filter coefficients are transferred to the echo cancellation filter when the shadow filter is a better estimate of the unknown channel response than the echo cancellation filter. From the authors' experience, this structure allows a much better control of the EC convergence than the conventional structure. The EC structure is shown in Fig. 1. The EC consists of the main echo cancellation filter and the adaptive shadow filter. The output of the main filter is subtracted from the echo to obtain the canceled echo . The shadow filter weights are adapted continuously. The control logic is designed such that the shadow filter coefficients are copied to the main filter when this will improve the EC performance. A likelihood ratio test (LRT) detector based on the EC structure in Fig. 1 was derived in [BershadTourneret2006] to detect DT versus CC. A generalized LRT (GLRT) that could be simplified to a sufficient statistic was proposed for the same EC structure in [TourneretBershadBermudez2009]. The performance of the test statistic was evaluated as a function of the system parameters. The idea developed in [BershadTourneret2006] and [TourneretBershadBermudez2009] was to use the detection result 1) to stop adaptation when DT was detected and 2) to adapt fast in the presence of channel change. The speed of adaptation was controlled by the adaptation step size.

The decision-theory-based DT and CC detection formulation in [BershadTourneret2006, TourneretBershadBermudez2009] did not include decision-theory-based formulations for the exit from a DT or a CC condition. These decisions were still made in an ad hoc manner.

This paper formulates the echo canceler control logic as a more general classification problem, with four hypotheses associated with the presence or absence of DT and with the presence or absence of CC:

(1)

There are several motivations for identifying these four classes. These motivations include 1) the possibility to accurately adjust the step size of the adaptive filter for long time intervals when there is no DT and no CC, resulting in smaller residual errors, 2) the inclusion of adds an important degree of flexibility to the control logic that can be exploited, as will be shown in Section V-A, and 3) the fact that these four classes lead to a simple and low-cost test statistic.

The paper is organized as follows. In Section II we introduce the signal models and derive the classification rules. In Section III we present the performance analysis of the proposed classifier. Monte Carlo simulations are presented in Section IV to validate the theory. Section V discusses application of the proposed classification strategy and presents illustrative simulation results. Finally, Section VI discusses the results and presents the conclusions.

Fig. 1: Basic echo canceller structure.

II Double-talk and Channel Change Classification

II-A Signal and Channel Models

The channel input vector is of dimension with covariance matrix and the channel output is a scalar . The input signal is stationary within the decision periods and the DT signal can be modelled by a white Gaussian process for detection purposes [Fozunbal2005]. Also, is modelled as a zero-mean Gaussian vector. By denoting as the adaptive shadow filter and as the main echo cancellation filter, the channel output can be expressed as follows under the different hypotheses

(2)

The hypothesis considers that has converged and has been recently copied to . Hypothesis assumes that has already converged after a channel change. In , a DT signal happens after convergence of and copy to (similar to ). Finally, a fourth hypothesis considers that DT happens following a CC after has already converged to the new channel but has not yet been copied to . All cases rely on the convergence (or divergence) of and its relation to resulting in several practical implications concerning the control logic block in Fig. 1. The control strategy will be addressed in Section V.

The additive noise is stationary zero-mean white Gaussian, independent of with . (Footnote 1: The whiteness assumption for is not restrictive, since it is always possible to whiten the channel outputs by pre-multiplying consecutive samples by an appropriate matrix. Of course, this operation assumes that the covariance matrix of consecutive noise samples is known or can be estimated.) The second additive noise , modeling the DT, is zero-mean white Gaussian, and independent of both and with . Two error signals and were introduced in [TourneretBershadBermudez2009] to facilitate the analysis. These error signals can be expressed as follows under the different hypotheses

(3)

II-B Classification Rule

II-B1 One-Sample Case

The joint pdf of is Gaussian under all hypotheses such that

(4)

where the second subscript in ( in this case) indicates the -sample case. The covariance matrices of under the different hypotheses can be written

(5)
(6)

with

(7)

where can be interpreted as the power at the output of the difference filter with response . Assuming all hypotheses are equiprobable, the classification rule minimizing the average probability of error decides hypothesis is true when

(8)

for all . Equivalently, hypothesis will be accepted if

(9)

for all . Straightforward computations (detailed in Appendix A) allow one to compute the inverses and determinants of the matrices and yielding the following classification rule

(10)

where “aif” means “accepted if” and

(11)

The different decision regions corresponding to (10) are illustrated in the plane in Fig. 2.

Fig. 2: DT and CC Decision Regions in the Plane.
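The equal-prior, minimum-error rule described above can be illustrated with a generic sketch. It assumes only that each hypothesis is a zero-mean bivariate Gaussian, so that maximizing the likelihood amounts to minimizing a quadratic form plus a log-determinant. The covariance values in `HYPOTHETICAL_COVS` and the function names are placeholders chosen for illustration; they are not the matrices of (5) and (6), and the code does not reproduce the simplified rule (10).

```python
import math


def neg2_log_likelihood(e, cov):
    """Return e^T C^{-1} e + ln det C for a 2x2 covariance C = [[a, b], [b, c]].

    Up to an additive constant this is -2 times the Gaussian log-likelihood.
    """
    (a, b), (_, c) = cov
    det = a * c - b * b
    if det <= 0.0:
        raise ValueError("covariance must be positive definite")
    e1, e2 = e
    # Closed-form 2x2 inverse: C^{-1} = (1/det) * [[c, -b], [-b, a]]
    quad = (c * e1 * e1 - 2.0 * b * e1 * e2 + a * e2 * e2) / det
    return quad + math.log(det)


def classify(e, covariances):
    """Index of the hypothesis with the smallest score (equal priors assumed)."""
    scores = [neg2_log_likelihood(e, cov) for cov in covariances]
    return min(range(len(scores)), key=scores.__getitem__)


# Placeholder covariances, one per hypothesis (illustrative assumption only).
HYPOTHETICAL_COVS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[4.0, 0.0], [0.0, 1.0]],
    [[9.0, 8.0], [8.0, 9.0]],
    [[16.0, 8.0], [8.0, 9.0]],
]

if __name__ == "__main__":
    print(classify((0.1, 0.1), HYPOTHETICAL_COVS))
```

Comparing all scores at once, instead of testing pairs of hypotheses, keeps the sketch short; the pairwise inequalities of (9) select the same hypothesis.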

II-B2 Multiple Samples

The analysis above can be generalized to the case where multiple time samples , for , are available. The analysis is performed here for two samples (i.e., ) for simplicity and is generalized later. When two samples are observed, the error signals and are considered. They can be expressed as follows under the different hypotheses:

Under :

(12)

Under :

(13)

Under :

(14)

Under :

(15)

Defining , is a zero-mean Gaussian vector under all hypotheses. Straightforward computations yield the covariance matrices of under the different hypotheses. These matrices can be expressed as

(16)
(17)

where is the identity matrix and is given by Equation (18).

(18)

In (18), , , and . The determinants and inverses of these block matrices can be computed following [Kay1993, p. 572]

(19)

and

(20)
(21)

where is assumed to exist.
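For reference, the standard partitioned-matrix identities for the determinant and inverse of a block matrix are recalled below. The block names A, B, C, D and the Schur complement S are generic placeholders assumed here for illustration; they are not the notation of (16)-(18), and the expressions are the textbook forms, not a transcription of (19)-(21).

```latex
\det\begin{bmatrix} A & B \\ C & D \end{bmatrix}
  = \det(A)\,\det\!\left(D - C A^{-1} B\right),
\qquad
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}
  = \begin{bmatrix}
      A^{-1} + A^{-1} B S^{-1} C A^{-1} & -A^{-1} B S^{-1} \\
      -S^{-1} C A^{-1} & S^{-1}
    \end{bmatrix},
\quad S = D - C A^{-1} B .
```

Both identities require A and S to be invertible.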

Performing the same computations shown in Appendix A for vector and matrices (16) and (17), the following multiple sample classification rule can then be obtained

(22)

where , and

(23)

The factor multiplying in (23) results from . This result can be compared with (10) obtained for the one-sample case. The generalization to more than two samples is straightforward. Indeed, in the -sample case, the covariance matrices of are defined as in (16) and (17), with replaced with , and defined differently. However, since cancels from the difference between the two inverses, the classification rule for the -sample case is expressed by (22) with the squared norm of , , and with replaced with .

III Performance Analysis

This section studies the probability of classification error for the classifier proposed in Section II.

III-A One-Sample Case

It is clear from the classification rules (10) that is a sufficient statistic for the classification problem. Interestingly, the exact distribution of can be derived under all hypotheses, allowing for an analytical study of the classifier performance. First, we note that the elements of form the diagonal of the matrix . Now, since is jointly distributed according to a zero-mean Gaussian distribution with covariance matrix , see (4), it is shown in Appendix B that, under all hypotheses , , is distributed according to a multivariate gamma distribution denoted with shape parameter and scale parameter , with

(24)

where , and are the elements of the covariance matrix .

III-B Multiple-Sample Case

Once again it is clear that the vector is a sufficient statistic for solving the proposed classification problem. Noting that is a rearrangement of the vectors , , the distribution of can be obtained following the reasoning presented in Appendix B, under the assumption of independence of vectors and , , and stationarity for . Assuming the vectors , , to be distributed according to the same zero-mean Gaussian distribution with covariance matrix , matrix is distributed according to a Wishart distribution with degrees of freedom [muirhead2005, Theorem 3.2.4, pg. 91]. Thus, is distributed according to a multivariate gamma distribution with shape parameter and given by (31).
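A quick numerical sanity check of the marginal behavior implied by the Wishart argument is sketched below. It relies on a standard fact: the sum of N squared i.i.d. zero-mean Gaussian samples with variance sigma2 follows a gamma distribution with shape N/2 and scale 2*sigma2. The function names, the sample sizes, and the use of independent samples are assumptions made for this illustration, and only one marginal of the bivariate statistic is examined.

```python
import random


def gamma_marginal_moments(n_samples, sigma2):
    """Mean and variance of a Gamma(shape=N/2, scale=2*sigma2) variable."""
    shape = n_samples / 2.0
    scale = 2.0 * sigma2
    return shape * scale, shape * scale * scale


def empirical_moments(n_samples, sigma2, n_runs, seed=0):
    """Empirical mean and variance of sum_n e(n)^2 with e(n) ~ N(0, sigma2) i.i.d."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    stats = [
        sum(rng.gauss(0.0, sigma) ** 2 for _ in range(n_samples))
        for _ in range(n_runs)
    ]
    mean = sum(stats) / n_runs
    var = sum((s - mean) ** 2 for s in stats) / (n_runs - 1)
    return mean, var


if __name__ == "__main__":
    print(gamma_marginal_moments(8, 2.25))
    print(empirical_moments(8, 2.25, 20000))
```

The empirical moments should approach the theoretical ones as the number of runs grows; correlation between successive samples, which the analysis neglects, would show up as a mismatch in the variance.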

III-C Probability of Error

To simplify the notation, define and such that . Also consider to be the bivariate gamma density associated with hypothesis . Then, the probability of error , can be computed as:

(25)

where represents the integration limits associated with . A detailed expansion of (25) for all classes is presented in Appendix C. The integral (25) was evaluated numerically using the MATLAB function integral2.

Figures 3 to 5 show the probabilities computed using (25) as functions of for different sets of parameters. Each row of these figures corresponds to a given true hypothesis , . Figure 3 shows for , , and . These plots clearly show that the performance of the classifier improves by increasing or . A large value of is especially important in distinguishing between hypotheses and . It is also clear that the classification error increases significantly for low values of . As a limiting situation, the vector will be placed exactly on the line separating the classes and , or and (see Fig. 2) for .

Since yielded good classification performance, we opted for fixing in Figs. 4 and 5, while varying the DT power in Fig. 4 and the noise power in Fig. 5. Although the DT power has little influence on the classifier performance under and hypotheses (Fig. 4), a clearer influence is observable under and . In this case, increasing the DT power tends to increase and (bottom two rows of Fig. 4). This behavior is expected as the effect of a channel change in distinguishing between hypotheses and diminishes with the increase of DT power. Figure 5 explores the effect of the noise power on the classifier performance. It can be noted that a large noise power increases the probability of error in detecting the onset of DT (, , , ), as the performance is a function of the DT to noise ratio . This effect, however, is very small for ratios larger than 3 dB, which is typical in practice. Simulations for the one-sample case with different noise and DT powers are available in Figs. 6 and 7, respectively. Although the results obtained for the one-sample case show (as expected) a stronger influence of DT and noise power on the classification performance when compared to the results for , they corroborate the above conclusions.

Fig. 3: Theoretical performance curves for single- and multi-sample cases (, ).
Fig. 4: Theoretical performance curves for different values of DT power (, ).
Fig. 5: Theoretical performance curves for different values of noise power (, ).
Fig. 6: One-sample case, different values, .
Fig. 7: One-sample case, different values, .

IV Monte Carlo Simulations

In this section Monte Carlo (MC) simulations are performed and compared with the theoretical expressions derived in the previous section. These results are also valuable to assess the effect of the independence approximation on the analysis accuracy.
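The kind of Monte Carlo estimate used in this section can be sketched generically as follows. The sketch assumes independent zero-mean bivariate Gaussian error vectors and a full maximum-likelihood comparison among hypotheses; it does not implement the simplified rule (22), and the covariances in `HYPOTHETICAL_COVS`, the function names, and the run counts are illustrative placeholders, not the settings behind Figs. 8 to 11.

```python
import math
import random


def draw(cov, n, rng):
    """Draw n independent zero-mean pairs with 2x2 covariance cov (Cholesky factor)."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    samples = []
    for _ in range(n):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        samples.append((l11 * u, l21 * u + l22 * v))
    return samples


def score(samples, cov):
    """-2 log-likelihood (up to a constant) of independent samples under cov."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    quad = sum(c * x * x - 2.0 * b * x * y + a * y * y for x, y in samples) / det
    return quad + len(samples) * math.log(det)


def error_rates(covs, n, runs, seed=0):
    """Estimate P(error | H_i) for each hypothesis by repeated sampling."""
    rng = random.Random(seed)
    rates = []
    for true_idx, cov in enumerate(covs):
        errors = 0
        for _ in range(runs):
            s = draw(cov, n, rng)
            decided = min(range(len(covs)), key=lambda i: score(s, covs[i]))
            errors += decided != true_idx
        rates.append(errors / runs)
    return rates


# Placeholder covariances, one per hypothesis (illustrative assumption only).
HYPOTHETICAL_COVS = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[4.0, 0.0], [0.0, 1.0]],
    [[9.0, 8.0], [8.0, 9.0]],
    [[16.0, 8.0], [8.0, 9.0]],
]

if __name__ == "__main__":
    print(error_rates(HYPOTHETICAL_COVS, n=1, runs=500))
    print(error_rates(HYPOTHETICAL_COVS, n=16, runs=500))
```

Running the estimate for one and for several samples per decision gives a feel for how the error probability falls as more samples are used.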

To generate the statistics by sampling the -dimensional vectors from , we need to define the covariance () and correlation () matrices. Considering the input signal to be auto-regressive of order 1 (AR-1), was chosen as follows [BershadTourneret2006]:

(26)

where controls the input signal correlation. Thus, the entries of can be written as

(27)

Note that by fixing the vectors and , depends only on , and . Thus, for a given , can be easily computed using (26) and (7). The vectors and were assumed to have 1024 samples, and were constructed using the one-sided exponential channels (see [BershadTourneret2006] and [TourneretBershadBermudez2009])

(28)

where is a relative delay of the channel and the parameter is defined by the filter gain . Two different scenarios are studied here corresponding to dB (electrical application) and dB (acoustic application).
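The computation of the difference-filter output power from the input correlation can be sketched as follows. The sketch assumes the textbook AR-1 autocorrelation, whose (i, j) entry is proportional to the correlation coefficient raised to |i - j|, and assumes that the output power is the quadratic form of the difference of the two responses with the input covariance. The exact normalization of (26), (27), and (7) is not reproduced here, and the function and argument names are hypothetical.

```python
def ar1_covariance(n, rho, sigma2=1.0):
    """n x n Toeplitz covariance of an AR-1 process: sigma2 * rho**|i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]


def difference_filter_power(w, h, cov):
    """Quadratic form d^T R d with d = w - h, the power at the output of the
    difference filter driven by an input with covariance R (assumed form)."""
    d = [wi - hi for wi, hi in zip(w, h)]
    n = len(d)
    return sum(d[i] * cov[i][j] * d[j] for i in range(n) for j in range(n))


if __name__ == "__main__":
    R = ar1_covariance(4, 0.5)
    print(difference_filter_power([1.0, 0.5, 0.25, 0.125], [0.0] * 4, R))
```

For the 1024-tap responses considered here, the double loop costs on the order of a million multiplications, which is acceptable for an offline computation.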

Figure 8 presents the MC simulations obtained by averaging runs for dB, with varied in the range , , , and , leading to an SNR of 30 dB. Figure 9 presents the same simulation for dB. When comparing Figs. 8 and 9 with theoretical results (Fig. 3), only a very small degradation in classification accuracy is noted, mainly for and , and . This small difference is attributed to the use of the independence approximation.

Fig. 8: MC performance curves assuming AR-1 input signal, sampled from , dB (electric application), , , .
Fig. 9: MC performance curves assuming AR-1 input signal, sampled from , dB (acoustic application), , , .
Fig. 10: MC performance curves assuming AR-1 input signal, , sampled from , , dB (electric application), , , .
Fig. 11: MC performance curves assuming AR-1 input signal, , sampled from , , dB (electric application), , , .

MC simulations for different values of the correlation coefficient are shown in Figs. 10 (dB) and 11 (dB). Although varying has little impact on the classification performance, it is interesting to notice that increasing slightly improves the classification performance in all classes, but especially for and .

V Application to echo cancellers

V-A Control strategy

The classification hypotheses presented in (2) considered that in each case the adaptive filter had time to converge or diverge. This becomes a critical point for designing the control block (see Fig. 1) since the probabilities of error are high for low values of . Two direct consequences related to this characteristic are the following:

  1. (): Whenever is copied to becomes zero and the probability of error becomes large between classes and . In fact, if the vector will be exactly on the boundary between the two classes (see Fig. 2).

  2. (): When CC happens, and may assume values very far from the new true filter response . If this is the case, classification errors ( or ) are expected since both norms , , may become larger than .

To address these problems, we propose a control strategy that combines tuning of the adaptive step size , defining an appropriate frequency for the realization of the tests, and introducing a delay before actually changing the system state after each decision.

Adaptation step

The shadow filter is always adapting, even during DT, since the difference between and is crucial for improving classification rates. However, different adaptation step sizes can be adopted for each class:

  1. During , should be low since the aim is to make a fine tuning of the filter coefficients.

  2. During , should be set as high as possible to speed-up convergence of the adaptive algorithm.

  3. During , should be set to a small value so that can diverge slowly under DT and start to converge once DT is over or upon the occurrence of CC.

  4. Class is critical since it corresponds to the occurrence of CC with or without DT signal. Our practical experience indicates that setting to a value between and leads to good classification results.
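The class-dependent step-size policy listed above can be sketched with a normalized LMS update for the shadow filter. The dictionary `STEP_SIZE`, its keys, and its numerical values are hypothetical placeholders that only respect the qualitative ordering described in the list (small for fine tuning, large after a channel change, very small during DT); they are not the values used in the simulations. The noise-free identification loop is likewise an illustrative assumption.

```python
import random

# Hypothetical per-class step sizes (placeholders, not the paper's settings).
STEP_SIZE = {"no_dt_no_cc": 0.05, "cc": 1.0, "dt": 0.01, "dt_and_cc": 0.25}


def nlms_step(w, x, y, mu, eps=1e-8):
    """One NLMS update of weights w; returns (new_weights, a priori error)."""
    e = y - sum(wi * xi for wi, xi in zip(w, x))
    norm = sum(xi * xi for xi in x) + eps  # eps guards against division by zero
    g = mu * e / norm
    return [wi + g * xi for wi, xi in zip(w, x)], e


def identify(true_w, mu, n_iter, seed=0):
    """Adapt a filter toward true_w from white Gaussian input, without noise."""
    rng = random.Random(seed)
    m = len(true_w)
    w = [0.0] * m
    x = [0.0] * m
    for _ in range(n_iter):
        x = [rng.gauss(0.0, 1.0)] + x[:-1]  # shift the regressor by one sample
        y = sum(t * xi for t, xi in zip(true_w, x))
        w, _ = nlms_step(w, x, y, mu)
    return w


def misalignment(w, true_w):
    """Squared Euclidean distance between the estimate and the true response."""
    return sum((a - b) ** 2 for a, b in zip(w, true_w))


if __name__ == "__main__":
    h = [0.5, -0.3, 0.2, 0.1]
    for name, mu in STEP_SIZE.items():
        print(name, misalignment(identify(h, mu, 200), h))
```

With a noise-free input the larger step sizes converge faster, which is the behavior wanted right after a channel change; in noise, the smaller values trade speed for a lower steady-state error.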

Frequency of tests

The difference filter plays a central role in classification accuracy. Hence, it is advisable to allow a minimum number of samples between two tests to allow a clear differentiation of the two responses.

Filter copy

Whenever classes or are detected, the shadow filter should be copied to if . To account for transients occurring after the exit of a given state (especially when DT stops), it is advisable to consider a delay of samples between the decision moment and the actual filter copy.

Decisions in the neighborhood of

Decisions between and , and between and , are rather arbitrary in practical situations when . To address this issue, we propose to allow changes between classes and , or between and , only if

where .
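The delayed filter copy described above can be sketched as a small scheduler. The class name `DelayedCopy`, the integer class labels, the `condition_ok` flag standing in for the copy condition, and the choice to cancel a pending copy when the decision leaves the copy classes are all assumptions made for this illustration; the source does not specify these details.

```python
class DelayedCopy:
    """Schedule a shadow-to-main filter copy a fixed number of samples after a decision."""

    def __init__(self, copy_classes, delay):
        self.copy_classes = set(copy_classes)
        self.delay = delay
        self.due = None  # sample index at which the pending copy fires, if any

    def update(self, n, decided_class, condition_ok=True):
        """Process the decision made at sample n; return True if the copy happens now."""
        if decided_class in self.copy_classes and condition_ok:
            if self.due is None:
                self.due = n + self.delay
        else:
            # Assumption: leaving the copy classes cancels a pending copy.
            self.due = None
        if self.due is not None and n >= self.due:
            self.due = None
            return True
        return False


if __name__ == "__main__":
    scheduler = DelayedCopy(copy_classes={0, 1}, delay=3)
    decisions = [0, 0, 0, 0, 2, 0, 0, 0, 0]
    fired = [n for n, c in enumerate(decisions) if scheduler.update(n, c)]
    print(fired)
```

Keeping the scheduler separate from the classifier makes it easy to tune the delay, or to add a minimum spacing between tests, without touching the decision rule.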

V-B Synthetic Data

This section considers the AR-1 () data discussed in Section IV, and also used in [BershadTourneret2006, TourneretBershadBermudez2009]. We considered filter responses and with samples, and fixed the parameters , , and . The signal consisting of samples () was formulated as

(29)

with intervals , , , , and . Hence, CC occurs at sample 20,001, DT occurs between samples 80,001 and 120,000, and a second CC occurs during the DT period at sample 100,001. The adaptive algorithm employed was the Normalized Least Mean Square (NLMS) algorithm, whose maximum convergence speed is known to be attained for [Haykin:1996:AFT:230061]. The control parameters were set to , , , , , and was set to 0.25 for . The adaptive filter coefficients were initialized equal to zero and the adaptation step was initialized as (CC). The simulation results for one realization of the synthetic signal are shown in Fig. 12. The top panel presents the classes attributed by the classifier to each sample in time. The second panel presents the step-size corresponding to each class. The squared excess errors (SE) , , for and follow in the bottom two panels. Although the good classification performance is evident in this example, the issue discussed in Section V-A can be noticed after the CC at sample 20K. The samples are classified as before becomes smaller than . Then the correct class is selected before sample 30K. However, since the adaptive filter never stops adapting, this problem is satisfactorily mitigated without severe deterioration of the echo canceler performance, as can be verified by the SE results in the two bottom panels. These results clearly show the performance improvement resulting from the generalization of the approach proposed in [BershadTourneret2006, TourneretBershadBermudez2009]. The improvement shows especially during the single-talk periods. As DT or CC do not occur during these periods, the proposed solution leads to a reduction of the step size , clearly improving the quality of channel estimation. Note, for instance, that the step size reduction that happens at iteration 35K due to the acceptance of hypothesis leads to a drop in SE that reaches 12 dB at iteration 80K.

Simulations with yielded similar results, and are displayed in Fig. 13.

Fig. 12: Performance of the echo canceller system (dB). From top down, the panels present the evolution of the classification result (top), adaptation step size , SE in dB for and (bottom). , , , , .
Fig. 13: Performance of the echo canceller system (dB). From top down, the panels present the evolution of the classification result (top), adaptation step size , MSE in dB for and (bottom). , , , .

V-C Voice Data Over a Real Channel

For the simulation presented in this section we used the same voice data and channels considered in [BershadTourneret2006, TourneretBershadBermudez2009]. The data is approximately 144K samples long, with two CCs occurring at samples 50K and 123K, and an intense DT occurring between samples 57K and 123K. The simulation results presented in Fig. 14 compare the proposed decision framework (blue) with the sequential classification strategy presented in [jung2005] (gray). To deal with the power fluctuation inherent in speech signals, we used and set the detection threshold chosen empirically to avoid and errors during DT. The remaining control strategy parameters were kept the same as those used in the synthetic simulation presented in Fig. 12. The parameters used for the method in [jung2005] were set to the same values used by the authors. Although the detector presented in [jung2005] also considers different classes, the authors did not consider the influence of multiple samples nor use a shadow filter configuration, which clearly impacts the results. The results displayed in Fig. 14 can also be compared with the result obtained in [TourneretBershadBermudez2009, Fig. 9], which indicates that the proposed classification and control strategies perform at least as well as previous echo cancellation systems.

Fig. 14: Performance of the echo canceller system for voice over real channels. From top down, the panels present the evolution of the classification result (top), adaptation step size , SE in dB for and (bottom). , , . Results for the proposed method (blue) and using the method in [jung2005] (gray).

VI Results and Conclusions

In this manuscript we presented a low computational cost multi-class classifier with a coupling control strategy for the echo cancellation problem. The classification rule, initially derived for the one-sample case, was easily extended to the multi-sample scenario. Error probabilities were also analytically computed under the assumption of independence among vectors . This assumption led to bivariate gamma distributions for the sufficient statistics and performance curves that proved accurate when compared with Monte Carlo simulations. The results showed that the greater flexibility provided by the multi-class approach could be well exploited by the control strategy, which considered different step sizes under each hypothesis. The simulations with synthetic data showed that the multi-class strategy is viable if the double-talk and noise powers can be accurately estimated, improving the filter convergence during long periods of single-talk. Simulations in a more realistic scenario (voice over real channels) showed that the proposed strategy works as well as other methods in the literature even when ignoring the power fluctuation of speech signals and using a fixed threshold .

Appendix A Classification rule

This appendix derives the classification rule (10) for the one sample case. This rule corresponds to accepting hypothesis if

(30)

for all . As a consequence hypothesis is accepted if the three following conditions are satisfied