Compressed matched filter for non-Gaussian noise
We consider estimation of a deterministic unknown parameter vector in a linear model with non-Gaussian noise. In the Gaussian case, dimensionality reduction via a linear matched filter provides a simple low dimensional sufficient statistic which can be easily communicated and/or stored for future inference. Such a statistic is usually unknown in the general non-Gaussian case. Instead, we propose a hybrid matched filter coupled with a randomized compressed sensing procedure, which together create a low dimensional statistic. We also derive a complementary algorithm for robust reconstruction given this statistic. Our recovery method is based on the fast iterative shrinkage and thresholding algorithm which is used for outlier rejection given the compressed data. We demonstrate the advantages of the proposed framework using synthetic simulations.
Jakob Vovnoboy and Ami Wiesel
The Rachel and Selim Benin School of Computer Science and Engineering, The Hebrew University of Jerusalem
This work was supported by the Israeli Smart Grid consortium, and also in part by ISF Grant 786/11. Special gratitude to Amir Globerson.
Index Terms— Matched filter, robust regression, compressed sensing, JMAP-ML.
1 Introduction
One of the most fundamental concepts in parameter estimation is sufficient statistics. These are functions of the observations that summarize all the information associated with the parameter of interest. Sufficient statistics minimize the required storage and communication resources. They are task independent and are useful when the data has to be compressed for a future specific use. Their computation usually involves simple and low complexity operations that are suitable for high rate processing. A sufficient statistic for estimation of a deterministic unknown parameter vector in a linear model with Gaussian noise is the well known matched filter. This simple linear operation is a core ingredient of most radar and communication systems.
Many modern physical systems are better modeled as linear systems with non-Gaussian rather than Gaussian noise, mainly due to impulsive noise phenomena [15, 19, 4, 3, 5, 12, 10, 11]. Typical noise characteristics include generalized Gaussian distributions, mixture distributions, impulsive models, and heavy tailed models. In such scenarios, a low dimensional sufficient statistic is usually unknown, hence the classical matched filter is generally sub-optimal and more complicated non-linear operations are required.
A common estimation technique in systems with non-Gaussian noise is to use non-linear element-wise limiters (also known as clippers) prior to the linear filter. However, this method is efficient only when the dynamic range of the data is small compared to the outliers. Another traditional solution in statistics is to resort to robust regression methods which work directly with the observed data, e.g., Huber’s technique [10, 11]. This requires all the data to be stored for processing, which can be quite expensive in terms of memory. In systems where the data has to be communicated for postprocessing, this is extremely limiting. Instead, the goal of this paper is to propose a compressed matched filter (CMF), namely a bank of approximate (and randomized) matched filters which compress the observations and retain most of the information in the data. The output of the CMF can then be easily stored or communicated for future use. In addition, we derive a compressed Huber (CH) estimator which reconstructs the unknown parameter vector from this compressed data. A block diagram of CMF and CH is provided in Fig. 1. It is important to note that recovering the system parameters from the compressed data may still have the same complexity as in the uncompressed case. Therefore, our main contribution is in lowering the amount of data transmitted from the receiver to the postprocessing unit.
Our framework is motivated by the theories of sparse recovery and compressed sensing (CS) [1, 8]. Compressed sensing is a technique for reconstructing a high dimensional but sparse unknown vector from a small number of linear measurements. The basic approach is to use linear measurements with randomized coefficients and a recovery algorithm based on convex optimization with an $\ell_1$ norm. Sparse impulsive noise (or outliers) can be dealt with using the same framework [13, 17]. A more advanced method known as LASSO extends the setting to noisy measurements [18]. Recently, the problem was generalized to the estimation of a vector which is only partially sparse [16]. All of these methods consider sparse signals and simple Gaussian noise. Interestingly, Fuchs [7] showed that Huber’s robust regression can be expressed as the solution to a partially sparse model. The result is quite intuitive and can be interpreted as Gaussian noise contaminated by additional sparse and deterministic outliers. Fuchs also suggested a generalization of robust regression to the correlated noise case but did not present further results. Our proposed system is based on these ideas. CMF complements the classical matched filter with a few additional CS filters. CH estimates both the signal and the outliers by searching for a partially sparse solution that is consistent with the compressed data. Note that our framework differs from classical CS in two aspects. First, our sensing procedure is still mostly based on the matched filter; the additional randomized filters assist in detecting and eliminating the outliers. Second, unlike CS, our desired signal is dense, and the sparsity is associated with the nuisance outliers.
The paper is organized as follows. We begin in Section 2 by introducing the problem formulation. In Section 3 we consider the inherent performance limitations due to the compression. These bounds are computed assuming a clairvoyant estimator that can only be approximated in practice. In Section 4, we address the choice of CMF using the theory of CS matrix design. In Section 5, we derive the CH algorithm, which reconstructs the unknown signal and, as a byproduct, also part of the noise. This optimization is based on the joint Maximum a Posteriori and Maximum Likelihood (JMAP-ML) estimator [20] and ideas from Huber’s regression. The input to CH is low dimensional, but it processes high dimensional vectors in its internal computations. Thus, we also provide an efficient implementation of CH based on the fast iterative shrinkage and thresholding algorithm (FISTA) [2]. Finally, in Section 6 we illustrate the performance of our proposed methods using numerical simulations.
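The following notation is used. The sets $\mathbb{R}^{n}$ and $\mathbb{R}^{n\times m}$ denote the set of length $n$ vectors and the set of size $n\times m$ matrices. The operator $\|\cdot\|_p$ denotes the $\ell_p$ norm. The superscripts $(\cdot)^T$ and $(\cdot)^{-1}$ denote the transpose and inverse operations. The subscript in $x_i$ denotes the $i$’th element of the vector $\mathbf{x}$. The Moore-Penrose pseudoinverse of a matrix $\mathbf{X}$ is denoted by $\mathbf{X}^{\dagger}$. We denote the multivariate Gaussian distribution by $\mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma})$, where $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the mean vector and the covariance matrix.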
2 Problem formulation
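Consider a linear model
$$\mathbf{y}=\mathbf{H}\boldsymbol{\theta}+\mathbf{n}, \tag{1}$$
where $\mathbf{H}\in\mathbb{R}^{N\times K}$ is a known matrix with $N>K$, $\boldsymbol{\theta}\in\mathbb{R}^{K}$ is an unknown deterministic vector and $\mathbf{n}\in\mathbb{R}^{N}$ is a random vector with independent elements. In the Gaussian case, i.e.,
$$\mathbf{n}\sim\mathcal{N}(\mathbf{0},\sigma^2\mathbf{I}), \tag{2}$$
it is well known that all the information in $\mathbf{y}$ about $\boldsymbol{\theta}$ can be compressed by a linear matched filter
$$\mathbf{z}=\mathbf{A}\mathbf{y}, \tag{3}$$
where $\mathbf{A}=\mathbf{H}^T$, i.e., $\mathbf{z}$ is a sufficient statistic for $\boldsymbol{\theta}$. Remarkably, the dimension of $\mathbf{z}$ is $K$, which is much smaller than $N$, and hence the compression. This is even true in the extreme continuous case in which $N$ is infinite but the dimension of $\mathbf{z}$ is still $K$ and depends only on the number of unknowns. Using $\mathbf{z}$ we can infer whatever we need about $\boldsymbol{\theta}$ without storing $\mathbf{y}$.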
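The sufficiency of the matched filter in the Gaussian case can be checked numerically. The following is a minimal sketch, assuming NumPy and arbitrary illustrative sizes; the names `matched_filter` and `ls_from_statistic` are hypothetical helpers and do not come from the paper. It shows that the least-squares estimate computed from the K-dimensional statistic alone coincides with the one computed from all N observations.

```python
import numpy as np

def matched_filter(H, y):
    """Compress the N observations into K numbers: z = H^T y."""
    return H.T @ y

def ls_from_statistic(H, z):
    """Least-squares estimate computed from z alone (y is not needed)."""
    return np.linalg.solve(H.T @ H, z)

rng = np.random.default_rng(0)
N, K = 1000, 4                      # assumed sizes, for illustration only
H = rng.standard_normal((N, K))     # known model matrix
theta = rng.standard_normal(K)      # unknown parameter (used only to simulate)
y = H @ theta + 0.1 * rng.standard_normal(N)

z = matched_filter(H, y)            # K numbers instead of N
theta_from_z = ls_from_statistic(H, z)
theta_from_y = np.linalg.lstsq(H, y, rcond=None)[0]
```

Here `theta_from_z` and `theta_from_y` agree up to rounding, although the first one never touches the raw data after compression.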
Our goal is to obtain a similar linear compression in the non-Gaussian case. Specifically, we assume that the marginal distribution of each element in $\mathbf{n}$ is an $\epsilon$-contaminated Gaussian model
$$p(n_i)=(1-\epsilon)\,\mathcal{N}(0,\sigma^2)+\epsilon\,q, \tag{4}$$
where $\epsilon$ is a known small contamination ratio parameter, and $q$ is some symmetric distribution, typically unknown and referred to as the outlier distribution. In this case, a low dimensional sufficient statistic is usually unknown. Thus, we seek an approximate compression procedure. We will design a compression matrix $\mathbf{A}$ as in (3) of size $M\times N$, where $K\le M<N$, that summarizes as much information on $\boldsymbol{\theta}$ as possible. Then, given the compressed $\mathbf{z}$, we will derive a computationally efficient algorithm for estimating the unknown parameter $\boldsymbol{\theta}$.
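To make the contaminated noise model concrete, here is a minimal sampling sketch, assuming NumPy. The function name `sample_contaminated_noise`, the contamination ratio 0.05 and the Gaussian outlier distribution with standard deviation 10 are illustrative assumptions only; the paper treats the outlier distribution as unknown.

```python
import numpy as np

def sample_contaminated_noise(n_samples, eps, sigma, outlier_sampler, rng):
    """Draw i.i.d. samples from (1 - eps) * N(0, sigma^2) + eps * q."""
    is_outlier = rng.random(n_samples) < eps          # Bernoulli(eps) labels
    noise = sigma * rng.standard_normal(n_samples)    # nominal Gaussian part
    n_out = int(is_outlier.sum())
    noise[is_outlier] = outlier_sampler(n_out, rng)   # overwrite with draws from q
    return noise, is_outlier

rng = np.random.default_rng(1)
# Assumed outlier distribution q: zero-mean Gaussian with standard deviation 10.
gaussian_outliers = lambda m, r: 10.0 * r.standard_normal(m)
noise, is_outlier = sample_contaminated_noise(100_000, 0.05, 1.0, gaussian_outliers, rng)
```

Any other symmetric sampler can be passed as `outlier_sampler`, for example a Laplace one, without changing the rest of the code.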
3 Performance bounds
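It is instructive to begin with two simplified problems which provide inherent performance bounds and explain our methodology. For this purpose, we assume a Gaussian mixture noise distribution, meaning that the outliers are normally distributed, i.e.,
$$q=\mathcal{N}(0,\sigma_o^2), \tag{5}$$
where $\sigma_o^2$ is a known constant, and consider oracle estimators which somehow know the locations of the outliers in $\mathbf{n}$. First, we consider the uncompressed case in which $\mathbf{A}=\mathbf{I}$. Under this assumption, the conditional distribution of the observations is
$$\mathbf{y}\mid\boldsymbol{\Sigma}\sim\mathcal{N}(\mathbf{H}\boldsymbol{\theta},\boldsymbol{\Sigma}), \tag{6}$$
where $\boldsymbol{\Sigma}$ is a diagonal matrix with the variances of $\mathbf{n}$. Roughly, $\epsilon N$ of its diagonal elements are equal to $\sigma_o^2$ and the other elements are equal to $\sigma^2$. This is a simple Gaussian linear model and the optimal estimator is a Weighted Least Squares (WLS)
$$\hat{\boldsymbol{\theta}}=\arg\min_{\boldsymbol{\theta}}\ \|\mathbf{y}-\mathbf{H}\boldsymbol{\theta}\|^2_{\boldsymbol{\Sigma}^{-1}}, \tag{7}$$
where we use a weighted norm defined as
$$\|\mathbf{x}\|^2_{\mathbf{W}}=\mathbf{x}^T\mathbf{W}\mathbf{x}. \tag{8}$$
Its mean squared error is then given by
$$\mathrm{MSE}=E\left[\mathrm{Tr}\left\{\left(\mathbf{H}^T\boldsymbol{\Sigma}^{-1}\mathbf{H}\right)^{-1}\right\}\right], \tag{9}$$
where the expectation is with respect to the randomness in $\boldsymbol{\Sigma}$. This is not a general performance bound, as we have assumed a specific noise distribution, but it is quite close if $\sigma_o^2$ is indeed the variance of the outliers. Any compression will probably increase the error, and our goal is to get as close as possible to this error with the smallest possible value of $M$.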
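In the compressed case (again, with known locations of the outliers and Gaussian outliers), the distribution of the observations is
$$\mathbf{z}\mid\boldsymbol{\Sigma},\mathbf{A}\sim\mathcal{N}\left(\mathbf{A}\mathbf{H}\boldsymbol{\theta},\ \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T\right), \tag{10}$$
where we condition on both $\boldsymbol{\Sigma}$ and $\mathbf{A}$, which are statistically independent. This too is a simple Gaussian linear model, solved via a WLS
$$\hat{\boldsymbol{\theta}}=\arg\min_{\boldsymbol{\theta}}\ \|\mathbf{z}-\mathbf{A}\mathbf{H}\boldsymbol{\theta}\|^2_{(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)^{-1}}. \tag{11}$$
Its mean squared error is given by
$$\mathrm{MSE}=E\left[\mathrm{Tr}\left\{\left(\mathbf{H}^T\mathbf{A}^T(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)^{-1}\mathbf{A}\mathbf{H}\right)^{-1}\right\}\right], \tag{12}$$
where the expectation is with respect to the randomness in both $\boldsymbol{\Sigma}$ and $\mathbf{A}$. In practice, it is impossible to implement the above oracle. However, it suggests a natural two step approach: first, detect the locations of the outliers, then use an approximate oracle assuming these locations are exact. Furthermore, these MSEs are reasonable performance bounds that any practical estimator should be compared to.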
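The two oracle bounds can be evaluated for a single realization of the outlier locations as follows. This is a minimal sketch, assuming NumPy, the standard WLS error covariance for a Gaussian linear model, and illustrative values (outlier variance 100, contamination ratio 0.1). The function names are hypothetical, and averaging over many realizations, as the expectations above require, is left out.

```python
import numpy as np

def oracle_mse_uncompressed(H, var):
    """Trace of (H^T Sigma^-1 H)^-1 for the diagonal Sigma = diag(var)."""
    info = H.T @ (H / var[:, None])          # H^T Sigma^-1 H
    return np.trace(np.linalg.inv(info))

def oracle_mse_compressed(H, var, A):
    """Trace of (H^T A^T (A Sigma A^T)^-1 A H)^-1 for z = A y."""
    AH = A @ H
    cov_z = (A * var) @ A.T                  # A Sigma A^T (var scales the columns of A)
    info = AH.T @ np.linalg.solve(cov_z, AH)
    return np.trace(np.linalg.inv(info))

rng = np.random.default_rng(2)
N, K = 200, 3                                # assumed sizes
H = rng.standard_normal((N, K))
# One realization of the noise variances: about 10% outliers with variance 100.
var = np.where(rng.random(N) < 0.1, 100.0, 1.0)

mse_none = oracle_mse_uncompressed(H, var)             # no compression
mse_identity = oracle_mse_compressed(H, var, np.eye(N))
mse_mf = oracle_mse_compressed(H, var, H.T)            # matched filter only
```

With the identity as compression matrix the two functions coincide, and compressing with the plain matched filter can only increase the error, which is the gap the additional randomized rows are meant to close.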
4 Compressed matched filter for non-Gaussian noise
The first part of our design is the choice of the sensing matrix $\mathbf{A}$ which defines CMF. Unfortunately, it is not completely clear what the optimal design criterion is and/or how to numerically optimize it. On the one hand, from the signal perspective, we would like $\mathbf{A}$ to be close to the matched filter. At the least we need to ensure that its rows span the columns of $\mathbf{H}$. On the other hand, we need to account for the shape of the noise. Here we can use some insights from the compressed sensing field by looking at a sparse model which is close to ours. Specifically, we can look at a linear system with Gaussian noise and deterministic outliers. Hence, setting $\mathbf{n}=\mathbf{w}+\mathbf{o}$, where $\mathbf{w}$ is a random vector with independent $\mathcal{N}(0,\sigma^2)$ variables and $\mathbf{o}$ is a deterministic sparse vector, into (1) results in the following model
$$\mathbf{y}=\mathbf{H}\boldsymbol{\theta}+\mathbf{o}+\mathbf{w}. \tag{13}$$
CS theory hints that we can use a random matrix to encode the sparsity of $\mathbf{o}$. Therefore, to address both criteria we propose the following simple structure:
$$\mathbf{A}=\begin{bmatrix}\mathbf{H}^T\\ \mathbf{G}\mathbf{P}\end{bmatrix}, \tag{14}$$
where $\mathbf{G}$ is an $(M-K)\times N$ matrix with independent and identically distributed (i.i.d.) elements and
$$\mathbf{P}=\mathbf{I}-\mathbf{H}\mathbf{H}^{\dagger} \tag{15}$$
is a projection matrix onto the null space of $\mathbf{H}^T$. This choice guarantees that we will always be at least as good as the naive Gaussian matched filter, which is exactly the first $K$ rows. The remaining rows randomly span as much as possible of the remaining space.
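The construction above (matched-filter rows followed by random rows projected away from the signal subspace) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy, draws the random block from a standard normal distribution (the distribution is an assumption of this sketch), and uses the hypothetical name `build_cmf`.

```python
import numpy as np

def build_cmf(H, M, rng):
    """Stack the matched filter H^T on top of M - K randomized rows.

    The random rows are multiplied by P = I - H H^+, the projector onto the
    null space of H^T, so they carry no signal component and only sense noise.
    """
    N, K = H.shape
    if not K <= M <= N:
        raise ValueError("need K <= M <= N")
    P = np.eye(N) - H @ np.linalg.pinv(H)
    G = rng.standard_normal((M - K, N))      # assumed i.i.d. N(0, 1) entries
    return np.vstack([H.T, G @ P])

rng = np.random.default_rng(3)
N, K, M = 200, 3, 20                         # assumed sizes
H = rng.standard_normal((N, K))
A = build_cmf(H, M, rng)
z_shape = (A @ rng.standard_normal(N)).shape  # compressed data has length M
```

Because the first K rows are exactly the matched filter, dropping the remaining rows recovers the classical Gaussian receiver, while the extra rows respond only to the noise and outliers.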
5 Compressed Huber
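The second part of our framework is the CH algorithm, which estimates $\boldsymbol{\theta}$ given $\mathbf{z}$ for a fixed $\mathbf{A}$. Ideally, we would like to find the parameter $\boldsymbol{\theta}$ that maximizes the likelihood of $\mathbf{z}$. But this vector is a high dimensional mixture of many non-Gaussian random variables, and its distribution is hard to analyze. Instead, we propose to estimate both $\boldsymbol{\theta}$ and $\mathbf{n}$ simultaneously. Statistically speaking, we jointly seek $\boldsymbol{\theta}$ via a maximum likelihood approach and $\mathbf{n}$ via a maximum a posteriori approach (see [20] for more details on JMAP-ML estimation):
$$\min_{\boldsymbol{\theta},\mathbf{n}}\ -\log p(\mathbf{n})\quad\text{s.t.}\quad\mathbf{z}=\mathbf{A}(\mathbf{H}\boldsymbol{\theta}+\mathbf{n}), \tag{16}$$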
where $-\log p(\mathbf{n})$ is the negative log-density of $\mathbf{n}$ as described in (4). Because the outlier distribution $q$, and as a result the distribution of $\mathbf{n}$, are generally unknown, we cannot calculate $-\log p(\mathbf{n})$ directly. Hence, we have to use a robust objective function which is indifferent to the specific distribution of the outliers. Such is Huber’s loss function, which was proven to be optimal in the uncompressed case (in the minimax sense) [10]. Together, our reconstruction algorithm is the solution to
and is calculated from
with and . The above is a convex minimization that can be efficiently solved using off-the-shelf optimization packages, e.g., CVX .
Remarkably, Fuchs [7] showed that Huber’s function can be expressed as:
Plugging this expression into (17) yields
Then by solving explicitly for $\mathbf{n}$ we can derive the following equivalent problem
This formulation provides an interesting observation. The solution of (22) can be interpreted as an estimator to the deterministic sparse outlier model in (13). It can be seen that the first term in (22) is a standard WLS objective whereas the second term penalizes vectors which are not sparse.
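The identity behind this reformulation can be verified numerically. The sketch below assumes the common parameterization of Huber's function with threshold `c` (quadratic inside, linear outside) and its variational form as a minimum over a scalar outlier variable; the paper's exact scaling was lost in extraction, so the constants here are assumptions. All function names are hypothetical.

```python
import numpy as np

def huber(x, c):
    """Huber's function: 0.5 x^2 for |x| <= c, and c|x| - 0.5 c^2 otherwise."""
    a = np.abs(x)
    return np.where(a <= c, 0.5 * x**2, c * a - 0.5 * c**2)

def soft_threshold(x, c):
    """Minimizer over o of 0.5 (x - o)^2 + c |o|."""
    return np.sign(x) * np.maximum(np.abs(x) - c, 0.0)

def huber_via_outlier(x, c):
    """Value of min over o of 0.5 (x - o)^2 + c |o|, using the closed-form minimizer."""
    o = soft_threshold(x, c)
    return 0.5 * (x - o)**2 + c * np.abs(o)

x = np.linspace(-5.0, 5.0, 101)
c = 1.5                                   # assumed threshold, for illustration
direct = huber(x, c)
variational = huber_via_outlier(x, c)
```

The minimizing `o` is nonzero only where `|x|` exceeds `c`, which is why the outlier variable comes out sparse and why an $\ell_1$ penalty appears in the equivalent problem.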
The above formulation is also useful from a numerical perspective. Note that $\mathbf{z}$ is compressed and low dimensional, but the internal variable $\mathbf{o}$ is of length $N$, and therefore the optimization requires a large scale numerical algorithm. For this purpose, we utilize the well known FISTA solver [2]. First, we notice that $\boldsymbol{\theta}$ can be solved explicitly
where . Then by substituting it to (22) we get a classical LASSO problem
which can be solved efficiently by FISTA. For convenience of notation, we also define the shrinkage and thresholding operator
Summing the above, a pseudocode for solving CH in (22) using FISTA is provided in Algorithm 1.
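Since Algorithm 1 did not survive extraction, the following is a hedged sketch of how such a solver could look, not the authors' pseudocode. It assumes the objective $\tfrac12\|\mathbf{z}-\mathbf{A}\mathbf{H}\boldsymbol{\theta}-\mathbf{A}\mathbf{o}\|^2_{(\mathbf{A}\mathbf{A}^T)^{-1}}+\lambda\|\mathbf{o}\|_1$ with unit noise scale, eliminates $\boldsymbol{\theta}$ in closed form, and runs standard FISTA iterations on the resulting LASSO. The name `compressed_huber_fista`, the value of `lam` and the problem sizes are assumptions.

```python
import numpy as np

def soft_threshold(x, alpha):
    """Element-wise shrinkage and thresholding operator."""
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def compressed_huber_fista(z, A, H, lam, n_iter=500):
    """Estimate theta and the sparse outliers o from z = A y (sketch)."""
    # Whiten with the Cholesky factor C of A A^T, so the weighted norm becomes Euclidean.
    C = np.linalg.cholesky(A @ A.T)
    z_w = np.linalg.solve(C, z)
    A_w = np.linalg.solve(C, A)
    B_w = A_w @ H
    # Eliminate theta: theta(o) = B_w^+ (z_w - A_w o); Q projects out the signal part.
    B_pinv = np.linalg.pinv(B_w)
    Q = np.eye(len(z)) - B_w @ B_pinv
    D, b = Q @ A_w, Q @ z_w                     # LASSO: 0.5 ||b - D o||^2 + lam ||o||_1
    L = max(np.linalg.norm(D, 2) ** 2, 1e-12)   # Lipschitz constant of the gradient
    o = np.zeros(A.shape[1])
    v, t = o.copy(), 1.0
    for _ in range(n_iter):
        o_next = soft_threshold(v - D.T @ (D @ v - b) / L, lam / L)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        v = o_next + ((t - 1.0) / t_next) * (o_next - o)   # momentum step
        o, t = o_next, t_next
    theta = B_pinv @ (z_w - A_w @ o)
    return theta, o

rng = np.random.default_rng(4)
N, K, M = 400, 4, 100                           # assumed sizes
H = rng.standard_normal((N, K))
theta_true = rng.standard_normal(K)
noise = rng.standard_normal(N)
noise[rng.random(N) < 0.05] *= 10.0             # assumed 5% Gaussian outliers
y = H @ theta_true + noise
P = np.eye(N) - H @ np.linalg.pinv(H)
A = np.vstack([H.T, rng.standard_normal((M - K, N)) @ P])
theta_hat, o_hat = compressed_huber_fista(A @ y, A, H, lam=1.5)
```

Only `z`, `A` and `H` enter the solver, so the raw observations never need to reach the postprocessing unit; the internal iterate `o`, however, has length N, as noted in the text.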
After computing CH, we propose to fine-tune the estimate. By examining the optimal $\mathbf{o}$ we (approximately) detect the locations of the outliers
Then we estimate their variance
recover the diagonal covariance matrix of $\mathbf{n}$, denoted by $\hat{\boldsymbol{\Sigma}}$, and finally compute $\hat{\boldsymbol{\theta}}$ in (11), replacing the true $\boldsymbol{\Sigma}$ with its estimate $\hat{\boldsymbol{\Sigma}}$. We denote this second phase as AWLS, for approximate WLS.
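The refinement step can be sketched as below. The detection rule (nonzero entries of the outlier estimate) and the variance estimator (nominal variance plus the mean square of the detected outlier values) are assumptions of this sketch, since the paper's formulas were lost in extraction. The function name `awls` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def awls(z, A, H, o_hat, sigma2, tol=1e-8):
    """Approximate WLS on compressed data using outlier locations taken from o_hat."""
    outlier = np.abs(o_hat) > tol                   # assumed detection rule
    var = np.full(A.shape[1], sigma2)               # nominal noise variance
    if outlier.any():
        # Assumed estimator of the outlier variance.
        var[outlier] = sigma2 + np.mean(o_hat[outlier] ** 2)
    cov_z = (A * var) @ A.T                         # A Sigma_hat A^T
    AH = A @ H
    W_AH = np.linalg.solve(cov_z, AH)               # (A Sigma_hat A^T)^-1 A H
    theta = np.linalg.solve(AH.T @ W_AH, W_AH.T @ z)
    return theta, outlier

rng = np.random.default_rng(5)
N, K = 50, 2                                        # assumed sizes
H = rng.standard_normal((N, K))
A = rng.standard_normal((10, N))                    # any full-row-rank compression
theta_true = np.array([1.0, -2.0])
o_hat = np.zeros(N)
o_hat[[3, 17]] = [8.0, -6.0]                        # pretend CH flagged two samples
z = A @ (H @ theta_true)                            # noiseless data for a sanity check
theta_awls, outlier = awls(z, A, H, o_hat, sigma2=1.0)
```

On noiseless data any valid weighting returns the true parameter, which makes this a convenient sanity check before running the estimator on contaminated measurements.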
6 Numerical results
To demonstrate the advantages of CMF and CH we present simulation results in a simple signal processing application. We consider the estimation of amplitudes and phases of sinusoids with known frequencies contaminated by several non-Gaussian noises. Specifically, we define , , , and . We express the sinusoids in linear form by defining where , , and . The frequencies are , , , , and . The true amplitudes are all unit and the data is contaminated with a Gaussian mixture noise i.e. . The data is compressed using CMF and the system parameters are estimated using the suggested algorithms.
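The linear form of the sinusoid model can be illustrated as follows. This sketch relies on the identity $a\cos(\omega t+\phi)=a\cos\phi\cos\omega t-a\sin\phi\sin\omega t$; the frequencies, phases and number of samples below are hypothetical placeholders, not the values used in the paper's experiment, and the function names are assumptions.

```python
import numpy as np

def sinusoid_design(freqs, n_samples):
    """Model matrix with cosine columns followed by sine columns, one pair per frequency."""
    t = np.arange(n_samples)
    w = 2.0 * np.pi * np.asarray(freqs)
    return np.hstack([np.cos(np.outer(t, w)), np.sin(np.outer(t, w))])

def to_linear_params(amps, phases):
    """Map amplitudes and phases to the coefficients of the cosine and sine columns."""
    amps, phases = np.asarray(amps), np.asarray(phases)
    return np.concatenate([amps * np.cos(phases), -amps * np.sin(phases)])

freqs = [0.05, 0.11, 0.23]            # hypothetical normalized frequencies (cycles/sample)
amps = np.ones(3)                     # unit amplitudes, as in the text
phases = np.array([0.3, -1.0, 2.0])   # hypothetical phases
n_samples = 256                       # hypothetical number of observations

H = sinusoid_design(freqs, n_samples)
signal = H @ to_linear_params(amps, phases)
direct = sum(a * np.cos(2 * np.pi * f * np.arange(n_samples) + p)
             for a, f, p in zip(amps, freqs, phases))
```

Estimating the coefficient vector and converting back with `np.hypot` and `np.arctan2` recovers amplitudes and phases, so the nonlinear amplitude and phase problem reduces to the linear model of Section 2.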
Fig. 2 presents the estimation mean squared errors, averaged over the realizations of the noise, as a function of the compression ratio. For comparison, the errors are bounded below by computing (9) (NO COMPRESSION) and above by computing (12) with the matched filter alone (FULL COMPRESSION). In between is the ORACLE using the proposed CMF with randomized versions of $\mathbf{A}$. Our proposed estimators are denoted by CH and AWLS. It is easy to see the advantages of CH, which closes the performance gap with only a quarter of the complete measurements (i.e., four fold compression). AWLS is even better and achieves the same performance with a higher compression. On the downside, the simulation suggests that there may still be room for improvement. Neither CH nor AWLS succeeds in achieving the performance of the ORACLE, which knows the locations of the outliers. Similar results were obtained for Laplace distributed outliers with the same variance.
Fig. 3 presents the estimation (using CH) mean squared errors, averaged over the realizations of the noise, as a function of $N$ for several compression ratios. As can be seen, the estimation error is monotonic and asymptotically tends to zero, inversely proportional to $N$. This suggests asymptotic consistency of the estimator. It is also worth noting that, asymptotically, the ratio between the estimation MSEs for different compression ratios is constant.
7 Conclusions and future work
In this paper we have presented a simple compression scheme for a linear system without major information loss. We have also developed a fast recovery scheme for the compressed data. The combination of the two methods was shown, by simulations, to recover the system parameters using approximately four fold compression with no significant loss in MSE. Additional research is needed to optimize the compression matrix, to find more efficient recovery algorithms, and to provide a tighter lower bound for them.
References
[1] R. G. Baraniuk. Compressive sensing [lecture notes]. IEEE Signal Processing Magazine, 24(4):118–121, 2007.
[2] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
[3] K. L. Blackard, T. S. Rappaport, and C. W. Bostian. Measurements and models of radio frequency impulsive noise for indoor wireless communications. IEEE Journal on Selected Areas in Communications, 11(7):991–1001, 1993.
[4] R. S. Blum, R. J. Kozick, and B. M. Sadler. An adaptive spatial diversity receiver for non-Gaussian interference and noise. IEEE Transactions on Signal Processing, 47(8):2100–2111, 1999.
[5] P. L. Brockett, M. Hinich, and G. R. Wilson. Nonlinear and non-Gaussian ocean noise. The Journal of the Acoustical Society of America, 82:1386, 1987.
[6] R. A. Fisher. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London. Series A, Containing Papers of a Mathematical or Physical Character, 222:309–368, 1922.
[7] J. J. Fuchs. An inverse problem approach to robust regression. In Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 4, pages 1809–1812. IEEE, 1999.
[8] G. B. Giannakis, G. Mateos, S. Farahmand, V. Kekatos, and H. Zhu. USPACOR: Universal sparsity-controlling outlier rejection. pages 1952–1955, 2011.
[9] M. Grant, S. Boyd, and Y. Ye. CVX: Matlab software for disciplined convex programming, 2008.
[10] P. J. Huber. Robust estimation of a location parameter. The Annals of Mathematical Statistics, 35(1):73–101, 1964.
[11] P. J. Huber. Robust statistics. Springer, 2011.
[12] S. Kay. Fundamentals of Statistical Signal Processing, Vol. I: Estimation Theory. Prentice Hall PTR, 1993.
[13] J. N. Laska, M. A. Davenport, and R. G. Baraniuk. Exact signal recovery from sparsely corrupted measurements through the pursuit of justice. pages 1556–1560, 2009.
[14] J. Lindenlaub and K. Chen. Performance of matched filter receivers in non-Gaussian noise environments. IEEE Transactions on Communication Technology, 13(4):545–547, 1965.
[15] D. Middleton. Man-made noise in urban environments and transportation systems: Models and measurements. IEEE Transactions on Communications, 21(11):1232–1241, 1973.
[16] T. Routtenberg, Y. C. Eldar, and L. Tong. Maximum likelihood estimation under partial sparsity constraints. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE, 2013.
[17] C. Studer, P. Kuppinger, G. Pope, and H. Bolcskei. Sparse signal recovery from sparsely corrupted measurements. In 2011 IEEE International Symposium on Information Theory Proceedings (ISIT), pages 1422–1426, 2011.
[18] R. Tibshirani. Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society. Series B (Methodological), pages 267–288, 1996.
[19] X. Wang and H. V. Poor. Robust multiuser detection in non-Gaussian channels. IEEE Transactions on Signal Processing, 47(2):289–305, 1999.
[20] A. Yeredor. The joint MAP-ML criterion and its relation to ML and to extended least-squares. IEEE Transactions on Signal Processing, 48(12):3484–3492, 2000.