Multichannel sparse recovery of complexvalued signals using Huber’s criterion
Abstract
In this paper, we generalize Huber’s criterion to multichannel sparse recovery problem of complexvalued measurements where the objective is to find good recovery of jointly sparse unknown signal vectors from the given multiple measurement vectors which are different linear combinations of the same known elementary vectors. This requires careful characterization of robust complexvalued loss functions as well as Huber’s criterion function for the multivariate sparse regression problem. We devise a greedy algorithm based on simultaneous normalized iterative hard thresholding (SNIHT) algorithm. Unlike the conventional SNIHT method, our algorithm, referred to as HUBSNIHT, is robust under heavytailed nonGaussian noise conditions, yet has a negligible performance loss compared to SNIHT under Gaussian noise. Usefulness of the method is illustrated in source localization application with sensor arrays.
I Introduction
In the multiple measurement vector (MMV) model, a single measurement matrix is utilized to obtain multiple measurement vectors, i.e., where is an measurement matrix and are the (unobserved) random noise vectors. Typically there are more column vectors than row vectors , i.e., . The unknown signal vectors , are assumed to be sparse, i.e., most of the elements are zero. In matrix form, the MMV model is
(1) 
where , and collect the measurement, the signal and the error vectors, respectively. When , the model reduces to standard compressed sensing (CS) model [1]. The key assumption of MMV model is that the signal matrix is rowsparse, i.e., at most rows of contain nonzero entries. The rowsupport of is the index set of rows containing nonzero elements, When is rowsparse, i.e., , joint estimation can lead both to computational advantages and increased reconstruction accuracy; See [2, 3, 4, 5, 1, 6].
The objective of multichannel sparse recovery problem is on finding a row sparse approximation of the signal matrix based on knowledge of , the measurement matrix and the sparsity level . Such a problems arises in electroencephalography and magnetoencephalography (EEG/MEG) [1] blind source separation [7], and directionofarrival (DOA) estimation of sources in array and radar processing [8, 9, 10]. Many greedy pursuit CS reconstruction algorithms have been extended for solving MMV problems. These methods, such as simultaneous normalized iterative hard thresholding (SNIHT) algorithm [6] are guaranteed to perform very well provided that suitable conditions (e.g., incoherence of and non impulsive noise conditions) are met. The derived (worst case) recovery bounds depend linearly on , so the methods are not guaranteed to provide accurate reconstruction/approximation under heavytailed nonGaussian noise.
In this paper, we generalize Huber’s criterion [11, cf. Section 7.7, 7.8] (often referred to as ”Huber’s approach 2”) originally developed for overdetermined linear regression (, ) model to the complexvalued case and for the more general multivariate sparse regression problem. This requires generalizing robust estimates of regression (and loss functions) for complexvalued case. In Huber’s devise, one estimates the signal matrix and scale of the error terms simultaneously. This is necessary since most robust lossfunctions require an estimate of the scale. Using Huber’s criterion in the MMV model one may elegantly estimate both the sparse signal matrix and the scale of the errors simultaneously. In particularly, we are able to circumvent the problem of obtaining a preliminary robust scale estimate which is a challenging problem in illposed multivariate sparse regression model since the support of and hence the contributing elementary vectors of on measurements are not known. In earlier related work Huber’s approach 2 has been considered for Lassotype realvalued linear regression setting in [12, 13] and realvalued compressed sensing in [14]. For our multichannel sparse recovery problem, we devise SNIHT algorithm which results in a simple, computationally efficient and scalable approach for solving the MMV sparse reconstruction problem.
Let us offer a brief outline of the paper. In Section II, we give necessary notations and definitions as well as provide motivation and background of robust sparse recovery problem. Robust complexvalued loss functions and their properties are outlined in Section III and a generalization of Huber’s loss function for complex measurements is given. Then, in Section IV we formulate Huber’s criterion for MMV model and the related SNIHT algorithm, called HUBSNIHT, is derived in Section V. Finally, we illustrate the usefullness of the method in source localization application in Section VI.
Ii Background
Iia Notations
For a matrix and an index set of cardinality , we denote by (resp. ) the (resp. ) matrix restricted to the columns (resp. rows) of indexed by the set . The th column vector of is denoted by and the hermitian transpose of the th row vector of by , . Furthermore, if , then refers to elementwise application of the function to its matrix valued argument, so with .
The usual Euclidean norm on vectors will be written as . The matrix space is equipped with the usual Hermitian inner product
where the trace of a (square) matrix is the sum of diagonal entries. We define the weighted inner product as
where is real matrix of positive weights. Note that reduces to conventional inner product when is a matrix of ones. The Frobenius norm is given by the inner product as and denotes the weighted Frobenius norm. The row quasinorm of is the number of nonzero rows, i.e., . Hence the assumption that the signal matrix is rowsparse in the MMV model is equivalent with the statement that .
We use to denote the hard thresholding operator: for a matrix , retains the elements of the rows of that possess largest norms and set elements of the other rows to zero. Notation refers to sparsified version of such that the entries in the rows indexed by set remain unchanged while all other rows have all entries set to .
IiB Robust constrained optimization problem
Suppose that the error terms are i.i.d. continuous random variables from a circular distribution [15] with p.d.f. , where denotes the standard form of the density and is the scale parameter. If the scale is known, then a reasonable approach for solving the simultaneous sparse recovery problem is to minimize a distance criterion of residuals,
(2) 
for some suitable loss function subject to rowsparsity constraint . For conventional least squares (LS) loss function , the scale can be factored out from the objective function, and the minimization problem reduces to
The wellknown problem with LS minimization is that it gives a very small weight on small residuals and a strong weight on large residuals, implying that even a single large outlier can have a large influence on the obtained result.
At least two problems arises when using conventional robust loss functions in (2). First, commonly used robust loss functions in robust statistics such as Huber’s or Tukey’s loss functions require an estimate of scale . Obtaining a reliable robust estimate of scale is a difficult problem. It involves obtaining a rowsparse robust preliminary estimate of the signal matrix and then computing robust scale estimate based on the resulting residual matrix . Second problem is that robust loss functions are defined in the realvalued case and some thought must be given on special properties of complexvalued loss functions. These problems are addressed next in Section III and Section IV.
Iii Loss functions: complex valued case
We start by giving a proper definition of a loss function .
Definition 1
Function is called a loss function if it verifies:

is circularly symmetric, , .

. Furthermore, is differentiable function and increasing in .
Let us first note that condition (L1) is equivalent with the statement
(3) 
for some . The fact that (3) (L1) is obvious and the converse can be derived by invariance arguments. This illustrates that is not differentiable (i.e., holomorphic or analytic function). This is of course natural since only functions that are both holomorphic and realvalued are constants. The complex derivative of w.r.t. is
which will be referred in the sequel as the score function. Since , we can write using basic rules of complex differentiation [16] in the form
where
is the complex signum function and denotes the real derivative of the realvalued function . In order to make minimization of (2) possible by simple gradient descent type algorithms, we narrow down the set of loss functions by imposing the assumption:

is a convex function
For example, the conventional LS loss function verifies assumptions (L1)(L3). In this case, and the score function is . In this paper, we assume that the loss function verifies (L1)(L3).
We define Huber’s loss function in the complex case as
(4) 
where is a userdefined threshold that influences the degree of robustness and efficiency of the method. Huber’s function is a hybrid of and loss functions, using loss for relatively small errors and loss for relatively large errors. It verifies conditions (L1)(L3). Huber’s score ()function is
Note that Huber’s is a winsorizing (clipping) funtion: the smaller the , the more clipping is actioned on the residuals.
Iv Huber’s criterion for multichannel sparse recovery
As discussed earlier, the scale of the error terms is unknown and needs to be estimated jointly with the signal matrix. We discuss here how this can be done elegantly using Huber’s approach 2. First note that Maximum likelihood (ML)approach for solving the unknown and leads to minimizing the negative loglikelihood function of the form
where depends on the underlying standard form of the density of the error terms. Then, one could replace the ML loss function with a robust loss function which need not be related to any circular density , e.g., the Huber’s loss function. The negative loglikelihood function is however not convex in . This follows since is not convex in (for fixed ) and hence cannot be jointly convex.
Huber [11] proposed an elegant devise to circumvent the above problem. See also [12] for further study of Huber’s approach. We generalize the Huber’s approach 2 for the complex multivariate regression case and minimize
(5) 
where Ê is a fixed scaling factor. Important feature of the objective function is that it is jointly convex in given that is convex. In addition the minimizer preserves the same theoretical robustness properties (such as bounded influence function) as the minimizer in the model where is assumed to be known (fixed). This is not the case for the MLobjective function .
The stationary point of (5) can be found by setting the complex matrix derivative of w.r.t. and the real derivative of w.r.t. to zero. Simple calculations then show that the minimizer is a solution to a pair of estimating equations:
(6)  
(7) 
where and is defined as
(8) 
Recall that notation refers to elementwise application of function to its matrix valued argument, so . Thus if is convex and the MMV model is overdetermined with nonsparse , solving the above estimating equations would give the global minimum of (5).
The scaling factor in (5) is chosen so that the obtained scale estimate is Fisherconsistent for the unknown scale when , which due to (7) is chosen so that
For many loss functions, can be computed in closedform. For example, for Huber’s function (4) the function in (8) becomes
and the concistency factor can be easily solved in closedform by elementary calculus as
(9) 
Note that depends on the threshold . We will choose threshold as for . The rationale behind this choice is that under Gaussian errors, . Hence a sensible choice is to determine so that is the th upper quantile of the distribution. The choice , implies and hence notrimming of the residuals. In our simulations we use which yields . The smaller the (and hence ) the more trimming is actioned on residuals.
V SNIHT algorithm for Huber’s criterion
Our aim is at solving
This problem is combinatorial (i.e., NPhard) but greedy pursuit approaches can be devised. Thus due to biconvexity of the objective function, we can use Huber’s loss function and greedy pursuit NIHT algorithm can be devised to compute an approximate solution. Recall that NIHT is a projected gradient descent method that is known to offer efficient and scalable solution for sparse approximation problem [17]. NIHT updates the estimate of by taking steps towards the direction of the negative gradient followed by projection onto the constrained space.
In Huber’s criterion, if we consider fixed at a value (the value of at th iteration), the simultaneous NIHT (SNIHT) update of the signal matrix becomes
where is the update of the stepsize at th iteration and
will be referred to as pseudoresidual. Note that . The scale is updated (consider signal matrix fixed at a value ) using (7) by a fixedpoint iteration
where
The pseudocode for the SNIHT algorithm in the case that the loss function is Huber’s function (4) is given in Algorithm 1. We refer to this algorithm as HUBSNIHT in the sequel. The steps 39 can be divided to 3 stages described below: scale stage (Steps 3, 4) build up the scale update , signal stage (Steps 5, 7, 8, 9) build up the sparse signal update and the support , and stepsize stage (Step 7) computes the optimal stepsize update for the gradient descent move. The computation of the stepsize will be described in the next two paragraphs. Note that it is possible to tune the algorithm for different applications by simply altering the criterion for halting the algorithm. Matlab function is available at http://users.spa.aalto.fi/esollila/software.html.
As was noted in [17], stepsize selection is very important for convergence and needs to be adaptively controlled at each iteration. Given the found support is correct, we choose as the minimizer of the convex objective function (2) for fixed scale at in the gradient ascent direction , i.e.
(10) 
where and . This reduces to minimizing a simple linear regression ()estimation problem where the response is and the predictor is . It is easy to show (details omitted) that the minimizer of is the unique solution to a fixed point (FP) equation , where
(11) 
where the right hand side depends on the unknown via the weight matrix , defined as
where is a weight function based on Huber’s loss function, defined as
If the loss function is LSloss (equivalent to Huber’s function when ), then the minimizer of (V) is easily found in closed form since in this case is equal to a matrix of ones. Hence the FP equation is explicit and the solution is . This is indeed the same stepsize used in conventional SNIHT [6].
For Huber’s loss function, the minimizer of (V) can be found by running the FP iterations until convergence (with initial value ). Instead, we use approximate of the solution given by 1step FP iterate with initial value given by the previous stepsize . In other words, in Step 7, the update is computed as
Vi Application to Source Localization
We consider sensor array consisting of sensors that receives narrowband incoherent farfield planewave sources from a point source (). At discrete time , the array output (snapshot) is a weighted linear combination of the signal waveforms corrupted by additive noise , , where is the steering matrix parametrized by the vector of (distinct) unknown directionofarrivals (DOA’s) of the sources. Each column vector , called the steering vector, represents a point in known array manifold . The objective of sensor array source localization is to find the DOA’s of the sources, i.e., to identify the steering matrix parametrized by . We assume that the number of sources is known.
As in [8], we cast the source localization problem as a multichannel sparse recovery problem. We construct an overcomplete steering matrix , where represents a sampling grid of all source locations of interest. If contains the true DOA’s , , then the measurement matrix consisting of snapshots at time instants can be exactly modelled as MMV model (1), where the signal matrix is rowsparse matrix with source signal sequences as its nonzero row vectors. Thus identifying the source locations is equivalent to identifying the support since any maps to a DOA in the grid. Since the steering matrix is completely known, we can use HUBSNIHT method to identify the support.
We assume that independent (spatially and temporally) complex circular Gaussian source signals of equal power arrive on an uniform linear array (ULA) of sensors with half a wavelength interelement spacing from DOA’s and . In this case, the array manifold is . The noise matrix has i.i.d. elements following inverse Gaussian compound Gaussian (IGCG) distribution [18] with shape parameter and unit variance. CGIG distribution is heavytailed and has been shown to accurately model radar clutter in [18]. Note that the covariance matrix of the snapshot is , so we may use the popular MUSIC method to localize the sources. In other words, we search for peaks of the MUSIC pseudospectrum in the grid. We use a uniform grid on with 2 degree spacing, thus containing the true DOA’s. For the source localization application, we make the following modifcation to the algorithm: In Step 1 of HUBSNIHT algorithm, we locate the largest peaks of rownorms of instead of taking as indices of largest rownorms of .
We then use SNIHT, HUBSNIHT and MUSIC to identify the support (which gives the DOA estimates) and compute the empirical probability of exact recovery (PER) rates and the relative frequency of DOA estimates in the grid based on 1000 MC runs. Full PER rate implies that the support (and hence DOA’s) were correctly identified in all MC trials. Such a case is shown in upper plot of Figure 1 for HUBSNIHT when the number of snapshots is and the SNR is dB. The PER rate of HUBSNIHT was , but PER rates of SNIHT and MUSIC were considerably lower, and , respectively. In the second setting, we lower the SNR to dB. In this case, the conventional SNIHT and MUSIC methods fail completely and provide nearly a uniform frequency on the grid. This is illustrated in the middle plot of Figure 1. Note that the robust HUBSNIHT provides high peaks on the correct DOA’s. The PER rates of SNIHT, HUBSNIHT and MUSIC were , and , respectively. Thus only HUBSNIHT is able to offer good localization of the sources whereas the nonrobust methods do not provide much better performance than a random guess. In the 3rd setting, we alter the setup of 1st setting by decreasing the number of snapshots from as low as . The performance differences between the methods are now more significant as is illustrated in the lower plot of Figure 1. In this case the PER rates of SNIHT, HUBSNIHT and MUSIC were , and , respectively. Again, the HUBSNIHT performed the best.
References
 [1] M. F. Duarte and Y. C. Eldar, “Structured compressed sensing: From theory to applications,” IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4053–4085, 2011.
 [2] J. A. Tropp, A. C. Gilbert, and M. J. Strauss, “Algorithms for simultaneous sparse approximation. part I: Greedy pursuit,” Signal Processing, vol. 86, pp. 572–588, 2006.
 [3] J. A. Tropp, “Algorithms for simultaneous sparse approximation. part II: Convex relaxation,” Signal Processing, vol. 86, pp. 589–602, 2006.
 [4] J. Chen and X. Huo, “Theoretical results on sparse representations of multiplemeasurement vectors,” IEEE Trans. Signal Process., vol. 54, no. 12, pp. 4634–4643, 2006.
 [5] Y. C. Eldar and H. Rauhut, “Average case analysis of multichannel sparse recovery using convex relaxation,” IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 505–519, 2010.
 [6] J. D. Blanchard, M. Cermak, D. Hanle, and Y. Jin, “Greedy algorithms for joint sparse recovery,” IEEE Trans. Signal Process., vol. 62, no. 7, pp. 1694 – 1704, 2014.
 [7] R. Gribonval and M. Zibulevsky, Handbook of Blind Source Separation. Oxford, UK: Academic Press, 2010, ch. Sparse component analysis, pp. 367–420.
 [8] D. Malioutov, M. Çetin, and A. S. Willsky, “A sparse signal reconstruction perspective for source localization with sensor arrays,” IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3010–3022, 2005.
 [9] Y. Wang, G. Leus, and A. Pandharipande, “Direction estimation using compressive sampling array processing,” in IEEE Workshop on Statistical Signal Processing (SSP’09), 2009, pp. 626–629.
 [10] S. Fortunati, R. Grasso, F. Gini, M. S. Greco, and K. LePage, “Singlesnapshot DOA estimation by using compressed sensing,” EURASIP J. Adv. Signal Process., vol. 2014, no. 1, pp. 1–17, 2014.
 [11] P. J. Huber, Robust Statistics. New York: Wiley, 1981.
 [12] A. B. Owen, “A robust hybrid of lasso and ridge regression,” Contemporary Mathematics, vol. 443, pp. 59–72, 2007.
 [13] S. LambertLacroix, L. Zwald et al., “Robust regression through the huberÕs criterion and adaptive lasso penalty,” Electronic Journal of Statistics, vol. 5, pp. 1015–1053, 2011.
 [14] E. Ollila, H.J. Kim, and V. Koivunen, “Robust iterative hard thresholding for compressed sensing,” in Proc. IEEE Int’l Symp. Communications, Control, and Signal Processing (ISCCSP’14), Athens, Greece, May 21 – 23, 2014, pp. 226–229.
 [15] E. Ollila, J. Eriksson, and V. Koivunen, “Complex elliptically symmetric random variables – generation, characterization, and circularity tests,” IEEE Trans. Signal Process., vol. 59, no. 1, pp. 58–69, 2011.
 [16] J. Eriksson, E. Ollila, and V. Koivunen, “Essential statistics and tools for complex random variables,” IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5400–5408, 2010.
 [17] T. Blumensath and M. E. Davies, “Normalized iterative hard thresholding: guaranteed stability and performance,” IEEE J. Sel. Topics Signal Process., vol. 4, no. 2, pp. 298–309, 2010.
 [18] E. Ollila, D. E. Tyler, V. Koivunen, and H. V. Poor, “Compoundgaussian clutter modelling with an inverse gaussian texture distribution,” IEEE Signal Process. Lett., vol. 19, no. 12, pp. 876–879, 2012.