Sparse Bayesian Learning for DOA Estimation in Heteroscedastic Noise
Abstract
July 3, 2019
The paper considers direction of arrival (DOA) estimation from longterm observations in a noisy environment.
In such an environment the noise source might evolve, causing the stationary models to fail.
Therefore a heteroscedastic Gaussian noise model is introduced where the variance can vary across observations and sensors.
The source amplitudes are assumed independent zeromean complex Gaussian distributed with unknown variances (i.e. the source powers), inspiring stochastic maximum likelihood DOA estimation.
The DOAs of plane waves are estimated from multisnapshot sensor array data using sparse Bayesian learning (SBL) where the noise is estimated across both sensors and snapshots.
This SBL approach is more flexible and performs better than highresolution methods since they cannot estimate the heteroscedastic noise process.
An alternative to SBL is simple data normalization, whereby only the phase across the array is utilized.
Simulations demonstrate that taking the heteroscedastic noise into account improves DOA estimation.
I Introduction
With long observation times weak signals can be extracted in a noisy environment. Most analytic treatments analyze these cases assuming Gaussian noise with constant variance. For long observation times the noise process though is likely to change with time causing the noise variance to evolve. This is called a heteroscedastic Gaussian process, meaning that the noise variance is evolving. While the noise variance is a nuisance parameter that we are not interested in, it still needs to be estimated or included in the processing in order to obtain an accurate estimate of the weaker signals.
Accounting for the noise variation is certainly important for machine learning [1, 2] and related to robust statistics [3, 4]. Heteroscedastic noise models have been used in e.g. finance [5] and image processing [6]. In statistical signal processing, the noise has been assumed to vary spatially [7, 8, 9], but spatiotemporally varying noise as considered here has not been studied. The proposed processing could be applied to spatial coherence loss [10, 11, 12] or to wavefront decorrelation, where turbulence causes the wave front to be incoherent for certain observations (thus more noisy). This has lead to socalled lucky imaging in astronomy [13] or lucky ranging in ocean acoustics [14], where only the measurements giving good results are used. As a result an involved hypothesis testing is needed to determine the measurements to be used. In contrast, we propose to use all measurements.
In applications, a simple way to account for noise power variations is to normalize the data magnitude to only contain phase information as demonstrated for beamforming in seismology [15, 16], noise cross correlation in seismology [17, 18, 19, 20, 21] and acoustics [22], source deconvolution in ocean acoustics [23, 24] and speaker localization [25, 26]. Highresolution beamformers such as MUSIC [27] relying on a sample covariance matrix are not likely to perform well for heteroscedastic noise as the loud noise might dominate the sample covariance matrix. More advanced methods than an eigenvalue decomposition are needed to separate the signal and noise subspaces. We demonstrate that for wellseparated sources normalizing the data to only contain phase information works well.
When the sources are closely spaced, more advanced parametric methods are needed for DOA estimation when the noise power is varying in space and time and the sources are weak. We derive and demonstrate this for the application of multiple measurement vector (MMV, or multisnapshot) compressive beamforming [28, 29, 30, 31]. We solve the MMV problem using the sparse Bayesian learning (SBL) framework [30, 32, 33] and use the maximumaposteriori (MAP) estimate for DOA reconstruction. We assume the source signals to jointly follow a zeromean multivariate complex normal distribution with unknown power levels. The noise across sensors and snapshots also follows a zeromean multivariate normal distribution with unknown variances. These assumptions lead to a Gaussian likelihood function.
The corresponding posterior distribution is also Gaussian and already developed SBL approaches solve this well. We base our development on our fast SBL method [32, 33] which we augment to estimate noise variances, potentially as many variances as observations. Standard techniques are based on minimizationmajorization [34] and expectation maximization (EM) [30, 35, 36, 37, 38, 39, 40], though not all estimates work well. Instead, we estimate the unknown variances using stochastic maximum likelihood [41, 42, 43], modified to obtain noise estimates even for a single observation.
Ia Heteroscedastic noise observation model
For the th observation snapshot, we assume the linear model
(1) 
where the dictionary is constant and known, and the source vector contains the physical information of interest. Further, is additive zeromean circularly symmetric complex Gaussian noise, which is generated from a heteroscedastic Gaussian process . Due to the circular symmetry of the noise the phase is uniformly distributed.
We specialize to diagonal covariance matrices, parameterized as
(2) 
where with the th standard basis vector. Note that the covariance matrices are varying over the snapshot index and we introduce the matrix of all noise standard deviations as
(3) 
where denotes nonnegative real numbers.
We consider three special cases for the a priori knowledge on the noise covariance model (2) which are expressed as constraints on .
 Case I

We assume widesense stationarity of the noise in space and time: . The model is homoscedastic,
(4)  Case II

We assume widesense stationarity of the noise in space only, i.e., the noise variance for all sensor elements is equal across the array, and it varies over snapshots. The noise variance is heteroscedastic in time (across snapshots),
(5)  Case III

No additional constraints other than (3). The noise variance is heteroscedastic across both time and space (sensors and snapshots.). In this case .
The relation between these noise cases is . From the sets ( is I, II, or III) both in (3) and in (2) can be constructed.
IB Array model
Let be the complex source amplitudes, with and , at DOAs (e.g., ) and snapshots for a frequency . We observe narrowband waves on sensors for snapshots . A linear regression model relates the array data to the source amplitudes as
(6) 
The dictionary contains the array steering vectors for all hypothetical DOAs as columns, with the th element given by ( is the distance to the reference element and the sound speed).
We assume and thus (6) is underdetermined. In the presence of only few stationary sources, the source vector is sparse with . We define the th active set
(7) 
and assume is constant across all snapshots . Also, we define which contains only the “active” columns of . In the following, denotes the vector norm and the matrix Frobenius norm.
Similar to other DOA estimators, can be estimated by model order selection criteria or by examining the angular spectrum. The parameter is required only for the noise power in the SBL algorithm. An inaccurate estimate influences the algorithm’s convergence.
IC Prior on the sources
We assume that the complex source amplitudes are independent both across snapshots and across DOAs and follow a zeromean circularly symmetric complex Gaussian distribution with DOAdependent variance , ,
(8)  
(9) 
i.e., the source vector at each snapshot is multivariate Gaussian with potentially singular covariance matrix,
(10) 
as (typically ). Note that the diagonal elements of , denoted as , represent source powers and thus . When the variance , then with probability 1. The sparsity of the model is thus controlled with the parameter , and the active set is equivalently
(11) 
The SBL algorithm ultimately estimates rather than the complex source amplitudes . This amounts to a significant reduction of the degrees of freedom resulting in a low variance of the DOA estimates.
ID Stochastic likelihood
We here derive the wellknown stochastic maximum likelihood function [44, 45, 46]. Given the linear model (6) with Gaussian source (9) and noise (2) the array data is Gaussian with for each snapshot the covariance given by
(12) 
The probability density function of is thus given by
(13) 
The snapshot loglikelihood for estimating and is
(14) 
This likelihood function is identical to the Type II likelihood function (evidence) in standard SBL [36, 35, 32] which is obtained by treating as a hyperparameter. The Type II likelihood is obtained by integrating the likelihood function over the complex source amplitudes, cf. (29) in [32]. The stochastic maximum likelihood approach is used here as it is more direct.
The parameter estimates and are obtained by maximizing the likelihood, leading to
(15) 
where is the feasible set of noise variances in (3) corresponding to the noise cases ( I, II, or III, see Sec. IA).The likelihood function (14) is similar to the ones derived for SBL and LIKES [34]. If and are known, then the MAP estimate is the posterior mean and covariance [47, 32],
(16)  
(17) 
The diagonal elements of , i.e., , control the rowsparsity of as for the corresponding th row of becomes .
IE LASSSO versus SBL
Both LASSO and SBL use the linear model (6) with complex zeromean Gaussian random noise but they differ in the modeling of the source matrix .
The LASSO approach assumes a priori random with uniformly i.i.d. distributed phase and Laplacelike prior amplitudes,
(18)  
(19) 
Thus only the summed amplitudes (19) are Laplacian. The elements in are unknown and must be estimated. The LASSO approach uses the conditional likelihood (Type I) for ) and applies Bayes rule with the prior giving the MAP estimate
(20)  
The LASSO approach (20) estimates the realization of for each snapshot . The number of parameters to be estimated for the LASSO approach grows linearly with the number of snapshots.
The SBL approach on the other hand, assumes that every column of is random with the same complex zeromean Gaussian a priori distribution . The SBL approach uses the likelihood in (15) and estimates but not the realization of . Therefore, the number of parameters to be estimated for the SBL approach is independent of the number of snapshots. Thus the quality of the estimate improves faster than with LASSO.
For the heteroscedastic noise model, a different covariance matrix could be included in the data fit of (20). However, since the number of parameters to be estimated grows with the number of snapshots, this seems less attractive than using SBL.
IF Prewhitening
The purpose of this section is to motivate the empirical evidence [15]–[26] that phaseonly processing might provide improved estimates over using also amplitudes. This is most likely to happen when sources are not closely spaced and at low SNR as demonstrated in the examples, Sec. V.
Let us introduce the factorization , where is a square and nonsingular matrix. For our particular setup, we have . The matrix is useful for prewhitening the sensor data. The corresponding whitened sensor data and dictionary matrix are
(21a)  
(21b) 
For known diagonal noise covariance the above means we have to normalize each row in (1) with as then the noise satisfies , and thus all entries are identically distributed.
If the noise covariance matrices are not known, they can be estimated as we will show later on. More specifically, at low SNR, the Noise Cases I, II, and III, lead to the noise variance estimates (43), (44). The corresponding prewhitening matrices can then be computed as
(22) 
leading to
(23) 
Empirically, it has been found that applying prewhitening to the data only (the dictionary is nonwhitened) effective in finding the strongest DOA.
For Noise Case I, prewhitening does not play a role and the conventional beamformer is formulated using the power spectrum at DOA
(24) 
where the sample covariance matrix is
(25) 
For Noise Cases II and III on the other hand, we will work with the prewhitened data. In those cases, , the conventional beamformer leads to the powerspectra and , respectively, which can be formulated as
(26) 
where the whitened sample covariance matrix is
(27) 
The weighting is given by (22). For Case III, only phase is used as can be observed from (23), which results in phaseonly processing. As demonstrated in Section VB this simple prewhitening can improve the DOA performance at low SNR.
Ii Source power estimation
Let us now focus on the SBL algorithm solving (15). The algorithm iterates between the source power estimates derived in this section and the noise variance estimates computed in Sec. III. After detailing these estimation procedures, the full algorithm is summarized in Sec. IV.
We impose the diagonal structure , in agreement with (9), and form derivatives of (14) with respect to the diagonal elements , cf. [44]. Using
(28)  
(29) 
the derivative of (14) is formulated as
(30)  
(31) 
For the solution to (15), we impose the necessary condition . To obtain an iterative equation in , we multiply the first term in the above equation with the factor , whereby
(32) 
Assuming and given (from previous iterations or initialization) and forcing (32) to zero, we obtain the following fixed point iteration [48] for the :
(33) 
We use , but have not carefully tested for optimal values and this value depends on many factors such as which noise estimate is used and closeness of the DOAs. A value of gives the update equation used in [47, 30, 33] and gives the update equation used in [32].
Iii Noise variance estimation
While there has been more focus on estimating the DOAs or , noise is an important part of the physical system and a correct estimate is needed for good convergence properties. In SBL, the noise variance controls the sharpness of the peaks in the spectrum, with higher noise levels giving broader peaks. Thus, as we optimize the DOAs we expect the noise levels to decrease and the spectrum to become sharper.
In this section we estimate the noise variance for the three noise cases in Sec. IA. In Secs. IIIA–IIIC, we will assume the support of is known.
Iiia Noise estimate, Case I
Under Noise Case I, where with the identity matrix of size , stochastic maximum likelihood [38, 41, 43] can provide an asymptotically efficient estimate of if the set of active DOAs is known.
Let be the covariance matrix of the active sources obtained above with corresponding active steering matrix which maximizes (14). The corresponding data covariance matrix is
(34) 
Note that for Noise Case I, the data covariance matrices (12) and (34) are identical. Following [42], we continue from (30),
(35)  
(36) 
for all active sources (). Since , Eq.(36) simplifies to
(37a) 
To obtain Jaffer’s condition below, we impose the following additional constrains
(37b) 
Together, the conditions (37a), (37b) give Jaffer’s condition ([42]:Eq.(6)), i.e.,
(38) 
which we enforce at the optimal solution . Jaffer’s condition follows from allowing arbitrary correlations among the source signals, i.e. when the matrix is not restricted to be diagonal. Substituting (34) into (38) gives
(39) 
Let us then define the projection matrix onto the subspace spanned by the active steering vectors
(40) 
Leftmultiplying (39) with and rightmultiplying it with , we obtain
(41) 
Evaluating the trace, using and , gives
(42)  
(43) 
where we have defined .
The above approximation motivates the noise power estimate for Noise Case I (43), which is errorfree if , unbiased because , consistent since also its variance tends to zero for [49], and asymptotically efficient as it approaches the CRLB for [50]. Note that, the Noise Case I estimate (43) is valid for any number of snapshots, even for just one snapshot.
IiiB Noise estimate, Case II
For Noise Case II, where , we apply (43) for each snapshot individually, leading to
(44) 
Several alternative estimators for the noise variance are proposed based on EM [36, 30, 37, 47, 51]. For a comparative illustration in Sec. VD1 we will use the iterative noise estimate from [36], given by
(45) 
where the posterior mean and covariance are given in (16) and (17). Empirically, EM noise estimates such as (45) significantly underestimate the true noise variance in our applications.
IiiC Noise estimate, Case III
Let us start from the definition of the noise covariance
(46) 
This motivates pluggingin the singleobservation signal estimate for the active (nonzero) entries in . This estimate is based on the single observation and the projection matrix (40), giving the rank1 estimate
(47) 
Since the signal estimate maximizes the estimated signal power, this noise covariance estimate is biased and the noise level is likely underestimated.
Since we assume the noise independent across sensors, all offdiagonal elements of are known to be zero. With this constraint in mind, we modify (47) as
(48) 
The estimate (48) is demanding as for all the complexvalued observations in , we obtain estimates of the noise variance. Note that the estimate in (47) is not invertible whereas the diagonal constraint in (48) leads to a nonsingular estimate of with high probability (it is singular only if an element of is ). As a result, the expression for that is used for estimating in (33) is likely invertible.
On the other hand, an overestimate of the noise is easily obtained by assuming which is equivalent to setting in (48), resulting in
(49) 
or
(50) 
This can be shown be the maximum likelihood estimate for no sources () or very low power sources.
Iv SBL Algorithm
The complete SBL algorithm is summarized in Table I. The same algorithm is valid for Noise Cases I, II, and III (at high SNR). Given the observed , we iteratively update (12) by using the current and . The is computed directly as the numerical inverse of . For updating , we use (33). For the initialization of we use the conventional beamformer (CBF) power
(51) 
Based on the corresponding noise case, (43), (44), (45), or (48) is used to estimate . The noise is initialized using (43), (44), (45), or (48) with , which provides an over estimate of the noise variance.
The convergence rate measures the relative change in estimated total source power,
(52) 
The algorithm stops when and the output is the active set (see (11)) from which all source parameters are computed.
V Examples
When the noise process is stationary and only one source is active, the peak of the CBF spectrum provides the optimal estimate of the source location. With heteroscedastic noise this may not hold true. We propose to use a robust CBF (Sec.VB) or find the DOA with SBL, which takes the heteroscedastic noise into account (Sec. VC). With multiple sources and heteroscedastic noise we use SBL (Sec. VD).
Va Simulation setup
In the analysis of seismic data the noise for each snapshot was observed to be lognormal distributed [52]. Noise has also been modeled with extreme distributions [53]. In the simulations here, the noise follows a normaluniform hierarchical model. The noise is complex zeromean Gaussian with the noise standard deviation uniformly distributed over two decades, i.e., , where is the uniform distribution. Three noise cases are simulated: (a) Case I: constant noise standard deviation over snapshots and sensors, (b) Case II: standard deviation changes across snapshots with , and (c) Case III: standard deviation changes across both snapshots and sensors with .
VB Single DOA with CBF and MUSIC
The single source is located at . The complex source amplitude is stochastic and there is additive heteroscedastic Gaussian noise with SNR variation from to dB. The sensor array has elements with half wavelength spacing. We process snapshots. The angle space is divided into a grid (). The singlesnapshot array signaltonoise ratio (SNR) is
(53) 
We first compute the beam spectra using CBF (24), CBF2 and CBFPhase both (26) but with different weighting (23). When the noise is homoscedastic (constant standard deviation), the beam spectra for the three processors behave similarly (first row in Fig. 2). For heteroscedastic noise CBF2 and CBFPhase give much better discrimination between signal to noise, see Fig. 2 middle (Noise Case II) and bottom row (Noise Case III).
The root mean squared error (RMSE) of the DOA estimates over runs with random noise realizations is used for evaluating the algorithm
RMSE  (54) 
where is the true DOA and is the estimated DOA for the th source when sources are present.
The SNR curves (Fig. 3) demonstrate increased robustness of CBF2 and CBFPhase, failing 20 dB later. Due to the heteroscedastic noise, MUSIC performs worse than CBF for a single source. CBFPhase diverges at an SNR 15–20 dB later than CBF for Noise Cases II and III.
VC Single DOA estimation with SBL
We use the following SBL methods, with update (33) and
 SBL:

Solves Case I using standard SBL, with from (43).
 SBL2:

Solves Case II, with for each snapshot from (44).
 SBL3:

Solves Case III, with from (48).
In addition to these methods we also use basis pursuit (BP) as implemented in [31].
For Noise Case I, all the methods fail near the same SNR of dB, Fig. 4a. When the noise is heteroscedastic across snapshots, CBF, BP, and SBL fail early. Since both SBL2 and SBL3 correctly model the noise, they perform the best for Noise Case II, Fig. 4b. CBFPhase also performs well. For heteroscedastic Noise Case III, SBL3 fails the last and as before CBFPhase also performs well. SBL3 fails 15 dB later than any other method. This demonstrates the usefulness of accurate noise models in DOA processing.
The histograms (Fig. 5) of the DOA (location of peak in power spectrum) at SNR dB for Noise Case III demonstrate that when the heteroscedastic noise is accounted for, the histograms concentrate near the true DOA ().
An example statistic of the heteroscedastic noise standard deviation is shown in Fig. 6. The standard deviation for each sensor is either or . SBL3 estimates the standard deviation from (48). Average noise in Fig. 6b resembles well the true noise (Fig. 6a) whereas the sample estimate (Fig. 6c) has high variance. Fig. 6d plots the mean across simulations and snapshots of the estimated standard deviation. On average, the noise estimate is close to the true noise.
VD Three DOA estimation with SBL
Now consider three sources located at with power dB, see Fig. 7. Relative to the single source case, the CBFPhase performs significantly worse than SBL3, as the sources at and are both in the CBF main lobe.
The localization ability of the algorithms can also be gauged from the histogram of the top three peaks, see Fig. 8. Since SBL3 accounts for the heteroscedastic Noise Case III, its histogram is concentrated near the true DOAs.
VD1 Noise estimate convergence
The performance of the SBL methods is strongly related to the accuracy of the noise estimates. For the th snapshot, the true noise power (Noise Case II) is
(55) 
The estimated (44) deviates from (55) randomly. We also consider the noise estimate (45) based on the EM method. Fig. 9 compares the two noise estimates for noise generated with . The evolution of the histograms of the relative noise variance with iterations is in Fig. 9b. The mean of the ratio of is close to 1 for SBL2 but is much lower for the EM noise estimate (45), this likely causes the SBL2 (EM noise, (45)) to fails 5 dB earlier.
VD2 Number of snapshots
The RMSE versus number of snapshots for Noise Case III (Fig. 10) shows that SBL3 is most accurate with CBFPhase following.
VD3 Noise distribution
For Noise Case III, we are just using one observation to estimate the standard deviation for SBL3 (48). Thus the estimates are not good, see Fig. 11. The true noise standard deviation is generated from , see Fig. 11a. The distribution of the deviation from the true standard deviation (Fig. 11c) is wellcentered (mean 0.007).
Vi Conclusion
Likelihood based methods for single and multiple DOA estimation from longterm observations corrupted by nonstationary additive noise are discussed. In such a setting, the DOA estimators for a stationary noise model perform poorly. Therefore a heteroscedastic Gaussian noise model is introduced where the noise variance varies across sensors and snapshots. We develop Sparse Bayesian Learning (SBL) approaches to estimate source powers, source DOAs, and the noise variance parameters.
Simulations show that normalizing the array data magnitude (such that only the phase information is retained) is simple and useful for estimating a single source in heteroscedastic Gaussian noise. For the estimation of (multiple) closely spaced sources, a problem specific SBL approach gives a much lower RMSE in the DOA estimates.
References
 [1] K. P. Murphy. Machine learning: a probabilistic perspective. MIT press, 2012.
 [2] C. M. Bishop. Pattern recognition and machine learning. Springer, 2006.
 [3] P. J. Huber. Robust statistics. Springer, 2011.
 [4] R. Maronna, D. Martin, and V. Yohai. Robust statistics. John Wiley, Chichester., 2006.
 [5] R. F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of united kingdom inflation. Econometrica, pages 987–1007, 1982.
 [6] T.H. Thai, R. Cogranne, and F Retraint. Camera model identification based on the heteroscedastic noise model. IEEE Trans. Image Proc., 23(1):250–263, 2014.
 [7] M. Viberg, P. Stoica, and B. Ottersten. Maximum likelihood array processing in spatially correlated noise fields using parameterized signals. IEEE Trans Signal Proc., 45(4):996–1004, 1997.
 [8] C. E. Chen, F. Lorenzelli, R.E Hudson, and K. Yao. Stochastic maximumlikelihood doa estimation in the presence of unknown nonuniform noise. IEEE Trans. Signal Proc., 56(7):3038–3044, 2008.
 [9] T Li and A. Nehorai. Maximum likelihood direction finding in spatially colored noise fields using sparse sensor arrays. IEEE Trans Signal Proc., 59(3):1048–1062, 2011.
 [10] H. Cox. Line array performance when the signal coherence is spatially dependent. J Acoust. Soc. Am., 54(6):1743–1746, 1973.
 [11] A. Paulraj and T. Kailath. Direction of arrival estimation by eigenstructure methods with imperfect spatial coherence of wave fronts. J Acoust. Soc. Am., 83(3):1034–1040, 1988.
 [12] R. Lefort, R. Emmetière, S. Bourmani, G. Real, and A. Drémeau. Subantenna processing for coherence loss in underwater direction of arrival estimation. J Acoust. Soc. Am., 142(4):2143–2154, 2017.
 [13] N.M Law, C. D. Mackay, and J.E Baldwin. Lucky imaging: high angular resolution imaging in the visible from the ground. Astronomy & Astrophysics, 446(2):739–745, 2006.
 [14] H. Ge and I. P. Kirsteins. Lucky ranging with towed arrays in underwater environments subject to nonstationary spatial coherence loss. In IEEE Intl. Conf. Acoust., Speech and Signal Proc. (ICASSP), pages 3156–3160, 2016.
 [15] P. Gerstoft, M. C. Fehler, and K. G. Sabra. When katrina hit california. Geophys. Res. Lett., 33(17), 2006.
 [16] P. Gerstoft and P. D. Bromirski. “Weather bomb” induced seismic signals. Science, 353(6302):869–870, 2016.
 [17] P. Roux, K. G. Sabra, P. Gerstoft, W. A. Kuperman, and M. C. Fehler. Pwaves from crosscorrelation of seismic noise. Geophys. Res. Lett., 32(19), 2005.
 [18] N. Harmon, P. Gerstoft, C. A. Rychert, G. A. Abers, M. Salas de La Cruz, and K. M. Fischer. Phase velocities from seismic noise using beamforming and cross correlation in Costa Rica and Nicaragua. Geophy. Res. Lett., 35(19), 2008.
 [19] M. Landès, F. Hubans, N. M Shapiro, A. Paul, and M. Campillo. Origin of deep ocean microseisms by using teleseismic body waves. J. Geophys Res: Solid Earth, 115(B5), 2010.
 [20] Z. Zhan, S. Ni, D. V Helmberger, and R. W. Clayton. Retrieval of mohoreflected shear wave arrivals from ambient seismic noise. Geophys. J. Int., 182(1):408–420, 2010.
 [21] C. Weemstra, W. Westra, R. Snieder, and L. Boschi. On estimating attenuation from the amplitude of the spectrally whitened ambient seismic field. Geophys. J. Int., 197(3):1770–1788, Apr 2014.
 [22] P. Gerstoft, W. S. Hodgkiss, M. Siderius, C.F. Huang, and C.H. Harrison. Passive fathometer processing. J. Acoust. Soc. Am., 123(3):1297–1305, 2008.
 [23] K.G. Sabra and D.R. Dowling. Blind deconvolution in ocean waveguides using artificial time reversal. J. Acoust. Soc. Am., 116(1):262–271, 2004.
 [24] K.G. Sabra, HC. Song, and D.R. Dowling. Raybased blind deconvolution in ocean sound channels. J. Acoust. Soc. Am., 127(2):EL42–EL47, 2010.
 [25] O. Schwartz and S. Gannot. Speaker tracking using recursive em algorithms. IEEE Trans. Audio, Speech, Language Proc., 22(2):392–402, 2014.
 [26] Y Dorfan and S Gannot. Treebased recursive expectationmaximization algorithm for localization of acoustic sources. IEEE Trans. Audio, Speech and Lang. Proc., 23(10):1692–1703, 2015.
 [27] H.L. Van Trees. Optimum Array Processing, chapter 1–10. WileyInterscience, New York, 2002.
 [28] D. Malioutov, M. Çetin, and A. S. Willsky. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Signal Process., 53(8):3010–3022, 2005.
 [29] A. Xenaki, P. Gerstoft, and K. Mosegaard. Compressive beamforming. J. Acoust. Soc. Am., 136(1):260–271, 2014.
 [30] D. P. Wipf and B.D. Rao. An empirical Bayesian strategy for solving the simultaneous sparse approximation problem. IEEE Trans. Signal Proc., 55(7):3704–3716, 2007.
 [31] P. Gerstoft, A. Xenaki, and C.F. Mecklenbräuker. Multiple and single snapshot compressive beamforming. J. Acoust. Soc. Am., 138(4):2003–2014, 2015.
 [32] P. Gerstoft, C. F. Mecklenbräuker, A. Xenaki, and S. Nannuru. Multisnapshot sparse Bayesian learning for DOA. IEEE Signal Proc. Lett., 23(10):1469–1473, 2016.
 [33] S. Nannuru, K.L Gemba, P. Gerstoft, W.S. Hodgkiss, and C.F. Mecklenbräuker. Sparse Bayesian learning with uncertainty models and multiple dictionaries. arXiv:1704.00436, 2017.
 [34] P. Stoica and P. Babu. SPICE and LIKES: Two hyperparameterfree methods for sparseparameter estimation. Signal Proc., 92(7):1580–1590, 2012.
 [35] D. P. Wipf and S. Nagarajan. Beamforming using the relevance vector machine. In Proc. 24th Int. Conf. Machine Learning, New York, NY, USA, 2007.
 [36] D. P. Wipf and B.D. Rao. Sparse Bayesian learning for basis selection. IEEE Trans. Signal Proc, 52(8):2153–2164, 2004.
 [37] Z. Zhang and B. D Rao. Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning. IEEE J Sel. Topics Signal Proc.,, 5(5):912–926, 2011.
 [38] Z.M. Liu, Z.T. Huang, and Y.Y. Zhou. An efficient maximum likelihood method for directionofarrival estimation via sparse Bayesian learning. IEEE Trans. Wireless Comm., 11(10):1–11, Oct. 2012.
 [39] JA. Zhang, Z. Chen, P. Cheng, and X. Huang. Multiplemeasurement vector based implementation for singlemeasurement vector sparse Bayesian learning with reduced complexity. Signal Proc., 118:153–158, 2016.
 [40] R. Giri and B. Rao. Type I and type II Bayesian methods for sparse signal recovery using scale mixtures. IEEE Trans Signal Proc. Signal Proc., 64(13):3418–3428, July 2016.
 [41] J.F. Böhme. Sourceparameter estimation by approximate maximum likelihood and nonlinear regression. IEEE J. Oceanic Eng., 10(3):206–212, 1985.
 [42] A.G. Jaffer. Maximum likelihood direction finding of stochastic sources: A separable solution. In IEEE Int. Conf. on Acoust., Speech, and Sig. Proc. (ICASSP88), volume 5, pages 2893–2896, 1988.
 [43] P. Stoica and A. Nehorai. On the concentrated stochastic likelihood function in array processing. Circuits Syst. Signal Proc., 14(5):669–674, 1995.
 [44] Johann F Böhme. Estimation of spectral parameters of correlated signals in wavefields. Signal Proc., 11(4):329–337, 1986.
 [45] H. Krim and M. Viberg. Two decades of array signal processing research: the parametric approach. IEEE Signal Proc. Mag., 13(4):67–94, 1996.
 [46] P. Stoica, B. Ottersten, M. Viberg, and R.L. Moses. Maximum likelihood array processing for stochastic coherent sources. IEEE Trans. Signal Proc., 44(1):96–105, 1996.
 [47] M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. J. Machine Learning Research, 1:211–244, 2001.
 [48] R.L. Burden, J.D. Faires, and A.M. Burden. Numerical analysis. Cengage Learning, 2016. (chapter 2.2).
 [49] D. Kraus. Approximative MaximumLikelihoodSchätzung und verwandte Verfahren zur Ortung und Signalschätzung mit Sensorgruppen. Shaker Verlag, Aachen, Germany, 1993.
 [50] G. Bienvenu and L. Kopp. Optimality of high resolution array processing using the eigensystem approach. IEEE Trans on Acous., Speech, Signal Proc., 31(5):1235–1248, Oct 1983.
 [51] Z Zhang, TP Jung, S. Makeig, Zhouyue P, and BD. Rao. Spatiotemporal sparse Bayesian learning with applications to compressed sensing of multichannel physiological signals. IEEE Trans. Neural Syst. and Rehab. Eng., 22(6):1186–1197, Nov 2014.
 [52] N. Riahi and P. Gerstoft. The seismic traffic footprint: Tracking trains, aircraft, and cars seismically. Geophys. Res. Lett., 42(8):2674–2681, 2015.
 [53] E. Ollila. Multichannel sparse recovery of complexvalued signals using Huber’s criterion. In Int. Workshop on Compressed Sensing Theory and Appl. to Radar, Sonar, and Remote Sensing, Pisa, Italy, June 2015.