An Optimization View of MUSIC and Its Extension to Missing Data

Shuang Li, Hassan Mansour, and Michael B. Wakin SL and MBW are with the Department of Electrical Engineering, Colorado School of Mines, Golden, CO 80401. Email: {shuangli,mwakin}@mines.edu. HM is with Mitsubishi Electric Research Laboratories, Cambridge, MA 02139. Email: mansour@merl.com.
Abstract

One of the classical approaches for estimating the frequencies and damping factors in a spectrally sparse signal is the MUltiple SIgnal Classification (MUSIC) algorithm, which exploits the low-rank structure of an autocorrelation matrix. Low-rank matrices have also received considerable attention recently in the context of optimization algorithms with partial observations. In this work, we offer a novel optimization-based perspective on the classical MUSIC algorithm that could lead to future developments and understanding. In particular, we propose an algorithm for spectral estimation that involves searching for the peaks of the dual polynomial corresponding to a certain nuclear norm minimization (NNM) problem, and we show that this algorithm is in fact equivalent to MUSIC itself. Building on this connection, we also extend the classical MUSIC algorithm to the missing data case. We provide exact recovery guarantees for our proposed algorithms and quantify how the sample complexity depends on the true spectral parameters. Simulation results also indicate that the proposed algorithms significantly outperform some relevant existing methods in frequency estimation of damped exponentials.

Index Terms: Spectral estimation, MUSIC algorithm, nuclear norm minimization, optimization, low-rank matrix completion.

I Introduction

In this paper, we consider the problem of identifying the frequencies and damping factors contained in a spectrally sparse signal, namely, a superposition of a few complex sinusoids with damping, either from a complete set of uniform samples (which we refer to as full observations) or from a random set of partial uniform samples (which we refer to as the missing data case). This kind of signal arises in many applications, such as nuclear magnetic resonance spectroscopy [1, 2], radar imaging [3, 4], and modal analysis [5, 6]. It is well known that the frequencies and damping factors can be identified by classical spectrum estimation approaches, such as Prony's method [7], the Matrix Pencil method [8], and the MUltiple SIgnal Classification (MUSIC) algorithm [9, 10], when full observations are available. However, in many real-world applications, obtaining such full observations with high-speed uniform sampling is costly and can be technically prohibitive. Lower-rate, nonuniform sampling can be an appealing alternative [11] and results in the partial observations (missing data) discussed in this work.

The MUSIC algorithm, which is widely used in signal processing [12, 13], was first proposed by Schmidt as an improvement to Pisarenko's method [9]. MUSIC exploits the low-rank structure of an autocorrelation matrix, whose eigenvalue decomposition splits its range into a signal subspace and a noise subspace. The spectral parameters are then identified by searching for the zeros of a noise-space correlation function [14]. The MUSIC algorithm can be used either for spectral analysis of one signal (the single measurement vector, or SMV, problem) or for multiple measurement vector (MMV) problems involving joint sparse frequency estimation [15]. However, a limitation of these classical spectral estimation methods is that they are not compatible with the random sampling or compression protocols that can be used to reduce the front-end sampling burden. One recent work [16] does adapt the MUSIC algorithm to the setting with noisy missing data; its authors provide asymptotic theoretical guarantees on the performance of a singular value decomposition (SVD) of the noisy, partially observed data matrix. In contrast, in this work we consider two settings, the (noiseless and noisy) full observation case and the noiseless missing data case, and establish non-asymptotic theoretical guarantees for our proposed algorithms.

We focus on both the SMV and MMV settings in this paper. Samples of the spectrally sparse vector-valued signal (MMV setting) considered in this work can be arranged into a low-rank matrix, while samples of the spectrally sparse scalar-valued signal (SMV setting) can be used to form a Hankel matrix, which is also a low-rank matrix. Low-rank matrices have received considerable attention recently in the context of optimization algorithms with partial observations. In particular, low-rank matrix recovery from missing data appears in many practical problems such as matrix completion [17, 18], low-rank approximation [19, 20], system identification [21, 22], and image denoising [23, 24]. A common approach for recovering a low-rank matrix is known as rank minimization. However, rank minimization problems are, in general, NP-hard. Fortunately, a popular heuristic for rank minimization problems, nuclear norm minimization (NNM), performs very well in low-rank matrix recovery when certain conditions on the measurement system are satisfied [17]. Recently, it has been shown that NNM for low-rank matrix recovery can be viewed as a special case of atomic norm minimization (ANM) when the atoms are composed of rank-one matrices [25, 26]. ANM is a general optimization framework for decomposing structured signals and matrices into sparse combinations of continuously-parameterized atoms from some dictionary, and one of the primary successes of ANM has been in solving the line spectrum estimation problem in both the complete and missing data cases. Most of the theory for ANM in line spectrum estimation has relied on insight gained from analyzing the dual solution to the ANM problem. However, as far as we know, the general ANM (not NNM) considered in many existing works can only handle frequency estimation in undamped sinusoids [25, 27, 15, 28].

The fact that NNM is a special case of ANM suggests that ANM-type dual analysis can also be used for NNM. In particular, in this paper, we propose an algorithm for spectral estimation that involves searching for the peaks of the dual polynomial corresponding to the NNM problem. We name this algorithm NN-MUSIC (nuclear norm minimization view of MUSIC), and we highlight the fact that in the full observation case, NN-MUSIC is in fact equivalent to MUSIC itself. We believe this offers a novel optimization-based perspective on the MUSIC algorithm that could lead to future developments and understanding. We also provide one such development in this paper: unlike classical MUSIC, the NN-MUSIC algorithm can be naturally generalized to the missing data case, and so we also propose and analyze such a Missing Data MUSIC (MD-MUSIC) algorithm in this paper. Both NN-MUSIC and MD-MUSIC can deal with damped sinusoids. Our simulations also illustrate the advantage of these two proposed algorithms over ANM in frequency estimation of damped sinusoids.

Using our analytical framework, we also provide exact recovery guarantees for both NN-MUSIC and MD-MUSIC. For NN-MUSIC, our theorem indicates that we can perfectly identify the spectral parameters by searching for the locations in the damping-frequency plane where the $\ell_2$-norm of the dual polynomial achieves 1, as long as the true spectral parameters are distinct from each other and the number of uniform samples is larger than the number of spectral parameters. For MD-MUSIC, our theorem shows that we can perfectly identify the spectral parameters with high probability by searching for the locations in the damping-frequency plane where the $\ell_2$-norm of the dual polynomial achieves 1, provided that the number of random samples is sufficiently large, the true spectral parameters are distinct from each other, and the number of uniform samples (from which the random samples are drawn) is larger than the number of spectral parameters. Moreover, we quantify how the sample complexity depends on the true spectral parameters.

The remainder of this paper is organized as follows. In Section II, we introduce both the SMV and MMV settings considered in this paper. In Section III, we review the classical MUSIC algorithm as well as its variants. In Section IV, we offer a novel optimization-based perspective on the MUSIC algorithm by highlighting the fact that the proposed NN-MUSIC algorithm is equivalent to MUSIC in the full observation case. We also generalize it to the missing data case and propose the MD-MUSIC algorithm to support the idea that this connection between NNM and MUSIC could lead to future developments and understanding. The proofs for theoretical guarantees are presented in Section V. In Section VI, we explore the recovery performance of the proposed NN-MUSIC and MD-MUSIC algorithms with numerical simulations. Finally, we conclude this work and discuss future directions in Section VII.

II Signal Models

We are interested in identifying the frequencies and damping factors contained in a spectrally sparse signal, which can be a scalar-valued signal in the SMV setting or a vector-valued signal in the MMV setting. We first introduce the SMV and MMV settings that are considered in this work. Throughout this work, we use a superscript to index the rows of a matrix, and we use superscripts $\mathsf{T}$ and $\mathsf{H}$ to denote transpose and conjugate transpose, respectively.

II-A Single Measurement Vector (SMV) setting

In the SMV setting, a scalar-valued, continuous-time signal $x(t)$ is assumed to have the form

$$x(t) = \sum_{k=1}^{K} c_k\, \tau_k^{t}\, e^{i 2\pi f_k t} + w(t), \qquad (1)$$

where $c_k$, $\tau_k$, $f_k$, and $w(t)$ are the unknown coefficients, damping ratios, frequency parameters, and additive observation noise, respectively. Such signals appear in many applications, such as radar, sonar, and communications. Without loss of generality, we assume the frequencies $f_k$ belong to the interval $[0,1)$, the damping ratios $\tau_k$ belong to the interval $(0,1]$ (with $\tau_k = 1$ corresponding to no damping), and the coefficients $c_k \in \mathbb{C}$ are nonzero.

II-B Multiple Measurement Vector (MMV) setting

In the MMV setting, we consider a vector-valued signal $\boldsymbol{x}(t) \in \mathbb{C}^{M}$, which is a superposition of $K$ damped sinusoids with additive observation noise $\boldsymbol{w}(t)$. More precisely,

$$\boldsymbol{x}(t) = \sum_{k=1}^{K} c_k\, \boldsymbol{\psi}_k\, \tau_k^{t}\, e^{i 2\pi f_k t} + \boldsymbol{w}(t), \qquad (2)$$

with $c_k \in \mathbb{C}$, $f_k \in [0,1)$, and $\tau_k \in (0,1]$ being the complex coefficient, frequency, and damping factor, respectively. Here, $\boldsymbol{\psi}_k \in \mathbb{C}^{M}$ is a normalized vector ($\|\boldsymbol{\psi}_k\|_2 = 1$) that can be viewed as the mode shape in modal analysis problems [5, 6].

Suppose we take $N$ uniform samples and arrange $\boldsymbol{x}(n)^{\mathsf{T}}$, $n = 0, 1, \dots, N-1$, as the $n$-th row of a data matrix $\boldsymbol{X} \in \mathbb{C}^{N \times M}$. Define $\boldsymbol{X}^\star$ and $\boldsymbol{W}$ as the noiseless data matrix and observation noise matrix, respectively. Then, we have

$$\boldsymbol{X} = \boldsymbol{X}^\star + \boldsymbol{W} \qquad (3)$$

with

$$\boldsymbol{a}(\tau, f) \triangleq \big[\,1,\ \tau e^{i 2\pi f},\ \dots,\ \tau^{N-1} e^{i 2\pi f (N-1)}\,\big]^{\mathsf{T}} \qquad (4)$$

and

$$\boldsymbol{X}^\star = \sum_{k=1}^{K} c_k\, \boldsymbol{a}(\tau_k, f_k)\, \boldsymbol{\psi}_k^{\mathsf{T}}.$$

(Note that we abbreviate $\boldsymbol{a}(\tau, f)$ to $\boldsymbol{a}(f)$ when $\tau = 1$.) In addition, we define $\boldsymbol{A} \triangleq [\boldsymbol{a}(\tau_1, f_1)\ \cdots\ \boldsymbol{a}(\tau_K, f_K)]$, $\boldsymbol{C} \triangleq \operatorname{diag}(c_1, \dots, c_K)$, and $\boldsymbol{\Psi} \triangleq [\boldsymbol{\psi}_1\ \cdots\ \boldsymbol{\psi}_K]$, so that $\boldsymbol{X}^\star = \boldsymbol{A} \boldsymbol{C} \boldsymbol{\Psi}^{\mathsf{T}}$.

Let $\boldsymbol{x}_m$, $\boldsymbol{x}_m^\star$, and $\boldsymbol{w}_m$ denote the $m$-th column of $\boldsymbol{X}$, $\boldsymbol{X}^\star$, and $\boldsymbol{W}$, respectively. It can be seen that

$$\boldsymbol{x}_m = \sum_{k=1}^{K} \widetilde{c}_{k,m}\, \boldsymbol{a}(\tau_k, f_k) + \boldsymbol{w}_m, \qquad (5)$$

where $\widetilde{c}_{k,m} = c_k\, \psi_k(m)$ with $\psi_k(m)$ being the $m$-th entry of $\boldsymbol{\psi}_k$. In this model, the observed data consists of $M$ observed length-$N$ signals, each comprised of $K$ damped sinusoids. The signals share the same set of unknown frequencies and damping factors, but each has a unique set of coefficients.
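To make this model concrete, the following Python sketch generates a synthetic noiseless data matrix $\boldsymbol{X}^\star = \boldsymbol{A}\boldsymbol{C}\boldsymbol{\Psi}^{\mathsf{T}}$ under the notation above. The atom follows the definition in (4); all function and variable names are ours, chosen purely for illustration.

```python
import numpy as np

def atom(tau, f, N):
    # a(tau, f) = [1, tau e^{i2pi f}, ..., tau^{N-1} e^{i2pi f(N-1)}]^T, cf. (4)
    n = np.arange(N)
    return (tau ** n) * np.exp(1j * 2 * np.pi * f * n)

def synth_data(taus, freqs, coeffs, Psi, N):
    # X* = A C Psi^T: every column mixes the same K damped sinusoids
    A = np.column_stack([atom(t, f, N) for t, f in zip(taus, freqs)])
    return A @ np.diag(coeffs) @ Psi.T

rng = np.random.default_rng(0)
N, M, K = 64, 8, 3
taus = np.array([0.99, 0.97, 0.95])        # damping ratios in (0, 1]
freqs = np.array([0.10, 0.25, 0.40])       # frequencies in [0, 1)
coeffs = rng.standard_normal(K) + 1j * rng.standard_normal(K)
Psi = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))
Psi /= np.linalg.norm(Psi, axis=0)         # normalized mode shapes
X = synth_data(taus, freqs, coeffs, Psi, N)
print(np.linalg.matrix_rank(X))            # K = 3 when all parameters are distinct
```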

III Prior Work

In this section, we review the classical MUSIC algorithm [9, 10] as well as its two variants, Damped MUSIC (DMUSIC) [29] and MUSIC adapted to missing data with Gaussian white noise (denoted as MN-MUSIC) [16].

III-A MUltiple SIgnal Classification (MUSIC) algorithm

III-A1 SMV MUSIC via autocorrelation matrix

By sampling the scalar-valued, continuous-time signal $x(t)$, defined in (1), at $L$ equally spaced times, one can define a vector $\boldsymbol{x}(t)$ as

$$\boldsymbol{x}(t) \triangleq [\,x(t)\ \ x(t+1)\ \ \cdots\ \ x(t+L-1)\,]^{\mathsf{T}}, \qquad (6)$$

which has the autocorrelation matrix

$$\boldsymbol{R} \triangleq \mathbb{E}\big[\boldsymbol{x}(t)\,\boldsymbol{x}(t)^{\mathsf{H}}\big].$$

The classical MUSIC algorithm aims to identify the unknown frequencies by constructing (and then decomposing) an estimate of the autocorrelation matrix without damping, namely, in the case where all $\tau_k = 1$ [30]. This requires $L > K$. Specifically, consider a full set of uniform observations $x(0), \dots, x(N-1)$ with $N = L + P - 1$ for some $P \ge 1$. Then, the following sample autocorrelation matrix can be used to approximate $\boldsymbol{R}$:

$$\widehat{\boldsymbol{R}} \triangleq \frac{1}{P} \sum_{t=0}^{P-1} \boldsymbol{x}(t)\,\boldsymbol{x}(t)^{\mathsf{H}}. \qquad (7)$$

Let $\boldsymbol{u}_1, \dots, \boldsymbol{u}_L$ denote the orthonormal eigenvectors of $\widehat{\boldsymbol{R}}$. In particular, suppose $\boldsymbol{U}_S \triangleq [\boldsymbol{u}_1\ \cdots\ \boldsymbol{u}_K]$ (signal space) and $\boldsymbol{U}_N \triangleq [\boldsymbol{u}_{K+1}\ \cdots\ \boldsymbol{u}_L]$ (noise space) are associated with the $K$ largest eigenvalues and the $L-K$ smallest eigenvalues of $\widehat{\boldsymbol{R}}$, respectively. Then, we summarize the classical MUSIC algorithm in Algorithm 1.

1:procedure Input($x(0), \dots, x(N-1)$; $K$)
2:     compute the sample autocorrelation matrix $\widehat{\boldsymbol{R}}$ as in (7) and its eigenvectors
3:     compute the pseudospectrum: $P(f) \triangleq 1 / \|\boldsymbol{U}_N^{\mathsf{H}} \boldsymbol{a}(f)\|_2^2$, where $\boldsymbol{a}(f)$ is the length-$L$ atom defined in (4) with $\tau = 1$
4:     localize the $K$ largest local maxima of the pseudospectrum to get $\{\widehat{f}_k\}_{k=1}^{K}$
5:     return $\{\widehat{f}_k\}_{k=1}^{K}$
6:end procedure
Algorithm 1 MUSIC
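For illustration, here is a minimal Python sketch of Algorithm 1 for undamped sinusoids. It assumes the snapshot-averaged sample autocorrelation of (7); the grid resolution and the peak-picking rule are our own implementation choices, not part of the original algorithm.

```python
import numpy as np

def music(x, K, L):
    """Classical MUSIC (Algorithm 1) from uniform samples x(0), ..., x(N-1)."""
    N = len(x)
    P = N - L + 1
    # sample autocorrelation (7): average outer products of length-L snapshots
    snaps = np.column_stack([x[t:t + L] for t in range(P)])
    R = snaps @ snaps.conj().T / P
    # noise space: eigenvectors of the L - K smallest eigenvalues (eigh: ascending)
    _, vecs = np.linalg.eigh(R)
    Un = vecs[:, :L - K]
    # pseudospectrum P(f) = 1 / ||Un^H a(f)||^2 on a frequency grid
    grid = np.linspace(0, 1, 2048, endpoint=False)
    n = np.arange(L)
    pseudo = np.array([1.0 / np.linalg.norm(Un.conj().T
                       @ np.exp(1j * 2 * np.pi * f * n)) ** 2 for f in grid])
    # keep the K largest local maxima (wrap-around comparison since f is periodic)
    peaks = np.where((pseudo > np.roll(pseudo, 1)) &
                     (pseudo > np.roll(pseudo, -1)))[0]
    return np.sort(grid[peaks[np.argsort(pseudo[peaks])[-K:]]])

n = np.arange(64)
x = np.exp(1j * 2 * np.pi * 0.1 * n) + 0.5 * np.exp(1j * 2 * np.pi * 0.3 * n)
print(music(x, K=2, L=16))   # approximately [0.1, 0.3]
```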

The intuition behind the MUSIC algorithm comes from the fact that, as a consequence of the scalar-valued signal model in (1), the vector-valued signal in (6) can be expressed as

$$\boldsymbol{x}(t) = \boldsymbol{A}\,\boldsymbol{s}(t) + \boldsymbol{w}(t)$$

with

$$\boldsymbol{s}(t) \triangleq [\,c_1 e^{i 2\pi f_1 t}\ \ \cdots\ \ c_K e^{i 2\pi f_K t}\,]^{\mathsf{T}},$$

where $\boldsymbol{A} = [\boldsymbol{a}(f_1)\ \cdots\ \boldsymbol{a}(f_K)]$ and $\boldsymbol{a}(f)$ is defined in (4) with $\tau = 1$. Then, the autocorrelation matrix becomes

$$\boldsymbol{R} = \boldsymbol{A} \boldsymbol{P} \boldsymbol{A}^{\mathsf{H}} + \sigma^2 \boldsymbol{I}$$

if $\boldsymbol{w}(t)$ is uncorrelated with $\boldsymbol{s}(t)$. Here, $\boldsymbol{P}$ is the autocorrelation matrix of $\boldsymbol{s}(t)$ and $\boldsymbol{I}$ denotes the identity matrix. Note that the coefficients may be uncorrelated ($\boldsymbol{P}$ is diagonal) or may contain completely correlated pairs ($\boldsymbol{P}$ is singular). We are interested in the first case, namely, $\boldsymbol{P}$ is diagonal, and positive definite since $c_k \neq 0$ for $k = 1, \dots, K$. (As is stated in [10], in general, $\boldsymbol{P}$ will be "merely" positive definite to reflect the arbitrary degrees of pair-wise correlation occurring between the coefficients.) On the other hand, the rank of $\boldsymbol{A}\boldsymbol{P}\boldsymbol{A}^{\mathsf{H}}$ is $K$ when all the frequencies are distinct and $L > K$. It follows that the rank of $\boldsymbol{R} - \sigma^2\boldsymbol{I}$ is $K$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_L$ denote the non-increasing eigenvalues of $\boldsymbol{R}$. Then, we have

$$\det(\boldsymbol{R} - \lambda \boldsymbol{I}) = \det\big(\boldsymbol{A}\boldsymbol{P}\boldsymbol{A}^{\mathsf{H}} - (\lambda - \sigma^2)\boldsymbol{I}\big).$$

As a consequence, the determinant of $\boldsymbol{R} - \lambda_j \boldsymbol{I}$ is zero precisely when $\lambda_j - \sigma^2$ is an eigenvalue of $\boldsymbol{A}\boldsymbol{P}\boldsymbol{A}^{\mathsf{H}}$, which implies that

$$\lambda_j = \gamma_j + \sigma^2, \qquad j = 1, \dots, L,$$

where $\gamma_j$ is the $j$-th non-increasing eigenvalue of $\boldsymbol{A}\boldsymbol{P}\boldsymbol{A}^{\mathsf{H}}$ (so that $\gamma_j = 0$ and $\lambda_j = \sigma^2$ for $j > K$). Denoting $\boldsymbol{u}_j$ as the $j$-th eigenvector of $\boldsymbol{R}$ corresponding to eigenvalue $\lambda_j$, we have

$$\boldsymbol{R}\,\boldsymbol{u}_j = \lambda_j\,\boldsymbol{u}_j. \qquad (8)$$

Replacing $\boldsymbol{R} = \boldsymbol{A}\boldsymbol{P}\boldsymbol{A}^{\mathsf{H}} + \sigma^2\boldsymbol{I}$ into the above equation (8), we have

$$\boldsymbol{A}\boldsymbol{P}\boldsymbol{A}^{\mathsf{H}}\,\boldsymbol{u}_j = \boldsymbol{0}$$

when $\lambda_j = \sigma^2$, or equivalently, $j > K$. Since $\boldsymbol{P}$ is positive definite and $\boldsymbol{A}$ has full column rank, this gives $\boldsymbol{A}^{\mathsf{H}}\boldsymbol{u}_j = \boldsymbol{0}$. Then, $\boldsymbol{a}(f_k)$, which is defined in (4), is orthogonal to $\boldsymbol{u}_j$ (the columns of $\boldsymbol{U}_N$) when $j > K$. Therefore, we can identify the frequencies by localizing the peaks of the pseudospectrum $P(f)$.

III-A2 SMV MUSIC via Hankel matrix

As an alternative to the above autocorrelation matrix, a certain Hankel matrix can also be used in the MUSIC algorithm [14]. (Indeed, Hankel structure has been widely used in a variety of algorithms for spectral estimation in the literature [31, 32, 33, 34].) In particular, from the same full set of uniform observations $x(0), \dots, x(N-1)$, one can formulate the Hankel matrix

$$\boldsymbol{H} \triangleq \begin{bmatrix} x(0) & x(1) & \cdots & x(P-1) \\ x(1) & x(2) & \cdots & x(P) \\ \vdots & \vdots & & \vdots \\ x(L-1) & x(L) & \cdots & x(N-1) \end{bmatrix} \qquad (9)$$

for some positive integers $L$ and $P$ satisfying $L + P - 1 = N$. Then define the noise-space correlation function and imaging function as

$$R(f) \triangleq \frac{\|\boldsymbol{U}_N^{\mathsf{H}}\,\boldsymbol{a}(f)\|_2}{\|\boldsymbol{a}(f)\|_2} \quad \text{and} \quad J(f) \triangleq \frac{1}{R(f)} = \frac{\|\boldsymbol{a}(f)\|_2}{\|\boldsymbol{U}_N^{\mathsf{H}}\,\boldsymbol{a}(f)\|_2}$$

with $\boldsymbol{a}(f)$ as defined in (4). Here, $\boldsymbol{U}_N$ spans the noise subspace and contains the left singular vectors of $\boldsymbol{H}$ corresponding to the $L - K$ smallest singular values. The frequencies can then be estimated by identifying the local minima of the noise-space correlation function $R(f)$ or the local maxima of the imaging function $J(f)$.

Note that the sample autocorrelation matrix $\widehat{\boldsymbol{R}}$ in (7) and the Hankel matrix $\boldsymbol{H}$ in (9) are related by

$$\widehat{\boldsymbol{R}} = \frac{1}{P}\,\boldsymbol{H}\boldsymbol{H}^{\mathsf{H}}.$$

Thus, the eigenvectors of $\widehat{\boldsymbol{R}}$ are the same as the left singular vectors of $\boldsymbol{H}$ up to a unitary transform. Therefore, the MUSIC algorithms based on the autocorrelation matrix and the Hankel matrix are equivalent, since the imaging function $J(f)$ is equivalent to the pseudospectrum $P(f)$ in Algorithm 1.
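This relation is easy to verify numerically. The short sketch below (our variable names and sizes) checks that the snapshot-averaged autocorrelation equals $\frac{1}{P}\boldsymbol{H}\boldsymbol{H}^{\mathsf{H}}$, and that the eigenvalues match the squared singular values of $\boldsymbol{H}$ scaled by $1/P$.

```python
import numpy as np

rng = np.random.default_rng(1)
N, L = 32, 8
P = N - L + 1
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Hankel matrix (9): H[l, p] = x(l + p), of size L x P
H = np.array([[x[l + p] for p in range(P)] for l in range(L)])
# sample autocorrelation (7) built from the same length-L snapshots
R = sum(np.outer(x[t:t + L], x[t:t + L].conj()) for t in range(P)) / P

print(np.allclose(R, H @ H.conj().T / P))        # True: R_hat = (1/P) H H^H
# hence eigenvectors of R_hat match left singular vectors of H,
# and eigenvalues match squared singular values divided by P
s = np.linalg.svd(H, compute_uv=False)
vals = np.linalg.eigvalsh(R)
print(np.allclose(np.sort(vals), np.sort(s ** 2 / P)))
```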

III-A3 MMV MUSIC via data matrix

The MUSIC algorithm is also widely used in MMV problems [35, 36, 37]. Given a multiple measurement matrix $\boldsymbol{X}$ (see Section II-B), one can directly compute an SVD of $\boldsymbol{X}$ to obtain the noise space from the left singular vectors of $\boldsymbol{X}$ and then identify the frequency parameters by localizing the peaks of the imaging function. In particular, denote

$$\boldsymbol{X} = [\boldsymbol{U}_S\ \ \boldsymbol{U}_N]\,\boldsymbol{\Sigma}\,\boldsymbol{V}^{\mathsf{H}}$$

as an SVD of the data matrix $\boldsymbol{X}$. For the same reason, one can estimate the frequencies by finding the peaks of the imaging function

$$J(f) = \frac{\|\boldsymbol{a}(f)\|_2}{\|\boldsymbol{U}_N^{\mathsf{H}}\,\boldsymbol{a}(f)\|_2}.$$

III-B Damped MUSIC (DMUSIC)

In the general model of (1), the complex-valued sinusoids are damped and decay over time. For this more general case, the DMUSIC algorithm introduced in [29] aims to estimate both the frequencies and damping ratios directly using the rank-deficiency and Hankel properties of $\boldsymbol{H}$ in (9). Similar to classical MUSIC, DMUSIC involves constructing the noise subspace matrix $\boldsymbol{U}_N$ by computing an SVD of the Hankel matrix $\boldsymbol{H}$. Then, the pairs $(\tau_k, f_k)$ are identified by finding the peaks of the imaging function

$$J(\tau, f) \triangleq \frac{\|\boldsymbol{a}_L(\tau, f)\|_2}{\|\boldsymbol{U}_N^{\mathsf{H}}\,\boldsymbol{a}_L(\tau, f)\|_2} \qquad (10)$$

with $\boldsymbol{a}_L(\tau, f)$ the length-$L$ analogue of (4), defined in (11) below.

The intuition behind DMUSIC is that the Hankel matrix in (9) can be rewritten as

$$\boldsymbol{H} = \boldsymbol{A}_L\,\boldsymbol{D}\,\boldsymbol{A}_P^{\mathsf{T}} + \boldsymbol{H}_w,$$

where $\boldsymbol{H}_w$ is a Hankel matrix formulated with the noise samples $w(0), \dots, w(N-1)$, and $\boldsymbol{D}$ is a diagonal matrix with diagonal entries being the (scaled) coefficients; with the indexing in (9), the $k$-th diagonal entry of $\boldsymbol{D}$ is $c_k$. $\boldsymbol{A}_L$ and $\boldsymbol{A}_P$ are Vandermonde matrices defined as

$$\boldsymbol{A}_L \triangleq [\boldsymbol{a}_L(\tau_1, f_1)\ \cdots\ \boldsymbol{a}_L(\tau_K, f_K)] \quad \text{and} \quad \boldsymbol{A}_P \triangleq [\boldsymbol{a}_P(\tau_1, f_1)\ \cdots\ \boldsymbol{a}_P(\tau_K, f_K)]$$

with

$$\boldsymbol{a}_L(\tau, f) \triangleq [\,1,\ \tau e^{i 2\pi f},\ \dots,\ \tau^{L-1} e^{i 2\pi f (L-1)}\,]^{\mathsf{T}}. \qquad (11)$$

Note that we add a subscript "$L$" in (11) to distinguish $\boldsymbol{a}_L(\tau, f) \in \mathbb{C}^{L}$ from $\boldsymbol{a}(\tau, f) \in \mathbb{C}^{N}$ in (4); $\boldsymbol{a}_P(\tau, f) \in \mathbb{C}^{P}$ is defined analogously. When $L, P > K$ and all the pairs $(\tau_k, f_k)$ are distinct, $\boldsymbol{A}_L$ and $\boldsymbol{A}_P$ are full rank. Then, $\boldsymbol{H} - \boldsymbol{H}_w$ is of rank $K$. Now, consider the case when there is no noise, i.e., $\boldsymbol{H}_w = \boldsymbol{0}$. Denote an SVD of $\boldsymbol{H}$ as

$$\boldsymbol{H} = [\boldsymbol{U}_S\ \ \boldsymbol{U}_N]\,\boldsymbol{\Sigma}\,\boldsymbol{V}^{\mathsf{H}}.$$

One can show that the range spaces of $\boldsymbol{H}$, $\boldsymbol{A}_L$, and $\boldsymbol{U}_S$ are all equal when there is no noise. Then, $\boldsymbol{a}_L(\tau, f)$ is orthogonal to the columns of $\boldsymbol{U}_N$ when $(\tau, f) \in \{(\tau_k, f_k)\}_{k=1}^{K}$. If noise exists, the orthogonal relationship between $\boldsymbol{a}_L(\tau_k, f_k)$ and $\boldsymbol{U}_N$ no longer holds exactly. However, one can identify all the pairs $(\tau_k, f_k)$ by finding the peaks of the imaging function defined in (10), that is, searching for the atoms $\boldsymbol{a}_L(\tau, f)$ that are most nearly orthogonal to the noise space $\boldsymbol{U}_N$.
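The 2-D search in DMUSIC can be sketched as follows in Python; the grids over $\tau$ and $f$ and the peak-picking step are our own illustrative choices.

```python
import numpy as np

def dmusic(x, K, L, tau_grid, f_grid):
    """DMUSIC sketch: imaging function (10) evaluated on a (tau, f) grid."""
    N = len(x)
    P = N - L + 1
    H = np.array([[x[l + p] for p in range(P)] for l in range(L)])
    U, _, _ = np.linalg.svd(H)
    Un = U[:, K:]                          # noise space from the Hankel SVD
    n = np.arange(L)
    J = np.zeros((len(tau_grid), len(f_grid)))
    for i, tau in enumerate(tau_grid):
        for j, f in enumerate(f_grid):
            a = (tau ** n) * np.exp(1j * 2 * np.pi * f * n)   # a_L(tau, f), cf. (11)
            J[i, j] = np.linalg.norm(a) / np.linalg.norm(Un.conj().T @ a)
    return J   # the K largest local maxima estimate the (tau_k, f_k) pairs

n = np.arange(64)
x = ((0.98 ** n) * np.exp(1j * 2 * np.pi * 0.12 * n)
     + (0.95 ** n) * np.exp(1j * 2 * np.pi * 0.31 * n))
tau_grid = np.linspace(0.9, 1.0, 51)
f_grid = np.linspace(0, 0.5, 256)
J = dmusic(x, K=2, L=24, tau_grid=tau_grid, f_grid=f_grid)
ti, fi = np.unravel_index(np.argmax(J), J.shape)
print(tau_grid[ti], f_grid[fi])   # strongest peak, near one of the true pairs
```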

III-C MN-MUSIC for missing and noisy data

The classical MUSIC algorithm has also been adapted to the missing data case with Gaussian white noise (denoted as MN-MUSIC) for applications such as direction of arrival (DOA) estimation [16]. The authors consider the MMV setting as introduced in Section III-A3. More precisely, consider an observed matrix $\boldsymbol{X} = \boldsymbol{X}^\star + \boldsymbol{W}$, where each column $\boldsymbol{x}_m$ is defined in (5) and repeated as follows

$$\boldsymbol{x}_m = \sum_{k=1}^{K} \widetilde{c}_{k,m}\,\boldsymbol{a}(f_k) + \boldsymbol{w}_m,$$

with $\tau_k = 1$ since undamped signals are considered in [16].

Assume we partially observe the entries of $\boldsymbol{X}$ with i.i.d. Bernoulli randomly sampled locations $\Omega$. Let $\boldsymbol{X}_\Omega$ be the projection matrix of $\boldsymbol{X}$ on the index set $\Omega$, i.e.,

$$[\boldsymbol{X}_\Omega]_{n,m} \triangleq \begin{cases} [\boldsymbol{X}]_{n,m}, & (n,m) \in \Omega, \\ 0, & \text{otherwise}. \end{cases}$$

Then in MN-MUSIC, an SVD is directly performed on $\boldsymbol{X}_\Omega$ to get the signal space matrix $\boldsymbol{U}_S$, which contains the left singular vectors of $\boldsymbol{X}_\Omega$ corresponding to the $K$ largest singular values. Finally, the frequencies are estimated by finding the peaks of

$$J(f) = \frac{\|\boldsymbol{U}_S^{\mathsf{H}}\,\boldsymbol{a}(f)\|_2}{\|\boldsymbol{a}(f)\|_2},$$

which is essentially the same as in Sections III-A and III-B.
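A compact Python sketch of MN-MUSIC follows. The Bernoulli mask, grid resolution, and the signal-space form of the imaging function are our illustrative assumptions.

```python
import numpy as np

def mn_music(X_obs, K, f_grid):
    """MN-MUSIC sketch: SVD of the zero-filled observed matrix, then imaging."""
    N = X_obs.shape[0]
    Us = np.linalg.svd(X_obs)[0][:, :K]    # signal space: top-K left singular vectors
    n = np.arange(N)
    scores = []
    for f in f_grid:
        a = np.exp(1j * 2 * np.pi * f * n)            # undamped atom (tau = 1)
        scores.append(np.linalg.norm(Us.conj().T @ a) / np.linalg.norm(a))
    return np.array(scores)   # values near 1 indicate frequencies

rng = np.random.default_rng(2)
N, M = 64, 16
n = np.arange(N)[:, None]
X = np.exp(1j * 2 * np.pi * 0.2 * n) @ rng.standard_normal((1, M))  # rank-1 example
mask = rng.random((N, M)) < 0.5                                     # Bernoulli samples
scores = mn_music(np.where(mask, X, 0), K=1,
                  f_grid=np.linspace(0, 1, 512, endpoint=False))
print(np.argmax(scores) / 512)                                      # ~0.2
```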

IV Main Results

In this section, we outline a connection between MUSIC and low-rank matrix optimization using nuclear norm minimization (NNM), and based on this connection we propose an extension of MUSIC that is appropriate for the missing data case. Our interest in NNM here is specifically due to its connection with MUSIC. There are, of course, alternative low-rank optimization problems that can also be used for spectral analysis. Among these, atomic norm minimization (ANM) has been proposed and analyzed for solving the undamped line spectrum estimation problem in both the full and missing data cases [15, 27]. Moreover, a low-rank Hankel matrix recovery problem has recently been considered for damped spectral analysis [38]; that work involves solving the NNM problems (12) and (16) with an extra Hankel constraint on the optimization variable $\boldsymbol{Z}$. While these alternative frameworks have some benefits, we believe that our work sheds light on a more fundamental problem, given the considerable attention that MUSIC has received over the last several decades. This understanding may lead to new developments for MUSIC and other optimization algorithms for spectral analysis in the future.

IV-A Optimization connection to MUSIC in the full data case

In this section, we consider both the SMV and MMV settings. Given a set of uniform samples $\{x(n)\}_{n=0}^{N-1}$ from the signal model (1) in the SMV setting, or the data matrix $\boldsymbol{X}^\star$ or its noisy version $\boldsymbol{X}$ (3) in the MMV setting, our goal is to identify the frequencies $\{f_k\}$ and damping factors $\{\tau_k\}$. Note that in the SMV setting, we can construct a Hankel matrix $\boldsymbol{H}$ as in (9). As is shown in Section III-B, this Hankel matrix can be decomposed as $\boldsymbol{H} = \boldsymbol{A}_L \boldsymbol{D} \boldsymbol{A}_P^{\mathsf{T}}$ and is of rank $K$ when there is no noise. One can observe that both $\boldsymbol{H}$ and $\boldsymbol{X}^\star$ are low-rank matrices and have the same type of decomposition. Therefore, the analysis on $\boldsymbol{X}^\star$ can also be applied to $\boldsymbol{H}$, which implies that the algorithms we build using $\boldsymbol{X}^\star$ in the MMV scenario also work for the SMV scenario.

Assume that $\boldsymbol{X}^\star$ is given and $N > K$, and note that $\boldsymbol{X}^\star$ in (3) is low rank. Inspired by the low-rank property of $\boldsymbol{X}^\star$ and the dual analysis that is commonly used in atomic norm minimization (ANM) [25, 27], let us consider the following nuclear norm minimization (NNM)

$$\min_{\boldsymbol{Z}}\ \|\boldsymbol{Z}\|_* \quad \text{subject to} \quad \boldsymbol{Z} = \boldsymbol{X}^\star. \qquad (12)$$

Although this problem has a trivial solution (namely, $\widehat{\boldsymbol{Z}} = \boldsymbol{X}^\star$), it is interesting because we can compute the corresponding dual feasible point $\boldsymbol{\Lambda}$, which is a solution of the dual problem, via the Lagrange function of (12) and thus identify the frequencies and damping factors that are contained in the spectrally sparse signal in (2). In particular, the Lagrange function is given as

$$\mathcal{L}(\boldsymbol{Z}, \boldsymbol{\Lambda}) = \|\boldsymbol{Z}\|_* + \langle \boldsymbol{\Lambda},\ \boldsymbol{X}^\star - \boldsymbol{Z} \rangle_{\mathbb{R}}$$

with $\boldsymbol{\Lambda}$ being the dual variable. $\langle \cdot, \cdot \rangle_{\mathbb{R}}$ is defined as the real inner product, i.e.,

$$\langle \boldsymbol{A}, \boldsymbol{B} \rangle_{\mathbb{R}} \triangleq \operatorname{Re}\big(\operatorname{tr}(\boldsymbol{B}^{\mathsf{H}} \boldsymbol{A})\big)$$

with $\operatorname{tr}(\cdot)$ denoting the trace of a matrix. Then, the subgradient of $\mathcal{L}(\boldsymbol{Z}, \boldsymbol{\Lambda})$ with respect to $\boldsymbol{Z}$ is

$$\partial_{\boldsymbol{Z}} \mathcal{L}(\boldsymbol{Z}, \boldsymbol{\Lambda}) = \partial \|\boldsymbol{Z}\|_* - \boldsymbol{\Lambda},$$

where $\partial \|\boldsymbol{Z}\|_*$ is the subdifferential of the nuclear norm and given as

$$\partial \|\boldsymbol{Z}\|_* = \big\{ \boldsymbol{U}\boldsymbol{V}^{\mathsf{H}} + \boldsymbol{W} :\ \boldsymbol{U}^{\mathsf{H}} \boldsymbol{W} = \boldsymbol{0},\ \boldsymbol{W} \boldsymbol{V} = \boldsymbol{0},\ \|\boldsymbol{W}\|_2 \le 1 \big\}$$

since $\boldsymbol{Z} = \boldsymbol{X}^\star$. Note that $\boldsymbol{X}^\star = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{\mathsf{H}}$ is a truncated SVD of $\boldsymbol{X}^\star$ with $\boldsymbol{U} \in \mathbb{C}^{N \times K}$, $\boldsymbol{\Sigma} \in \mathbb{R}^{K \times K}$, and $\boldsymbol{V} \in \mathbb{C}^{M \times K}$. We can also construct a dual feasible point $\boldsymbol{\Lambda}$

by letting $\boldsymbol{\Lambda} \in \partial \|\boldsymbol{Z}\|_*$ according to the zero-gradient condition in the Karush-Kuhn-Tucker (KKT) conditions [39]. Finally, we have

$$\boldsymbol{\Lambda} = \boldsymbol{U} \boldsymbol{V}^{\mathsf{H}} \qquad (13)$$

by choosing $\boldsymbol{W} = \boldsymbol{0}$. Given a dual feasible point $\boldsymbol{\Lambda}$, we define the dual polynomial as

$$\boldsymbol{q}(\tau, f) \triangleq \boldsymbol{\Lambda}^{\mathsf{H}}\, \frac{\boldsymbol{a}(\tau, f)}{\|\boldsymbol{a}(\tau, f)\|_2}, \qquad (14)$$

which is inspired by the dual analysis in ANM. The following theorem guarantees that we can identify the true $\tau_k$'s and $f_k$'s by localizing the places where $\|\boldsymbol{q}(\tau, f)\|_2$ achieves 1. Moreover, it also indicates that one does not need a separation condition in this full data noiseless setting. (In some previous work on optimization-based spectral estimation [15], one needs the minimum separation $\Delta_f$, which is defined in Theorem IV.2, to be on the order of $1/N$ even for the full data noiseless setting.)

Theorem IV.1.

Let $\mathcal{D} \triangleq \{(\tau_k, f_k)\}_{k=1}^{K}$ denote the set of the true damping factor and frequency pairs. Given the full data matrix $\boldsymbol{X}^\star$ as in (3), compute its truncated SVD $\boldsymbol{X}^\star = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf{H}}$. With $\boldsymbol{\Lambda} = \boldsymbol{U}\boldsymbol{V}^{\mathsf{H}}$ as in (13), the dual polynomial $\boldsymbol{q}(\tau, f)$ defined in (14) satisfies

$$\|\boldsymbol{q}(\tau, f)\|_2 \begin{cases} = 1, & (\tau, f) \in \mathcal{D}, \\ < 1, & \text{otherwise}, \end{cases}$$

if $N > K$, all the pairs in $\mathcal{D}$ are distinct, and $\boldsymbol{\Psi}$ is of rank $K$.

The proof of Theorem IV.1 is given in Section V-A.

Based on the above analysis, we propose the following algorithm, named NN-MUSIC (nuclear norm minimization view of MUSIC), to estimate the damping factors $\{\tau_k\}$ and frequencies $\{f_k\}$ of the damped sinusoids from the data matrix $\boldsymbol{X}^\star$. Note that the step with the highest computational cost is the SVD step, and it needs to be performed only once.

1:procedure Input($\boldsymbol{X}^\star$; $K$)
2:     compute the truncated SVD of $\boldsymbol{X}^\star$: $\boldsymbol{X}^\star = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf{H}}$
3:     form the dual feasible point: $\boldsymbol{\Lambda} = \boldsymbol{U}\boldsymbol{V}^{\mathsf{H}}$
4:     form the dual polynomial: $\boldsymbol{q}(\tau, f) = \boldsymbol{\Lambda}^{\mathsf{H}}\,\boldsymbol{a}(\tau, f) / \|\boldsymbol{a}(\tau, f)\|_2$
5:     localize the places where $\|\boldsymbol{q}(\tau, f)\|_2 = 1$ to get $\{(\widehat{\tau}_k, \widehat{f}_k)\}_{k=1}^{K}$
6:     return $\{(\widehat{\tau}_k, \widehat{f}_k)\}_{k=1}^{K}$
7:end procedure
Algorithm 2 NN-MUSIC
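For concreteness, the following Python sketch implements Algorithm 2 on a grid. Since $\|\boldsymbol{q}(\tau, f)\|_2 \le 1$ everywhere, the equality is checked up to a numerical tolerance, and the grid must contain (or finely sample around) the true pairs; the tolerance and grids are our implementation choices.

```python
import numpy as np

def nn_music(X, K, tau_grid, f_grid, tol=1e-8):
    """NN-MUSIC sketch (Algorithm 2): Lambda = U V^H from a truncated SVD."""
    N = X.shape[0]
    U, _, Vh = np.linalg.svd(X, full_matrices=False)
    Lam = U[:, :K] @ Vh[:K, :]             # dual feasible point (13)
    n = np.arange(N)
    hits = []
    for tau in tau_grid:
        for f in f_grid:
            a = (tau ** n) * np.exp(1j * 2 * np.pi * f * n)
            q = Lam.conj().T @ (a / np.linalg.norm(a))   # dual polynomial (14)
            if np.linalg.norm(q) > 1 - tol:              # ||q||_2 achieves 1
                hits.append((tau, f))
    return hits

# demo: a rank-1 noiseless data matrix whose true pair lies on the grid
n = np.arange(64)
X = np.outer((0.98 ** n) * np.exp(1j * 2 * np.pi * 0.12 * n), np.ones(4))
print(nn_music(X, K=1, tau_grid=[0.9, 0.98], f_grid=[0.12, 0.3]))
# -> [(0.98, 0.12)]
```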

Note that Algorithm 2 is essentially equivalent to the MUSIC (in the undamped case) and DMUSIC (in the damped case) algorithms outlined in Section III. This is due to the fact that

$$\|\boldsymbol{q}(\tau, f)\|_2^2 = \frac{\|\boldsymbol{U}^{\mathsf{H}}\,\boldsymbol{a}(\tau, f)\|_2^2}{\|\boldsymbol{a}(\tau, f)\|_2^2} = 1 - \frac{\|\boldsymbol{U}_N^{\mathsf{H}}\,\boldsymbol{a}(\tau, f)\|_2^2}{\|\boldsymbol{a}(\tau, f)\|_2^2},$$

where $\boldsymbol{U}_N$ spans the orthogonal complement of the signal space $\boldsymbol{U}$. When there is no noise, the DMUSIC algorithm and its variants characterize the spectral parameters by locating the zeros of a noise-space correlation function or the peaks of the imaging function, and the proposed NN-MUSIC algorithm identifies the spectral parameters by localizing the pairs $(\tau, f)$ where $\|\boldsymbol{q}(\tau, f)\|_2$ achieves 1. While MUSIC has been classically understood from an algebraic perspective (owing to its closed form), we believe the derivation of NN-MUSIC offers a novel optimization-based perspective on MUSIC that could lead to future developments and understanding.

We also stress that this connection to MUSIC is unique to NNM and does not apply in general to ANM. In particular, the connection arises specifically because the dual feasible point of NNM has the closed form $\boldsymbol{\Lambda} = \boldsymbol{U}\boldsymbol{V}^{\mathsf{H}}$, so that the induced dual polynomial directly involves the signal subspace. On the other hand, the dual feasible point of ANM formulations does not admit the structure $\boldsymbol{\Lambda} = \boldsymbol{U}\boldsymbol{V}^{\mathsf{H}}$ in general.

Finally, consider the case when the given data matrix contains some additive white Gaussian noise, i.e.,

$$\boldsymbol{X} = \boldsymbol{X}^\star + \boldsymbol{W}$$

with $\boldsymbol{W}$ denoting the measurement noise. Then, we can solve the following nuclear norm denoising program

$$\widehat{\boldsymbol{Z}} = \arg\min_{\boldsymbol{Z}}\ \frac{1}{2} \|\boldsymbol{X} - \boldsymbol{Z}\|_F^2 + \lambda \|\boldsymbol{Z}\|_*, \qquad (15)$$

where $\lambda > 0$ is a regularization parameter. As is shown in the simulations, we can estimate the $(\tau_k, f_k)$ pairs by localizing the peaks of the $\ell_2$-norm of the corresponding dual polynomial. We leave the robust performance analysis of this framework for future work.
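Assuming the standard form $\min_{\boldsymbol{Z}} \frac{1}{2}\|\boldsymbol{X} - \boldsymbol{Z}\|_F^2 + \lambda\|\boldsymbol{Z}\|_*$ written above, the program (15) has a well-known closed-form solution: soft-threshold the singular values of $\boldsymbol{X}$ (the proximal operator of the nuclear norm). A minimal Python sketch:

```python
import numpy as np

def nuclear_norm_denoise(X, lam):
    """Solve min_Z 0.5*||X - Z||_F^2 + lam*||Z||_* by singular value shrinkage."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - lam, 0.0)    # soft-threshold the singular values
    return (U * s_shrunk) @ Vh

rng = np.random.default_rng(3)
n = np.arange(64)
X_clean = np.outer(np.exp(1j * 2 * np.pi * 0.2 * n), rng.standard_normal(8))
X_noisy = X_clean + 0.1 * (rng.standard_normal((64, 8))
                           + 1j * rng.standard_normal((64, 8)))
Z = nuclear_norm_denoise(X_noisy, lam=2.0)
print(np.linalg.matrix_rank(Z, tol=1e-8))  # small: shrinkage suppresses noise modes
```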

IV-B Extension to the missing data case

Unlike the classical formulation of MUSIC, the optimization-based derivation of NN-MUSIC allows it to be naturally extended to the missing data case. In particular, assume that we partially observe the entries of the full data matrix $\boldsymbol{X}^\star$ in (3) on uniformly random sampled locations $\Omega$. Let $\boldsymbol{X}^\star_\Omega$ be the projection matrix of $\boldsymbol{X}^\star$ on the index set $\Omega$, i.e.,

$$[\boldsymbol{X}^\star_\Omega]_{n,m} \triangleq \begin{cases} [\boldsymbol{X}^\star]_{n,m}, & (n,m) \in \Omega, \\ 0, & \text{otherwise}. \end{cases}$$

Notice that recovering the missing entries of the matrix reduces to a matrix completion problem [17], commonly formulated via the following NNM

$$\min_{\boldsymbol{Z}}\ \|\boldsymbol{Z}\|_* \quad \text{subject to} \quad \boldsymbol{Z}_\Omega = \boldsymbol{X}^\star_\Omega, \qquad (16)$$

which can be solved by the corresponding semi-definite program (SDP)

$$\min_{\boldsymbol{Z}, \boldsymbol{Q}_1, \boldsymbol{Q}_2}\ \frac{1}{2}\big(\operatorname{tr}(\boldsymbol{Q}_1) + \operatorname{tr}(\boldsymbol{Q}_2)\big) \quad \text{subject to} \quad \begin{bmatrix} \boldsymbol{Q}_1 & \boldsymbol{Z} \\ \boldsymbol{Z}^{\mathsf{H}} & \boldsymbol{Q}_2 \end{bmatrix} \succeq \boldsymbol{0}, \quad \boldsymbol{Z}_\Omega = \boldsymbol{X}^\star_\Omega. \qquad (17)$$

The dual problem of (17) is given by

$$\max_{\boldsymbol{\Lambda}}\ \langle \boldsymbol{\Lambda},\ \boldsymbol{X}^\star_\Omega \rangle_{\mathbb{R}} \quad \text{subject to} \quad \|\boldsymbol{\Lambda}\|_2 \le 1, \quad \boldsymbol{\Lambda}_{\Omega^c} = \boldsymbol{0},$$

where $\Omega^c$ denotes the complement of $\Omega$ and $\|\cdot\|_2$ is the spectral norm. Therefore, we can define the dual polynomial as

$$\boldsymbol{q}(\tau, f) \triangleq \widehat{\boldsymbol{\Lambda}}^{\mathsf{H}}\, \frac{\boldsymbol{a}(\tau, f)}{\|\boldsymbol{a}(\tau, f)\|_2},$$

where $\widehat{\boldsymbol{\Lambda}}$ is the dual solution. Similar to Theorem IV.1, the following theorem guarantees that we can identify the true $\tau_k$'s and $f_k$'s by localizing the places where $\|\boldsymbol{q}(\tau, f)\|_2$ achieves 1.

Theorem IV.2.

Suppose $\boldsymbol{X}^\star$ is a data matrix of the form (3) and all the pairs $(\tau_k, f_k)$ are distinct. Given the uniformly random partially observed data matrix $\boldsymbol{X}^\star_\Omega$, suppose the number of observed entries $m$ is at least $c_1 K \max\{M, N\}$ times a factor that depends on the incoherence parameter $\mu$ and on a function $F$, up to logarithmic factors in $M$, $N$, and $1/\delta$ (with numerical constants $c_1$ and $c_2$, the latter appearing inside the logarithmic factors). Here, $\mu \ge 1$ denotes an incoherence parameter that measures how concentrated the entries of the mode-shape matrix $\boldsymbol{\Psi}$ are (see the discussion following the theorem), and $F$ is a function of the damping ratios $\{\tau_k\}$, the minimum frequency separation $\Delta_f$, and $N$, with $c$ being a constant, which arises from bounding the minimal singular value of the Vandermonde matrix $\boldsymbol{A}$. Finally, $\Delta_f \triangleq \min_{i \neq j} |f_i - f_j|$ denotes the minimum separation between true frequencies, where $|\cdot|$ is the wrap-around distance on the unit circle. Then, $\boldsymbol{X}^\star$ is the unique solution of (16) with probability at least $1 - \delta$. Given the recovered full data matrix $\boldsymbol{X}^\star$, Theorem IV.1 guarantees the perfect recovery of all the pairs $(\tau_k, f_k)$.

The proof of Theorem IV.2 relies on some of the results in [40]. However, those results do not extend directly to the damped exponential case. Rather than use a certain incoherence property as in [40], we incorporate the damping ratios into the signal and develop theoretical guarantees that explicitly depend on the parameters, i.e., the damping ratios and the minimum frequency separation. In particular, we explicitly bound the minimal singular value of $\boldsymbol{A}$ with the function $F$ by exploiting the Vandermonde structure of $\boldsymbol{A}$ [41], instead of stating an incoherence property that depends on just this minimal singular value as in [40]. Note that $F$ is on the order of a constant when there is no damping (i.e., $\tau_k = 1$ for all $k$) and the frequencies are well separated. Please see Section V-B for details.

Note that the set $\Omega$ is chosen uniformly at random from all subsets of $\{0, \dots, N-1\} \times \{1, \dots, M\}$ with a given cardinality $m$. Since the columns of $\boldsymbol{\Psi}$ are assumed to be normalized, we have $1 \le \mu \le M$ according to the definition of $\mu$. In particular, $\mu = 1$ when all the entries of $\boldsymbol{\Psi}$ have magnitude $1/\sqrt{M}$, and $\mu = M$ when $\boldsymbol{\Psi}$ has a row containing all $1$'s with all other rows being $0$. Due to the normalized columns in $\boldsymbol{\Psi}$, we also have that $\sigma_{\min}(\boldsymbol{\Psi}) \le 1$. Therefore, it can be seen from the above theorem that when there is no damping (or only light damping, i.e., each $\tau_k$ is close to $1$), the frequencies are well separated, and $\sigma_{\min}(\boldsymbol{\Psi})$ is close to $1$, we can bound the $\mu$- and $F$-dependent factor by a constant, and thus the number of measurements needed for perfect recovery is comparable to best case bounds for rank-$K$ matrix completion. Specifically, state-of-the-art bounds [42] for low-rank matrix completion from uniform random samples involve a dependence on a certain coherence parameter (equal to the maximum leverage score of the matrix); when this coherence parameter is small, the sample complexity is on the order of $K \max\{M, N\}$ (up to logarithmic factors). The significance of Theorem IV.2 is that the sample complexity is not stated in terms of the matrix coherence; rather, the dependence on the damping ratios and minimum frequency separation is explicitly revealed. We also note that this is quite distinct from the work [40], in which the theoretical guarantees are built on a different incoherence property rather than on explicit parameters such as the frequencies.

Inspired by Algorithm 2 and the above analysis, we propose the following Missing Data MUSIC algorithm, named MD-MUSIC, to identify the damping factors $\{\tau_k\}$ and frequencies $\{f_k\}$ from the partially observed data matrix $\boldsymbol{X}^\star_\Omega$. Note that any off-the-shelf SDP solver could be used to solve the SDP in (17).

1:procedure Input($\boldsymbol{X}^\star_\Omega$)
2:     compute $\widehat{\boldsymbol{Z}}$ and $\widehat{\boldsymbol{\Lambda}}$ by solving the SDP (17)
3:     form the dual polynomial: $\boldsymbol{q}(\tau, f) = \widehat{\boldsymbol{\Lambda}}^{\mathsf{H}}\,\boldsymbol{a}(\tau, f) / \|\boldsymbol{a}(\tau, f)\|_2$
4:     localize the places where $\|\boldsymbol{q}(\tau, f)\|_2 = 1$ to get $\{(\widehat{\tau}_k, \widehat{f}_k)\}$
5:     return $\widehat{\boldsymbol{Z}}$ and $\{(\widehat{\tau}_k, \widehat{f}_k)\}$
6:end procedure
Algorithm 3 MD-MUSIC
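As one concrete (and non-authoritative) way to run Algorithm 3, the sketch below solves the completion problem with CVXPY, which canonicalizes the nuclear norm into an SDP of the form (17) internally, and reads the dual variable off the sampling constraint. This assumes a recent CVXPY with complex-variable support; the mask handling, tolerance, and the sign/scaling normalization of the returned dual (solver conventions vary) are our assumptions.

```python
import numpy as np
import cvxpy as cp

def md_music(X_obs, mask, tau_grid, f_grid, tol=1e-3):
    """MD-MUSIC sketch: NNM completion (16) plus dual-polynomial peak search."""
    N, M = X_obs.shape
    W = mask.astype(float)
    Z = cp.Variable((N, M), complex=True)
    con = [cp.multiply(W, Z) == W * X_obs]           # fix the observed entries
    cp.Problem(cp.Minimize(cp.normNuc(Z)), con).solve()
    Lam = con[0].dual_value * W                      # dual supported on Omega
    # normalize so the spectral norm is 1 (solver scaling conventions vary)
    Lam = Lam / max(np.linalg.norm(Lam, 2), 1e-12)
    n = np.arange(N)
    hits = []
    for tau in tau_grid:
        for f in f_grid:
            a = (tau ** n) * np.exp(1j * 2 * np.pi * f * n)
            q = Lam.conj().T @ (a / np.linalg.norm(a))   # dual polynomial
            if np.linalg.norm(q) > 1 - tol:
                hits.append((tau, f))
    return Z.value, hits
```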

Finally, we note that one could also consider an alternative approach wherein one first solves the NNM problem in (17) and then uses Algorithm 2 to identify the $\tau_k$'s and $f_k$'s from the recovered matrix $\widehat{\boldsymbol{Z}}$. Interestingly, however, as we demonstrate in Section VI, it is sometimes possible with MD-MUSIC to perfectly recover the $\tau_k$'s and $f_k$'s even when exact recovery of $\boldsymbol{X}^\star$ fails. This implies that MD-MUSIC is actually more powerful than the alternative approach mentioned above. We leave the analysis of this phenomenon (parameter recovery without exact data matrix recovery) for future work.

V Proofs

V-A Proof of Theorem IV.1

Denote a truncated SVD of $\boldsymbol{X}^\star$ as $\boldsymbol{X}^\star = \boldsymbol{U}\boldsymbol{\Sigma}\boldsymbol{V}^{\mathsf{H}}$. Since $\boldsymbol{\Lambda} = \boldsymbol{U}\boldsymbol{V}^{\mathsf{H}}$, we have