# Fixed-point algorithms for frequency estimation and structured low rank approximation

## Abstract

We develop fixed-point algorithms for the approximation of structured matrices with rank penalties. In particular we use these fixed-point algorithms for making approximations by sums of exponentials, or frequency estimation. For the basic formulation of the fixed-point algorithm we show that it converges to the minimum of the convex envelope of the original objective function along with its structured matrix constraint. It often happens that this solution agrees with the solution to the original minimization problem, and we provide a simple criterion for when this is true. We also provide more general fixed-point algorithms that can be used to treat the problems of making weighted approximations by sums of exponentials given equally or unequally spaced sampling. We apply the method to the case of missing data, although optimal convergence of the fixed-point algorithm is not guaranteed in this case. However, it turns out that the method often gives perfect reconstruction (up to machine precision) in such cases. We also discuss multidimensional extensions, and illustrate how the proposed algorithms can be used to recover sums of exponentials in several variables, when samples are available only along a curve.

Keywords: Fixed-point algorithms, Frequency estimation, General domain Hankel matrices, Fenchel conjugate, Convex envelope.

Mathematics Subject Classification: 15B05, 65K10, 41A29, 41A63.

## 1 Introduction

We consider the non-convex problem

$$\operatorname*{argmin}_{A\in\mathcal{H}}\; \mathcal{J}_{F,\sigma_0}(A) = \operatorname*{argmin}_{A\in\mathcal{H}}\; \sigma_0^2\operatorname{rank}(A) + \|A - F\|^2, \tag{1}$$

where $F$ is a given $m\times n$-matrix, $\mathcal{H}$ is any linear subspace of matrices, $\sigma_0 > 0$ is a parameter, and the norm refers to the Frobenius norm. The convex envelope of $\mathcal{J}_{F,\sigma_0}$ is then given by

$$R_{\sigma_0}(A) + \|A - F\|^2, \tag{2}$$

where

$$R_{\sigma_0}(A) = \sum_j \Big(\sigma_0^2 - \max\big(\sigma_0 - \sigma_j(A),\, 0\big)^2\Big),$$

see [19, 20] for details. We provide a fixed-point algorithm which is guaranteed to solve the problem

$$\operatorname*{argmin}_{A\in\mathcal{H}}\; R_\tau(A) + q\|A - F\|^2, \tag{3}$$

for any $\tau > 0$ and $q > 1$. It turns out that the solution to (3) often coincides with the solution to the non-convex problem (1) for $\sigma_0 = \tau/\sqrt{q}$, and we provide a simple condition to verify whether this has happened. To be more precise, below is a compilation of Theorem 5.1 and Theorem 5.2 in the particular case $q = 2$, for the non-linear map $\mathcal{B}$ defined by (23).

###### Theorem 1.1.

Given any starting point $W_0$, the Picard iteration $W_{n+1} = \mathcal{B}(W_n)$ converges to a fixed point $W_\circ$. Moreover, the orthogonal projection $P_{\mathcal{H}}(W_\circ)$ is unique and

$$A_\circ = 2F - P_{\mathcal{H}}(W_\circ)$$

is the unique solution to (3). Moreover, $W_\circ$ has the property that it also solves a corresponding problem in which the structure constraint is absent (see Theorem 5.2), and if $W_\circ$ has no singular values equal to $\tau$, then $A_\circ$ is the solution to the non-convex problem (1) with $\sigma_0 = \tau/\sqrt{2}$.

To provide more flexibility in choosing the objective functional, we also consider the problem of minimizing

$$R_\tau(A) + q\|\mathcal{M}(A) - h\|_V^2, \qquad A \in \mathcal{H}, \tag{4}$$

where $\mathcal{M}$ is a linear operator into any given linear space $V$. We give conditions under which the fixed points of a certain non-linear operator coincide with the global minimum of this problem.

We apply these algorithms to the problem of approximation by sparse sums of exponentials. Let $f$ be the function that we wish to approximate, and assume that it is sampled at the points $x_1, \dots, x_J$. The problem can then be phrased as

$$\operatorname*{argmin}_{\{c_k\},\{\zeta_k\},K}\; \sigma_0^2 K + \sum_{j=1}^{J} \mu_j \Big|\sum_{k=1}^{K} c_k e^{2\pi i x_j \zeta_k} - f(x_j)\Big|^2, \qquad c_k, \zeta_k \in \mathbb{C},\; \mu_j \in \mathbb{R}_+. \tag{5}$$

Here, $\sigma_0^2$ is a parameter that penalizes the number of exponential functions. The weights $\mu_j$ can be used to control the confidence levels of the samples $f(x_j)$.

This approximation problem is closely related to the problem of frequency estimation, i.e., the detection of the (complex) frequencies $\zeta_k$ above. Once the frequencies are known, the remaining approximation problem becomes linear and is easily solved by means of least squares. The literature on frequency estimation is vast; the first approach was made already in 1795. Notable methods in common use are ESPRIT, MUSIC, and matrix pencil methods. From an analysis point of view, the AAK-theory developed in [1, 2, 3] deals with closely related problems, and approximation algorithms based on these results were developed in [9, 14]. A competitive algorithm for the case of fixed $K$ was also developed recently.

To approach this problem, we let $\mathcal{H}$ denote the set of Hankel matrices, i.e., if $A \in \mathcal{H}$, then there is a generating function $f$ such that $A(j,k) = f(j+k)$. Hankel matrices are thus constant along the anti-diagonals. By the Kronecker theorem there is a connection between sums of exponential functions and Hankel matrices of low rank, namely: if $f$ is a linear combination of $K$ exponential functions sampled at equally spaced points, then the Hankel matrix generated by these samples will generically have rank $K$. For more details, see e.g. [4, 5, 18, 23].
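The Kronecker-type rank statement is easy to check numerically. The following is a minimal sketch (the frequencies, coefficients and sizes below are invented for illustration, not taken from the paper):

```python
import numpy as np

# samples of a sum of K = 2 complex exponentials on an equally spaced grid
N = 20
x = np.arange(2 * N + 1)
zeta = np.array([0.02 + 0.30j, -0.01 - 0.12j])   # hypothetical complex frequencies
c = np.array([1.0, 0.5])                          # hypothetical coefficients
f = sum(ck * np.exp(zk * x) for ck, zk in zip(c, zeta))

# Hankel matrix generated by the samples: A[j, k] = f[j + k]
A = np.array([[f[j + k] for k in range(N + 1)] for j in range(N + 1)])
rank = np.linalg.matrix_rank(A, tol=1e-8)
print(rank)   # generically equals K = 2
```

Adding noise to the samples destroys the exact low rank, which is precisely why the rank-penalized approximation problems above are of interest.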

In the special case of equally spaced sampling, i.e. $x_j = j$ for $1 \le j \le 2N+1$, and

$$\mu_j = \begin{cases} j, & \text{if } 1 \le j \le N+1,\\ 2N+2-j, & \text{if } N+1 < j \le 2N+1, \end{cases}$$

the problem (5) (with $F$ being the Hankel matrix generated from the samples of $f$) is equivalent to (1). The triangular weight above comes from counting the number of elements of the Hankel matrix along the anti-diagonals. Theorem 1.1 thus provides approximate solutions to this problem, which often turn out to solve the original problem (5) as well, which in this setting is the equivalent of (1).
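The anti-diagonal counting argument behind the triangular weight can be sketched as follows (a minimal illustration with an invented size; the point is that the weighted norm of the sample sequence equals the Frobenius norm of the Hankel matrix):

```python
import numpy as np

N = 3  # (N+1) x (N+1) Hankel matrix generated from 2N+1 samples

# mu_j = number of entries on the j-th anti-diagonal, j = 1, ..., 2N+1
mu = np.array([min(j, 2 * N + 2 - j) for j in range(1, 2 * N + 2)])
print(mu)  # triangular: 1, 2, ..., N+1, ..., 2, 1

# weighted norm of the generating sequence = Frobenius norm of the Hankel matrix
f = np.random.default_rng(0).standard_normal(2 * N + 1)
A = np.array([[f[j + k] for k in range(N + 1)] for j in range(N + 1)])
assert mu.sum() == (N + 1) ** 2
assert np.isclose(np.sum(mu * f**2), np.linalg.norm(A, 'fro') ** 2)
```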

Clearly, for many purposes it would be more natural to use the regular norm in the approximation, i.e., to use constant values of $\mu_j$. Also, more general setups of weights can be of interest (for instance in the case of missing data). We show how the fixed-point algorithm designed for (4) can be used in this case.

The basic algorithms for frequency estimation (or approximation by sums of exponentials) require that the data is provided (completely) at equally spaced samples. In many situations, this assumption does not hold (in particular concerning sampling in two or more variables). A traditional method designed for computing periodograms for unequally spaced sampling is the Lomb–Scargle method [21, 26], but the results are typically far from satisfactory. In this work, we will use interpolation-based ideas for treating the unequally spaced sampling. Given a set of sample points $X = \{x_j\}_{j=1}^J$, let $I_X$ be a matrix that interpolates between equally spaced points and the points $x_j$. This means for instance that the rows of $I_X$ have sum one. The action of its adjoint is sometimes referred to as anterpolation. We study the problem

$$\operatorname*{argmin}_{A(j,k) = a(j+k)}\; \sigma_0^2\operatorname{rank}(A) + \|f - I_X(a)\|^2, \tag{7}$$

using the algorithm for (4), and we also modify (7) to deal with the corresponding weighted case.

Finally, we discuss how the methods can be used in the multidimensional case. Let $\Omega \subset \mathbb{R}^d$ be an open bounded connected set and $f$ a function on $\Omega$ which is assumed to be of the form

$$f(x) = \sum_{k=1}^{K} c_k e^{\zeta_k \cdot x}, \tag{8}$$

where $c_k \in \mathbb{C}$, $\zeta_k \in \mathbb{C}^d$, and $K$ is a low number. In earlier work we introduce a new class of operators called general domain truncated correlation operators, whose “symbols” are functions on $\Omega$, and prove a Kronecker-type theorem for these operators. In particular it follows that their rank is $K$ if the symbol is of the form (8), and the converse situation is also completely clarified. It is also explained there how these operators can be discretized, giving rise to a class of structured matrices called “general domain Hankel matrices” (see Section 7 for more details). Letting $\mathcal{H}$ be such a class of structured matrices, it turns out that (3) is a natural setting for solving the problem: given noisy measurements of $f$ at (possibly unequally spaced) points in $\Omega$, find the function $f$. The connected problem of finding the values $c_k$ and $\zeta_k$ is addressed elsewhere.

The paper is organized as follows: In Section 2 we provide some basic ingredients for the coming sections; in particular we introduce the singular value functional calculus and connected Lipschitz estimates. Our solution of (3) involves an intermediate, more intricate objective functional, which is introduced in Section 3, where we also investigate the structure of its Fenchel conjugates. Moreover, our solution to (3) first solves a dual problem, which is introduced and studied in Section 4, where the key operator $\mathcal{B}$ is also introduced. At this point, we have the ingredients necessary for addressing the main problem (3). In Section 5.1 we recapitulate our main objective and discuss how various choices of $\tau$ and $q$ affect the objective functional and also its relation to the original problem (1). An extended version of Theorem 1.1 is finally given in Section 5.2, where we also present our algorithm for solving (3). The more general version of the algorithm and corresponding theory is given in Section 5.3. The remainder of the paper is devoted to frequency estimation and numerical examples. Section 6 considers the one-dimensional problem (7), whereas the problem of retrieving functions of the form (8) is considered in Section 7. This section also contains a more detailed description of the general domain Hankel matrices which underlie the method. Numerical examples are finally given in Section 8, and we end with a summary of our conclusions in Section 9.

## 2 Preliminaries

### 2.1 Fenchel conjugates and convex optimization

We recall that the Fenchel conjugate $f^*$ of a function $f$ is given by

$$f^*(y) = \max_x\; \langle x, y\rangle - f(x).$$

For positive functions $f$, the conjugate $f^*$ is always convex, and if $f$ is also convex, then $f^{**} = f$; see, e.g., Theorem 11.1 in Rockafellar and Wets. If $f$ is not convex, then $f^{**}$ equals the convex envelope of $f$. In particular, note that we always have $f^{**} \le f$.
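These facts are easy to verify numerically for a simple one-dimensional example. The sketch below (an illustration only, using a grid approximation of the maximum) checks the conjugate of $f(x) = x^2$, whose closed form is $f^*(y) = y^2/4$:

```python
import numpy as np

# grid approximation of f*(y) = max_x <x, y> - f(x)
x = np.linspace(-50, 50, 200001)

def conj(f_vals, y):
    # approximate the max over the grid
    return np.max(x * y - f_vals)

f_vals = x**2
for y in [-3.0, 0.0, 1.5]:
    # closed form of the conjugate of x^2 is y^2 / 4
    assert abs(conj(f_vals, y) - y**2 / 4) < 1e-4
```

The same grid approach can be used to eyeball $f^{**} \le f$ for non-convex $f$, though of course the paper works with the exact matrix-valued conjugates.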

Suppose now that $f$ is a convex function on some finite dimensional linear space $V$, and let $M \subset V$ be a linear subspace. Consider the problem of finding

$$\min f(x) \quad \text{subj. to}\quad x \in M. \tag{9}$$

If $y \in M^\perp$, then the value in (9) is clearly larger than or equal to the value

$$\min_x\; f(x) - \langle y, x\rangle = -f^*(y). \tag{10}$$

Suppose now that we can find a value $y_\circ$ (not necessarily unique) which minimizes

$$\min f^*(y) \quad \text{subj. to}\quad y \in M^\perp. \tag{11}$$

Since $f$ is convex, it is easy to see that the corresponding value in (10) (with $y = y_\circ$) equals that of (9).

### 2.2 The singular value functional calculus

Let $m, n$ be given and let $\mathbb{M}_{m,n}$ denote the Euclidean space of $m \times n$-matrices, equipped with the Frobenius norm. We will need the singular value functional calculus $S_f$, defined as follows. Let $f$ be a real valued function on $[0,\infty)$ such that $f(0) = 0$, and let $A$ be a matrix with singular value decomposition $A = U\Sigma V^*$. We then define

$$S_f(A) = U \operatorname{diag}\big(\{f(\sigma_j(A))\}_j\big) V^*.$$

The key property we will use is that

$$\|S_f(A) - S_f(B)\| \le \|f\|_{\mathrm{Lip}}\, \|A - B\|, \tag{12}$$

where $\|f\|_{\mathrm{Lip}}$ refers to the Lipschitz constant of $f$. In other words, Lipschitz functions give rise to a Lipschitz functional calculus, with the same bound.
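The singular value functional calculus, and the Lipschitz estimate (12), can be sketched as follows (a minimal illustration with invented matrices; `sv_calculus` is a hypothetical helper name):

```python
import numpy as np

def sv_calculus(f, A):
    # S_f(A) = U diag(f(sigma_j)) V*, for f with f(0) = 0
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(f(s)) @ Vh

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 4))
B = rng.standard_normal((5, 4))

f = lambda s: np.minimum(s, 1.0)   # Lipschitz constant 1 (soft truncation at 1)
lhs = np.linalg.norm(sv_calculus(f, A) - sv_calculus(f, B))
assert lhs <= np.linalg.norm(A - B) + 1e-12   # instance of the bound (12)
```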

## 3 Convex envelopes

For a given matrix $A$, let

$$T_{\sigma_0}(A) = \sum_j \max\big(\sigma_j^2(A) - \sigma_0^2,\, 0\big),$$

and

$$R_{\sigma_0}(A) = \sum_j \Big(\sigma_0^2 - \max\big(\sigma_0 - \sigma_j(A),\, 0\big)^2\Big).$$

For a proof of the following theorem, see [19, 20].
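Since both functionals depend only on the singular values, they are straightforward to evaluate. A minimal sketch (the test matrix is invented; for $A = \operatorname{diag}(2, 0.5)$ and $\sigma_0 = 1$ one gets $T_{\sigma_0}(A) = 3$ and $R_{\sigma_0}(A) = 1.75$ directly from the definitions):

```python
import numpy as np

def T(A, s0):
    # T_{s0}(A) = sum_j max(sigma_j^2 - s0^2, 0)
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(np.maximum(s**2 - s0**2, 0.0))

def R(A, s0):
    # R_{s0}(A) = sum_j ( s0^2 - max(s0 - sigma_j, 0)^2 )
    s = np.linalg.svd(A, compute_uv=False)
    return np.sum(s0**2 - np.maximum(s0 - s, 0.0) ** 2)

A = np.diag([2.0, 0.5])
assert np.isclose(T(A, 1.0), 3.0)     # (4 - 1) + 0
assert np.isclose(R(A, 1.0), 1.75)    # 1 + (1 - 0.25)
```

Note that zero singular values contribute nothing to $R_{\sigma_0}$, so the sum is well defined however many of them are included.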

###### Theorem 3.1.

Let $F$ be a fixed matrix, and let

$$\mathcal{J}_{F,\sigma_0}(A) = \sigma_0^2\operatorname{rank}(A) + \|A - F\|^2. \tag{13}$$

The Fenchel conjugates of $\mathcal{J}_{F,\sigma_0}$ are then

$$\mathcal{J}_{F,\sigma_0}^*(X) = T_{\sigma_0}\Big(\frac{X}{2} + F\Big) - \|F\|^2, \tag{14}$$

$$\mathcal{J}_{F,\sigma_0}^{**}(A) = R_{\sigma_0}(A) + \|A - F\|^2. \tag{15}$$

Let $\tau > 0$ and let $p, q$ be conjugate exponents, i.e. such that $\frac{1}{p} + \frac{1}{q} = 1$.

###### Theorem 3.2.

Let

$$K_{F,\tau,q}(A,B) = \tau^2\operatorname{rank}(A) + p\|A - B\|^2 + q\|B - F\|^2.$$

Its Fenchel conjugate is then given by

$$K_{F,\tau,q}^*(X,Y) = T_\tau\Big(\frac{X}{2} + \frac{Y}{2q} + F\Big) + (q-1)\Big\|\frac{Y}{2q} + F\Big\|^2 - q\|F\|^2, \tag{16}$$

and its convex envelope (Fenchel bi-conjugate) is given by

$$K_{F,\tau,q}^{**}(A,B) = R_\tau(A) + p\|A - B\|^2 + q\|B - F\|^2. \tag{17}$$
###### Proof.

We write simply $K$ when $F, \tau, q$ are clear from the context. Consider the Fenchel conjugate of $K$:

$$K^*(X,Y) = \max_{A,B}\; \langle A, X\rangle + \langle B, Y\rangle - K(A,B) = \max_A\; \langle A, X\rangle - \tau^2\operatorname{rank} A + \max_B\Big(\langle B, Y\rangle - p\|A - B\|^2 - q\|B - F\|^2\Big).$$

The right maximum is attained for

$$B = \frac{p}{p+q}A + \frac{q}{p+q}F + \frac{1}{2(p+q)}Y.$$

After substituting this in the Fenchel conjugate and some algebraic simplifications (note that $\frac{pq}{p+q} = 1$ for conjugate exponents), we obtain

$$\begin{aligned} K^*(X,Y) &= \max_A\; \langle A, X\rangle - \tau^2\operatorname{rank} A - \frac{pq}{p+q}\Big\|A - \Big(\frac{Y}{2q} + F\Big)\Big\|^2 + q\Big\|\frac{Y}{2q} + F\Big\|^2 - q\|F\|^2\\ &= \Big(\max_A\; \langle A, X\rangle - \tau^2\operatorname{rank} A - \Big\|A - \Big(\frac{Y}{2q} + F\Big)\Big\|^2\Big) + q\Big\|\frac{Y}{2q} + F\Big\|^2 - q\|F\|^2. \end{aligned}$$

From (14) we see that the expression inside the first parenthesis can be written as

$$T_\tau\Big(\frac{X}{2} + \frac{Y}{2q} + F\Big) - \Big\|\frac{Y}{2q} + F\Big\|^2,$$

from which (16) follows.

Next we study the bi-conjugate $K^{**}(A,B) = \max_{X,Y}\; \langle A, X\rangle + \langle B, Y\rangle - K^*(X,Y)$.

To simplify the computations we change coordinates

$$\begin{cases} C = \dfrac{X}{2} + \dfrac{Y}{2q} + F,\\[4pt] D = \dfrac{Y}{2q} + F, \end{cases} \tag{18}$$

for which

$$\begin{cases} X = 2(C - D),\\ Y = 2q(D - F). \end{cases}$$

The bi-conjugate can then be written as

$$\begin{aligned} K^{**}(A,B) &= \max_{C,D}\; 2\langle A, C - D\rangle + 2q\langle B, D - F\rangle - K^*\big(2(C - D),\, 2q(D - F)\big)\\ &= \max_{C,D}\; 2\langle A, C - D\rangle + 2q\langle B, D - F\rangle - T_\tau(C) - (q-1)\|D\|^2 + q\|F\|^2\\ &= \max_C\big(\langle A, 2C\rangle - T_\tau(C)\big) + \max_D\big(2\langle D, qB - A\rangle - (q-1)\|D\|^2\big) + q\|F\|^2 - 2q\langle B, F\rangle. \end{aligned}$$

The maximization over $D$ is straightforward: it gives the contribution $\frac{1}{q-1}\|qB - A\|^2$, attained for $D = \frac{qB - A}{q-1}$. Hence, by (18), $\frac{Y}{2q} + F = \frac{qB - A}{q-1}$, and

$$K^{**}(A,B) = \max_X\Big(\langle A, X\rangle - \mathcal{J}^*_{\frac{qB-A}{q-1},\,\tau}(X)\Big) + \frac{q-2}{(q-1)^2}\|qB - A\|^2 + \frac{2}{q-1}\langle A, qB - A\rangle + q\|F\|^2 - 2q\langle B, F\rangle.$$

We recognize the maximization part as a Fenchel bi-conjugate, so by (15),

$$K^{**}(A,B) = R_\tau(A) + \Big\|A - \frac{qB - A}{q-1}\Big\|^2 + \frac{q-2}{(q-1)^2}\|qB - A\|^2 + \frac{2}{q-1}\langle A, qB - A\rangle + q\|F\|^2 - 2q\langle B, F\rangle,$$

which after a straightforward algebraic simplification (using $p = \frac{q}{q-1}$) equals (17). ∎

## 4 The dual optimization problem

Let $\mathcal{H} \subset \mathbb{M}_{m,n}$ be any fixed linear subspace. We consider the problem

$$\operatorname*{argmin}\; K^{**}(A,B) = R_\tau(A) + p\|A - B\|^2 + q\|B - F\|^2 \quad \text{subj. to}\quad A = B,\; B \in \mathcal{H}, \tag{19}$$

where $p, q$ are conjugate exponents. Let $M$ be the linear subspace of $\mathbb{M}_{m,n} \times \mathbb{M}_{m,n}$ defined by the equations $A = B$ and $B \in \mathcal{H}$. It is easy to see that $M^\perp$ is defined by

$$(X,Y) \in M^\perp \iff X + Y \in \mathcal{H}^\perp.$$

Indeed, the implication “$\Leftarrow$” is obvious, and the reverse inclusion can be established by comparing dimensions. Motivated by the results in Section 2.1, we now focus on the “dual problem” to (19), i.e.

$$\operatorname*{argmin}\; K^*(X,Y) \quad \text{subj. to}\quad X + Y \in \mathcal{H}^\perp. \tag{20}$$

Set

$$s_{\tau,q}(\sigma) = \max\Big(\min(\sigma, \tau),\, \frac{\sigma}{q}\Big). \tag{21}$$

When there is no risk of confusion we will omit the subindices. Recall the singular value functional calculus introduced in Section 2.2.
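The scalar function (21) acts as the identity below $\tau$, is capped at the level $\tau$, and switches to the line $\sigma/q$ once that exceeds $\tau$. A minimal sketch (parameter values invented):

```python
import numpy as np

def s(sigma, tau, q):
    # s_{tau,q}(sigma) = max(min(sigma, tau), sigma / q), cf. (21)
    return np.maximum(np.minimum(sigma, tau), sigma / q)

tau, q = 1.0, 2.0
assert s(0.5, tau, q) == 0.5   # below tau: identity
assert s(1.5, tau, q) == 1.0   # between tau and q*tau: clipped at tau
assert s(4.0, tau, q) == 2.0   # above q*tau: sigma / q
```

Each linear piece has slope at most one, so $s$ has Lipschitz constant one — the property used repeatedly below via (12).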

###### Lemma 4.1.

For any matrix $A$, the problem $\operatorname*{argmin}_W\; T_\tau(W) + \frac{1}{q-1}\|W - A\|^2$ is solved by $W = S_{s_{\tau,q}}(A)$.

###### Proof.

Since $T_\tau(W)$ depends only on the singular values of $W$, von Neumann's inequality shows that the solution to the above problem will have the same singular vectors as $A$ (see the proof of Theorem 3.1 in [19, 20] for more details). Hence it will be of the form $S_s(A)$, where $s$ is defined by

$$s(\sigma_j(A)) = \operatorname*{argmin}_\omega\; \Big(\max(\omega^2 - \tau^2,\, 0) + \frac{1}{q-1}\big(\omega - \sigma_j(A)\big)^2\Big), \tag{22}$$

and by equating the subgradient of the above expression with respect to $\omega$ with zero, it is easily verified that this is solved by the function (21). ∎

Let $P_{\mathcal{H}^\perp}$ denote the operator of orthogonal projection onto $\mathcal{H}^\perp$. We now introduce the (non-linear) operator $\mathcal{B}$ which will play a key role in our solution of the original problem (3).

###### Definition 4.2.

Set

$$\mathcal{B}_{F,\tau,q}(W) = S_{s_{\tau,q}}\big(qF + P_{\mathcal{H}^\perp}(W)\big), \tag{23}$$

which we abbreviate to $\mathcal{B}$ when $F, \tau, q$ are clear from the context.
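For the concrete case where $\mathcal{H}$ is the Hankel matrices, the operator (23) can be sketched as follows (an illustration only, not the authors' code; `proj_hankel` is a hypothetical helper implementing $P_{\mathcal{H}}$ by anti-diagonal averaging, so that $P_{\mathcal{H}^\perp} = I - P_{\mathcal{H}}$):

```python
import numpy as np

def proj_hankel(W):
    # orthogonal projection onto the Hankel matrices: replace each
    # anti-diagonal of W by its mean value
    m, n = W.shape
    P = np.zeros_like(W)
    for d in range(m + n - 1):
        idx = [(j, d - j) for j in range(m) if 0 <= d - j < n]
        mean = np.mean([W[j, k] for j, k in idx])
        for j, k in idx:
            P[j, k] = mean
    return P

def B(W, F, tau, q):
    # B_{F,tau,q}(W) = S_s(q F + P_{H-perp}(W)), with s from (21)
    s = lambda sig: np.maximum(np.minimum(sig, tau), sig / q)
    M = q * F + (W - proj_hankel(W))   # P_{H-perp} = I - P_H
    U, sig, Vh = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(s(sig)) @ Vh
```

The projection leaves Hankel matrices unchanged and is idempotent, which is easy to check on small examples.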

###### Theorem 4.3.

There exists a fixed point $W_\circ$ of $\mathcal{B}$ such that the problem

$$\operatorname*{minimize}_{X,Y}\; K^*(X,Y) \quad \text{subject to}\quad X + Y \in \mathcal{H}^\perp \tag{24}$$

is solved by

$$X_\circ = 2p\big(P_{\mathcal{H}}(W_\circ) - F\big) + 2P_{\mathcal{H}^\perp}(W_\circ), \qquad Y_\circ = -2p\big(P_{\mathcal{H}}(W_\circ) - F\big). \tag{25}$$
###### Proof.

Since $X + Y \in \mathcal{H}^\perp$, we can write

$$X = 2qH + H_1^\perp, \qquad Y = -2qH + H_2^\perp,$$

with $H \in \mathcal{H}$ and $H_1^\perp, H_2^\perp \in \mathcal{H}^\perp$. By (16) the problem (24) becomes

$$\operatorname*{argmin}_{H, H_1^\perp, H_2^\perp}\; K^*\big(2qH + H_1^\perp,\, -2qH + H_2^\perp\big) = \operatorname*{argmin}_{H, H_1^\perp, H_2^\perp}\; T_\tau\Big(\frac{2qH + H_1^\perp}{2} + \frac{-2qH + H_2^\perp}{2q} + F\Big) + (q-1)\Big\|\frac{-2qH + H_2^\perp}{2q} + F\Big\|^2,$$

where the constant term $-q\|F\|^2$ has been dropped since it does not affect the minimizer.

Note that

$$\frac{2qH + H_1^\perp}{2} + \frac{-2qH + H_2^\perp}{2q} = (q-1)H + \frac{H_1^\perp}{2} + \frac{H_2^\perp}{2q} = (q-1)H + H^\perp,$$

where $H^\perp = \frac{H_1^\perp}{2} + \frac{H_2^\perp}{2q}$. The problem can thus be rewritten as

$$\operatorname*{argmin}_{H, H_2^\perp, H^\perp}\; T_\tau\big(F + (q-1)H + H^\perp\big) + (q-1)\Big\|F - H + \frac{H_2^\perp}{2q}\Big\|^2.$$

Since $F - H \in \mathcal{H}$ and $H_2^\perp \in \mathcal{H}^\perp$, it follows that

$$\Big\|F - H + \frac{H_2^\perp}{2q}\Big\|^2 = \|F - H\|^2 + \Big\|\frac{H_2^\perp}{2q}\Big\|^2,$$

and hence we conclude that the optimal $H_2^\perp$ is the zero matrix. Thus

$$H^\perp = \frac{H_1^\perp}{2},$$

and the objective functional can be replaced by

$$\operatorname*{argmin}_{H, H^\perp}\; T_\tau\big(F + (q-1)H + H^\perp\big) + (q-1)\|H - F\|^2.$$

Using the substitution $W = F + (q-1)H + H^\perp$, it then becomes

$$T_\tau(W) + (q-1)\Big\|\frac{W - (qF + H^\perp)}{q-1}\Big\|^2 = T_\tau(W) + \frac{1}{q-1}\big\|W - \big(qF + P_{\mathcal{H}^\perp}(W)\big)\big\|^2.$$

The minimization problem will thus be solved by

$$W_\circ = \operatorname*{argmin}_W\; S\big(W, P_{\mathcal{H}^\perp}W\big), \tag{26}$$

where $S$ is the functional defined by

$$S(W, V) = T_\tau(W) + \frac{1}{q-1}\|W - (qF + V)\|^2.$$

Note that $W \mapsto S(W, P_{\mathcal{H}^\perp}W)$ is convex, since it is obtained from the convex function $K^*$ by affine changes of coordinates. It is also easy to see that

$$\lim_{R\to\infty}\; \inf_{\|W\| + \|V\| > R} S(W, V) = \infty,$$

so it has at least one global minimum. Let $W_\circ$ be one such minimizer and set $V_\circ = P_{\mathcal{H}^\perp}(W_\circ)$. Since $S(W, P_{\mathcal{H}^\perp}W) \le S(W, V)$ for every $V \in \mathcal{H}^\perp$, it is easy to see that $W_\circ$ also minimizes $W \mapsto S(W, V_\circ)$. But then

$$W_\circ = \operatorname*{argmin}_W\; S(W, V_\circ) = \operatorname*{argmin}_W\; S\big(W, P_{\mathcal{H}^\perp}W_\circ\big),$$

and hence

$$W_\circ = S_s\big(qF + P_{\mathcal{H}^\perp}(W_\circ)\big) = \mathcal{B}(W_\circ).$$

It is also evident that $W_\circ$ is a solution to (26). The formulas in (25) now follow by tracing the changes of variables backwards. ∎

## 5 A fixed point algorithm

### 5.1 Discussion of the objective functional

To recapitulate our main objective, we wish to minimize

$$\sigma_0^2\operatorname{rank}(A) + \|A - F\|^2, \qquad A \in \mathcal{H}, \tag{27}$$

which we replace by its convex envelope

$$R_{\sigma_0}(A) + \|A - F\|^2, \qquad A \in \mathcal{H}, \tag{28}$$

to achieve convexity. We will in this section show how to minimize

$$\operatorname*{argmin}_{A\in\mathcal{H}}\; R_\tau(A) + q\|A - F\|^2, \tag{29}$$

for any $\tau > 0$ and $q > 1$. Note that the corresponding modification to the original objective functional (27), i.e.

$$\operatorname*{argmin}_{A\in\mathcal{H}}\; \tau^2\operatorname{rank}(A) + q\|A - F\|^2, \tag{30}$$

does not change the problem, since we may equivalently solve (27) with $\sigma_0 = \tau/\sqrt{q}$ (to see this, multiply (30) by $1/q$). However, this is not the case with (29), for it is easy to show that

$$\frac{1}{q}R_\tau(A) = R_{\tau/\sqrt{q}}\big(A/\sqrt{q}\big).$$

Thus (29) is equivalent to

$$\operatorname*{argmin}_{A\in\mathcal{H}}\; R_{\sigma_0}\big(A/\sqrt{q}\big) + \|A - F\|^2, \tag{31}$$

where $\sigma_0 = \tau/\sqrt{q}$; compare with (27) and (28). Note as well that

$$R_\tau(A) + q\|A - F\|^2 = R_\tau(A) + \|A - F\|^2 + (q-1)\|A - F\|^2,$$

by which we conclude that the objective functional in (29) is strictly convex with a unique minimizer $A_\circ$. Moreover, the above expression is clearly less than or equal to

$$\tau^2\operatorname{rank}(A) + \|A - F\|^2 + (q-1)\|A - F\|^2 = \tau^2\operatorname{rank}(A) + q\|A - F\|^2,$$

so if it happens that

$$R_\tau(A_\circ) + q\|A_\circ - F\|^2 = \tau^2\operatorname{rank}(A_\circ) + q\|A_\circ - F\|^2, \tag{32}$$

then $A_\circ$ is a solution to the original problem (30). Moreover, by the definition of $R_\tau$ it follows that (32) is satisfied if and only if $A_\circ$ has no singular values in the interval $(0, \tau)$.
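The scaling identity used above is easy to sanity-check numerically. A hypothetical sketch (test matrix and parameter values invented):

```python
import numpy as np

def R(A, s0):
    # R_{s0}(A) = sum_j ( s0^2 - max(s0 - sigma_j(A), 0)^2 )
    sig = np.linalg.svd(A, compute_uv=False)
    return np.sum(s0**2 - np.maximum(s0 - sig, 0.0) ** 2)

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
tau, q = 0.7, 3.0

# (1/q) R_tau(A) = R_{tau/sqrt(q)}(A/sqrt(q)), relating (29) and (31)
assert np.isclose(R(A, tau) / q, R(A / np.sqrt(q), tau / np.sqrt(q)))
```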

### 5.2 The basic fixed-point algorithm

We now solve (29), using fixed points of the operator $\mathcal{B}$ from the previous section. It is easy to see that such fixed points need not be unique. Despite that, we have the following.

###### Theorem 5.1.

The Picard iteration $W_{n+1} = \mathcal{B}(W_n)$ converges to a fixed point $W_\circ$. Moreover, $P_{\mathcal{H}}(W_\circ)$ is unique and

$$A_\circ = \frac{1}{q-1}\big(qF - P_{\mathcal{H}}(W_\circ)\big)$$

is the unique solution to (29).

Before presenting the proof, let us give a few more comments on the equivalent problem formulation (31). Note that

$$\lim_{q\to 0^+} R_\tau\big(A/\sqrt{q}\big) = \tau^2\operatorname{rank}(A),$$

but that our method requires $q > 1$ (in fact, the objective functional is no longer convex beyond this restriction). However, choosing $q$ close to 1 will yield slow convergence of the Picard iteration. On the other hand, the effect of choosing a large $q$ (for a fixed $\tau$) is that the term $R_\tau(A/\sqrt{q})$ deviates further from $\tau^2\operatorname{rank}(A)$. We have found that moderate values of $q$ work well in practice.

###### Proof.

We first prove that the Picard iteration converges. Note that the sequence $P_{\mathcal{H}^\perp}(W_n)$, $n \in \mathbb{N}$, is generated by the Picard iteration of the operator $P_{\mathcal{H}^\perp}\mathcal{B}$ with the starting point $P_{\mathcal{H}^\perp}(W_0)$. Moreover, $W_{n+1} = \mathcal{B}(W_n)$ depends on $W_n$ only via $P_{\mathcal{H}^\perp}(W_n)$. Since $\mathcal{B}$ is continuous (in fact, it is non-expansive by (12)), it suffices to show that $P_{\mathcal{H}^\perp}(W_n)$ is convergent.

It is well known [15, 22] that for firmly non-expansive operators (in a finite dimensional space) the Picard iteration converges to a fixed point, as long as the set of fixed points of the operator is non-empty. By Theorem 4.3 there exists a fixed point $W_\circ$ of $\mathcal{B}$, and it clearly follows that $P_{\mathcal{H}^\perp}(W_\circ)$ is a fixed point of $P_{\mathcal{H}^\perp}\mathcal{B}$. It remains to show that $P_{\mathcal{H}^\perp}\mathcal{B}$ is firmly non-expansive as an operator on $\mathcal{H}^\perp$. An operator $T$ is firmly non-expansive if and only if $2T - I$ is non-expansive, i.e.,

$$\big\|\big(2P_{\mathcal{H}^\perp}\mathcal{B}(V) - V\big) - \big(2P_{\mathcal{H}^\perp}\mathcal{B}(W) - W\big)\big\| \le \|V - W\|, \qquad \forall\, V, W \in \mathcal{H}^\perp, \tag{33}$$

which we will verify shortly. To this end, note that on $\mathcal{H}^\perp$ we have

$$2P_{\mathcal{H}^\perp}\mathcal{B}(V) = 2P_{\mathcal{H}^\perp}S_s(qF + V), \tag{34}$$

and set

$$s^\dagger(\sigma) = 2s(\sigma) - \sigma = \max\Big(\min(2\sigma,\, 2\tau),\, \frac{2\sigma}{q}\Big) - \sigma = \max\Big(\min(\sigma,\, 2\tau - \sigma),\, \Big(\frac{2}{q} - 1\Big)\sigma\Big).$$

This function is clearly Lipschitz continuous with Lipschitz constant equal to one. Setting $\tilde{V} = qF + V$ and $\tilde{W} = qF + W$, (12) and (34) imply that

$$\begin{aligned} &\big\|\big(2P_{\mathcal{H}^\perp}\mathcal{B}(V) - V\big) - \big(2P_{\mathcal{H}^\perp}\mathcal{B}(W) - W\big)\big\|^2 = \big\|P_{\mathcal{H}^\perp}\big((2S_s(\tilde{V}) - V) - (2S_s(\tilde{W}) - W)\big)\big\|^2\\ &= \big\|P_{\mathcal{H}^\perp}\big((2S_s(\tilde{V}) - \tilde{V}) - (2S_s(\tilde{W}) - \tilde{W})\big)\big\|^2 = \big\|P_{\mathcal{H}^\perp}\big(S_{s^\dagger}(\tilde{V}) - S_{s^\dagger}(\tilde{W})\big)\big\|^2\\ &\le \big\|S_{s^\dagger}(\tilde{V}) - S_{s^\dagger}(\tilde{W})\big\|^2 \le \big\|\tilde{V} - \tilde{W}\big\|^2 = \|V - W\|^2, \end{aligned}$$

which establishes (33). Hence the original Picard iteration converges to a fixed point $W_\circ$ of $\mathcal{B}$. For the uniqueness of $P_{\mathcal{H}}(W_\circ)$, suppose that $\tilde{W}_\circ$ is another fixed point of $\mathcal{B}$, and write $W_\circ = H + T$, $\tilde{W}_\circ = \tilde{H} + \tilde{T}$, where $H, \tilde{H} \in \mathcal{H}$ and $T, \tilde{T} \in \mathcal{H}^\perp$. Then

$$\|H - \tilde{H}\|^2 + \|T - \tilde{T}\|^2 = \|W_\circ - \tilde{W}_\circ\|^2 = \|\mathcal{B}(W_\circ) - \mathcal{B}(\tilde{W}_\circ)\|^2 = \|S_s(qF + T) - S_s(qF + \tilde{T})\|^2 \le \|(qF + T) - (qF + \tilde{T})\|^2 = \|T - \tilde{T}\|^2,$$

where the last inequality is due to (12) and the fact that $s$ has Lipschitz constant equal to one. But then

$$\|H - \tilde{H}\|^2 \le \|T - \tilde{T}\|^2 - \|T - \tilde{T}\|^2 = 0,$$

implying that $H = \tilde{H}$. Finally, if $(A_\circ, B_\circ)$ denotes the solution to (19), then it is clear that $A_\circ$ solves (29). By the theory in Section 2.1 and Theorem 4.3, it follows that $(A_\circ, B_\circ)$ also is a solution to

$$\operatorname*{argmin}_{A,B}\; R_\tau(A) + p\|A - B\|^2 + q\|B - F\|^2 - \big\langle A,\; 2p\big(P_{\mathcal{H}}(W_\circ) - F\big) + 2P_{\mathcal{H}^\perp}(W_\circ)\big\rangle - \big\langle B,\; -2p\big(P_{\mathcal{H}}(W_\circ) - F\big)\big\rangle, \tag{35}$$

where $W_\circ$ is as above. By fixing $A = A_\circ$ we find that

$$B_\circ = \operatorname*{argmin}_B\; p\|A_\circ - B\|^2 + q\|B - F\|^2 - \big\langle B,\; -2p\big(P_{\mathcal{H}}(W_\circ) - F\big)\big\rangle = F + \frac{1}{q}\big(A_\circ - P_{\mathcal{H}}(W_\circ)\big). \tag{36}$$

But inserting the constraint $A_\circ = B_\circ$ (from (19)) in the above equation yields

$$(q - 1)A_\circ = qF - P_{\mathcal{H}}(W_\circ), \tag{37}$$

as desired. The uniqueness of $A_\circ$ was argued before the proof, so the proof is complete. ∎

The matrix $W_\circ$ has several peculiar properties, as shown by the next theorem (compare with (29) and note the absence of the condition $A \in \mathcal{H}$ below).

Figure 2: Illustration of Theorem 5.2. The blue dots show $\sigma_j(A_\circ)$, the black circles $\sigma_j(W_\circ)$, and the black line the level of $\tau$. For $j$ such that $\sigma_j(W_\circ) > \tau$, it holds that $\sigma_j(W_\circ) = \sigma_j(A_\circ)$. In this case $\sigma_j(A_\circ) \approx 0$ for the larger values of $j$, implying that the obtained result is in fact a solution to the non-convex problem (30) as well as a solution to the convex problem (31).
###### Theorem 5.2.

Let $W_\circ$ and $A_\circ$ be as in Theorem 5.1. Then $W_\circ$ solves a corresponding problem in which the condition $A \in \mathcal{H}$ is absent (cf. (29)).

Moreover, the singular values of $A_\circ$ and $W_\circ$ are related by

$$\begin{cases} \sigma_j(A_\circ) = \sigma_j(W_\circ), & \text{if } \sigma_j(W_\circ) > \tau,\\ 0 \le \sigma_j(A_\circ) \le \sigma_j(W_\circ), & \text{if } \sigma_j(W_\circ) = \tau,\\ \sigma_j(A_\circ) = 0, & \text{if } \sigma_j(W_\circ) < \tau. \end{cases} \tag{38}$$

In particular, if $W_\circ$ has no singular values equal to $\tau$, then $A_\circ$ is the solution to the non-convex problem (30).

###### Proof.

By (36), $B_\circ$ minimizes (35) restricted to the subspace where $A = A_\circ$ is held fixed. By inserting this expression for $B_\circ$ in (35) and by a completion of squares, we get

 A∘=

Since the expression only depends on the singular values of $A$, it follows by von Neumann's inequality that it is sufficient to consider the minimization of

 σj(A)=argmin  s−(max(τ−s,0))