Fixed-point algorithms for frequency estimation and structured low rank approximation
We develop fixed-point algorithms for the approximation of structured matrices with rank penalties. In particular, we use these fixed-point algorithms to make approximations by sums of exponentials, i.e., for frequency estimation. For the basic formulation of the fixed-point algorithm we show that it converges to the minimum of the convex envelope of the original objective function along with its structured matrix constraint. This solution often agrees with the solution to the original minimization problem, and we provide a simple criterion for when this is true. We also provide more general fixed-point algorithms that can be used to treat the problems of making weighted approximations by sums of exponentials given equally or unequally spaced sampling. We apply the method to the case of missing data; although optimal convergence of the fixed-point algorithm is not guaranteed in this case, it turns out that the method often gives perfect reconstruction (up to machine precision). We also discuss multidimensional extensions, and illustrate how the proposed algorithms can be used to recover sums of exponentials in several variables when samples are available only along a curve.
Keywords: Fixed-point algorithms, Frequency estimation, General domain Hankel matrices, Fenchel conjugate, Convex envelope.
Mathematics Subject Classification: 15B05, 65K10, 41A29, 41A63.
We consider the non-convex problem
where is an -matrix, any linear subspace, is a parameter and the norm refers to the Frobenius norm. The convex envelope of is then given by
for any and . It turns out that the solution to (3) often coincides with the solution to the non-convex problem (1) for , and we provide a simple condition to verify whether this is the case. To be more precise, below is a compilation of Theorem 5.1 and Theorem 5.2 in the particular case for the non-linear map defined by (23).
To provide more flexibility in choosing the objective functional, we also consider the problem of minimizing
where is a linear operator into any given linear space. We give conditions under which the fixed points of a certain non-linear operator coincide with the global minimum of this problem.
We apply these algorithms to the problem of approximation by sparse sums of exponentials. Let be the function that we wish to approximate, and assume that it is sampled at points . The problem can then be phrased as
Here, the parameter penalizes the number of exponential functions. The weights can be used to control the confidence levels of the samples .
This approximation problem is closely related to the problem of frequency estimation, i.e., the detection of the (complex) frequencies above. Once the frequencies are known, the remaining approximation problem becomes linear and is easily solved by least squares. The literature on frequency estimation is vast; the first approach dates as far back as 1795 . Notable methods in common use are ESPRIT , MUSIC  and matrix pencil methods . We refer to  for an overview from a signal processing perspective. From an analysis point of view, the AAK-theory developed in [1, 2, 3] deals with closely related problems, and approximation algorithms based on these results were developed in [9, 14]. A competitive algorithm for the case of fixed was recently developed in .
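To make the least-squares step concrete, here is a minimal sketch of fitting the coefficients once the frequencies are known. The frequencies, sample points, and coefficient values below are hypothetical choices of ours, purely for illustration:

```python
import numpy as np

# Once the (complex) frequencies zeta_k are known, fitting the coefficients
# c_k in f(t) ~ sum_k c_k exp(zeta_k * t) is a linear least squares problem
# with a Vandermonde-type design matrix V[j, k] = exp(zeta_k * t_j).
t = np.arange(20, dtype=float)
zetas = np.array([-0.05 + 2.0j, -0.10 + 0.7j])   # assumed known frequencies
c_true = np.array([1.5, -0.8 + 0.3j])            # hypothetical coefficients

V = np.exp(np.outer(t, zetas))                   # design matrix
f = V @ c_true                                   # exact samples of the sum

c_fit, *_ = np.linalg.lstsq(V, f, rcond=None)
print(np.allclose(c_fit, c_true))                # True
```

With noise-free data and distinct frequencies the system has full column rank, so the least-squares fit recovers the coefficients to machine precision.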
To approach this problem, we let denote the set of Hankel matrices, i.e., if , then there is a generating function such that . Hankel matrices are thus constant along the anti-diagonals. By the Kronecker theorem there is a connection between sums of exponential functions and Hankel matrices with rank : if is a linear combination of exponential functions sampled at equally spaced points, then the Hankel matrix generated by these samples will generically have rank . For more details, see e.g. [4, 5, 18, 23].
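The rank property is easy to check numerically. The sketch below (with nodes chosen by us for illustration) builds a Hankel matrix from equally spaced samples of a sum of three exponentials and verifies that its rank is three:

```python
import numpy as np

def hankel_from_samples(f):
    """Build the Hankel matrix H[j, k] = f[j + k] from 2n - 1 samples."""
    n = (len(f) + 1) // 2
    return np.array([[f[j + k] for k in range(len(f) - n + 1)]
                     for j in range(n)])

t = np.arange(15)                                      # equally spaced samples
zetas = [0.9 * np.exp(0.4j), 1.0 * np.exp(-1.1j), 0.7] # three nodes (our choice)
f = sum(z ** t for z in zetas)                         # sum of 3 exponentials

H = hankel_from_samples(f)
print(np.linalg.matrix_rank(H))                        # 3
```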
In the special case with equally spaced sampling, i.e. , and
the problem (5) (with being the Hankel matrix generated from ) is equivalent to (2). The triangular weight above comes from counting the number of elements of the Hankel matrix along its anti-diagonals. Theorem 1.1 thus provides approximate solutions to this problem, which often turn out to solve the original problem (5) as well; in this setting, (5) is the equivalent of (1).
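The triangular weight can be obtained by counting the entries on each anti-diagonal; a small sketch (function name ours):

```python
import numpy as np

# The k-th weight counts how many entries of an n-by-m Hankel matrix lie on
# the k-th anti-diagonal (j + k' = k); for a square matrix this gives the
# triangular profile 1, 2, ..., n, ..., 2, 1 mentioned in the text.
def antidiagonal_weights(n, m):
    return np.array([min(k + 1, n, m, n + m - 1 - k) for k in range(n + m - 1)])

print(antidiagonal_weights(4, 4))   # [1 2 3 4 3 2 1]
```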
Clearly, for many purposes it would be more natural to use the regular norm in the approximation, i.e., to use constant values of . More general weight configurations can also be of interest (for instance in the case of missing data). We show how the fixed-point algorithm designed for (4) can be used in this case.
The basic algorithms for frequency estimation (or approximation by sums of exponentials) require that the data be provided (completely) at equally spaced samples. In many situations this assumption does not hold (in particular for sampling in two or more variables). A traditional method for computing periodograms from unequally spaced samples is the Lomb-Scargle method [21, 26]. The results are typically far from satisfactory, cf. . For a review of frequency estimation methods for unequally spaced sampling, see . In this work, we use ideas similar to those of  to treat unequally spaced sampling. Given a set of sample points , let be a matrix that interpolates between equally spaced points and the points . This means, for instance, that the rows of have sum one. The action of its adjoint is sometimes referred to as anterpolation. We study the problem
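A minimal sketch of such an interpolation matrix, using piecewise linear interpolation (our construction, chosen for simplicity; the paper's interpolation scheme may differ). Note that each row sums to one, and that linear functions are reproduced exactly:

```python
import numpy as np

def interpolation_matrix(sample_points, grid):
    """Map values on an equally spaced grid to values at arbitrary points
    by linear interpolation; each row of the resulting matrix sums to one."""
    L = np.zeros((len(sample_points), len(grid)))
    h = grid[1] - grid[0]
    for i, t in enumerate(sample_points):
        j = min(int((t - grid[0]) / h), len(grid) - 2)  # left grid neighbour
        theta = (t - grid[j]) / h                       # position in the cell
        L[i, j], L[i, j + 1] = 1 - theta, theta
    return L

grid = np.linspace(0, 1, 11)
pts = np.array([0.03, 0.5, 0.77])
L = interpolation_matrix(pts, grid)

print(np.allclose(L.sum(axis=1), 1.0))   # rows sum to one
print(np.allclose(L @ grid, pts))        # exact for the linear function f(t) = t
```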
Finally, we discuss how the methods can be used in the multidimensional case. Let be an open bounded connected set and a function on which is assumed to be of the form
where , and is a low number. In  we introduce a new class of operators called general domain truncated correlation operators, whose “symbols” are functions on , and prove a Kronecker-type theorem for these operators. In particular, it follows that their rank is if the symbol is of the form (8), and the converse situation is also completely clarified. In  it is explained how these operators can be discretized, giving rise to a class of structured matrices called “general domain Hankel matrices” (see Section 7 for more details). Letting be such a class of structured matrices, it turns out that (3) is a natural setting for solving the problem: given noisy measurements of at (possibly unequally spaced) points in , find the function . The connected problem of finding the values and is addressed e.g. in .
The paper is organized as follows: In Section 2 we provide some basic ingredients for the coming sections; in particular, we introduce the singular value functional calculus and connected Lipschitz estimates. Our solution of (3) involves an intermediate, more intricate objective functional, which is introduced in Section 3, where we also investigate the structure of its Fenchel conjugates. Moreover, our solution to (3) first solves a dual problem, which is introduced and studied in Section 4, where the key operator is also introduced. At this point, we have the ingredients necessary for addressing the main problem (3). In Section 5.1 we recapitulate our main objective and discuss how various choices of and affect the objective functional, as well as its relation to the original problem (1). An extended version of Theorem 1.1 is finally given in Section 5.2, where we also present our algorithm for solving (3). The more general version of the algorithm and the corresponding theory is given in Section 5.3. The remainder of the paper is devoted to frequency estimation and numerical examples. Section 6 considers the one-dimensional problem (7), whereas the problem of retrieving functions of the form (8) is considered in Section 7. This section also contains a more detailed description of the general domain Hankel matrices which underlie the method. Numerical examples are given in Section 8, and we end with a summary of our conclusions in Section 9.
2.1 Fenchel conjugates and convex optimization
We recall that the Fenchel conjugate of a function is given by
For positive functions , is always convex, and if is also convex, then , see Theorem 11.1 in . If is not convex, then equals the convex envelope of . In particular, note that we always have .
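The relation between the bi-conjugate and the convex envelope can be illustrated numerically. The sketch below (our discretization, for illustration only) conjugates a non-convex one-dimensional function twice on a grid; the bi-conjugate stays below the function and agrees with it wherever the function coincides with its convex envelope:

```python
import numpy as np

def conjugate(vals, grid, dual):
    """Discrete Fenchel conjugate: f*(y) = sup_x (x*y - f(x)) over the grid."""
    return np.max(dual[:, None] * grid[None, :] - vals[None, :], axis=1)

x = np.linspace(-3, 3, 601)
y = x.copy()                      # dual grid
f = np.minimum(x ** 2, 1.0)       # non-convex: flat "shoulders" at height 1

fstar = conjugate(f, x, y)        # conjugate
fstarstar = conjugate(fstar, y, x)  # bi-conjugate = convex envelope (on the grid)

print(np.all(fstarstar <= f + 1e-9))        # f** never exceeds f
print(abs(fstarstar[300] - f[300]) < 1e-6)  # they agree at x = 0, where f is convex
```

On this interval the envelope replaces the flat shoulders by straight segments tangent to the parabola, while the bi-conjugate equals f near the origin.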
Suppose now that is a convex function on some finite dimensional linear space , and let be a linear subspace. Consider the problem of finding
If then the value in (9) is clearly larger than or equal to the value
Suppose now that we can find a value (not necessarily unique) which minimizes
2.2 The singular value functional calculus
Let be given and let denote the Euclidean space of -matrices, equipped with the Frobenius norm. We will need the singular value functional calculus , defined as follows. Let be a real valued function on such that and let be a matrix with singular value decomposition . We then define
The key property we will use is that
where refers to the Lipschitz constant for . In other words, Lipschitz functions give rise to Lipschitz functional calculus, with the same bound.
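The Lipschitz estimate can be illustrated as follows; the particular function (soft thresholding, which vanishes at zero and has Lipschitz constant one) and the random matrices are our choices for the sketch:

```python
import numpy as np

def svfc(f, A):
    """Singular value functional calculus: apply f to the singular values,
    keeping the singular vectors of A."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(f(s)) @ Vh

f = lambda s: np.maximum(s - 1.0, 0.0)   # f(0) = 0, Lipschitz constant 1

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))

# Lipschitz functional calculus: Frobenius-norm bound with the same constant.
lhs = np.linalg.norm(svfc(f, A) - svfc(f, B))
rhs = np.linalg.norm(A - B)
print(lhs <= rhs + 1e-12)   # True
```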
3 Convex envelopes
Let be a fixed matrix, and let
The Fenchel conjugates of are then
Let and let be conjugate exponents, i.e. such that .
Its Fenchel conjugate is then given by
and its convex envelope (Fenchel bi-conjugate) is given by
We write simply when are clear from the context. Consider the Fenchel conjugate of
The right maximum is attained for
After substituting in the Fenchel conjugate and some algebraic simplifications, we obtain
From (14) we see that the expression inside the first parenthesis can be written as
from which (16) follows.
Next we study the bi-conjugate
To simplify the computations we change coordinates
The bi-conjugate can then be written as
The maximization over is straightforward and it will give a contribution for . Hence by (18), and
We recognize the maximization part as a Fenchel conjugate, and hence
4 The dual optimization problem
Let be any fixed linear subspace. We consider the problem
where . Let be the linear subspace of defined by the equations and . It is easy to see that is defined by
Indeed, is obvious and the other part can be established by noting that whereas the subspace defined by has dimension , (which thus equals ). Motivated by the results in Section 2.1, we now focus on the “dual problem” to (19), i.e.
When there is no risk for confusion we will omit the subindices. Recall the singular value functional calculus introduced in Section 2.2.
Since depends only on the singular values of , von Neumann’s inequality  shows that the solution to the above problem has the same singular vectors as (see the proof of Theorem 3.1 in  for more details). Hence it will be of the form where is defined by
and by equating the subgradient of the above expression with respect to with zero, it is easily verified that this is solved by the function (21). ∎
Let denote the operator of orthogonal projection onto . We now introduce the (non-linear) operator which will play a key role in our solution of the original problem (3).
which we abbreviate when are clear from the context.
There exists a fixed point of such that the problem
is solved by
Since we can write
where . The problem can thus be rewritten
Since and , it follows that
and hence we conclude that the optimal is the zero matrix. Thus
and the objective functional can be replaced by
Using the substitution it then becomes
The minimization problem will thus be solved by
where is the functional defined by
Note that is convex since it is obtained from the convex function by affine changes of coordinates. It is also easy to see that
so it has at least one global minimum. Let be one such. Since , it is easy to see that . But then
which by Lemma 4.1 read
5 A fixed point algorithm
5.1 Discussion of the objective functional
To recapitulate our main objective, we wish to minimize
which we replace by its convex envelope
to achieve convexity. We will in this section show how to minimize
for any and . Note that the corresponding modification to the original objective functional (27), i.e.
Thus (29) is equivalent with
by which we conclude that the objective functional in (29) is strictly convex with a unique minimizer . Moreover, the above expression is clearly less than or equal to
so if it happens that
5.2 The basic fixed-point algorithm
We now solve (29), using fixed points of the operator from the previous section. It is easy to see that these may not be unique. Despite that, we have the following.
The Picard iteration converges to a fixed point . Moreover, is unique and
is the unique solution to (29).
Before presenting the proof, let us make a few more comments on the equivalent problem formulation (31). Note that
but that our method requires (in fact, the objective functional is no longer convex beyond ). However, choosing will yield slow convergence of the Picard iteration since then . On the other hand, the effect of choosing a large (for a fixed ) is that the term deviates further from . We have found that works well in practice.
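A heavily simplified stand-in for the kind of Picard iteration discussed here (our sketch, not the paper's exact operator): alternately soft-threshold the singular values and project back onto the Hankel structure. Both steps are non-expansive, so the composed map is non-expansive as well, and the Picard residuals are monotonically non-increasing:

```python
import numpy as np

def shrink(A, tau):
    """Soft-threshold the singular values (a firmly non-expansive map)."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vh

def project_hankel(A):
    """Orthogonal projection onto Hankel matrices: average anti-diagonals."""
    n, m = A.shape
    g = np.array([np.diag(A[::-1], d).mean() for d in range(-(n - 1), m)])
    return np.array([[g[j + k] for k in range(m)] for j in range(n)])

T = lambda X: project_hankel(shrink(X, 0.3))   # composed iteration map

rng = np.random.default_rng(1)
A = rng.standard_normal((7, 7))
B = rng.standard_normal((7, 7))

# Non-expansiveness of the composed map:
print(np.linalg.norm(T(A) - T(B)) <= np.linalg.norm(A - B) + 1e-12)  # True

# Picard iteration: successive residuals are non-increasing.
X, residuals = A, []
for _ in range(50):
    X_new = T(X)
    residuals.append(np.linalg.norm(X_new - X))
    X = X_new
print(all(r2 <= r1 + 1e-12 for r1, r2 in zip(residuals, residuals[1:])))  # True
```

The threshold 0.3 and the matrix sizes are arbitrary illustration values; the monotonicity of the residuals follows directly from the non-expansiveness of the composed map.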
We first prove that the Picard iteration converges. Note that the sequence , is generated by the Picard iteration of the operator and the starting-point . Moreover . Since is continuous (in fact, it is non-expansive by (12)) it suffices to show that is convergent.
It is well known [15, 22] that for firmly non-expansive operators (on a finite dimensional space) the Picard iteration converges to a fixed point as long as the set of fixed points of the operator is non-empty. By Theorem 4.3 there exists a fixed point of , and it clearly follows that is a fixed point of . It remains to show that is firmly non-expansive as an operator on . An equivalent characterization of firm non-expansiveness is that is non-expansive, i.e.,
which we will verify shortly. To this end, note that on we have
which establishes (33), and hence the original Picard iteration converges to a fixed point of . For the uniqueness of , suppose that is another fixed point of and write , where and . Then
where the last inequality is due to (12) and the fact that has Lipschitz constant equal to one. But then
where is as above. By fixing we find that
But inserting the constraint (from (19)) in the above equation yields
as desired. The uniqueness of was argued before the proof, which is thus complete. ∎
The matrix has several peculiar properties, as shown by the next theorem, (compare with (29) and note the absence of the condition below).