Interpolation via weighted \ell_{1} minimization

Holger Rauhut, Rachel Ward

H. Rauhut is with RWTH Aachen University, Lehrstuhl C für Mathematik (Analysis), Templergraben 55, 52062 Aachen, Germany, rauhut@mathc.rwth-aachen.de. R. Ward is with the Mathematics Department at the University of Texas at Austin, 2515 Speedway, Austin, Texas, 78712.

Functions of interest are often smooth and sparse in some sense, and both priors should be taken into account when interpolating sampled data. Classical linear interpolation methods are effective under strong regularity assumptions, but cannot incorporate nonlinear sparsity structure. At the same time, nonlinear methods such as $\ell_1$ minimization can reconstruct sparse functions from very few samples, but do not necessarily encourage smoothness. Here we show that weighted $\ell_1$ minimization effectively merges the two approaches, promoting both sparsity and smoothness in reconstruction. More precisely, we provide specific choices of weights in the $\ell_1$ objective to achieve approximation rates for functions with coefficient sequences in weighted $\ell_p$ spaces with $p \le 1$. We consider implications of these results for spherical harmonic and polynomial interpolation, in the univariate and multivariate setting. Along the way, we extend concepts from compressive sensing such as the restricted isometry property and null space property to accommodate weighted sparse expansions; these developments should be of independent interest in the study of structured sparse approximations and continuous-time compressive sensing problems.

Key words: bounded orthonormal systems, compressive sensing, interpolation, weighted sparsity, $\ell_1$ minimization

Dedicated to the memory of Ted Odell

1 Introduction

This paper aims to merge classical smoothness-based methods for function interpolation with modern sparsity constraints and nonlinear reconstruction methods. We will focus on the classical interpolation problem, where given sampling points and associated function values, we wish to find a suitably well-behaved function agreeing with the data up to some error. Classically, “well-behaved” has been measured in terms of smoothness: the more derivatives a function has, the stronger the reconstruction rate obtained using linear methods. More recently, areas such as compressive sensing have focused on sparsity rather than smoothness as a measure of complexity. Results in compressive sensing imply that a function with sparse representation in a known basis can be reconstructed from a small number of suitably randomly distributed sampling points, using nonlinear techniques such as convex optimization or greedy methods. In reality, functions of interest may only be somewhat smooth and somewhat sparse. This is particularly apparent in high-dimensional problems, where sparse and low-degree tensor product expansions are often preferred according to the sparsity-of-effects principle, which states that most models are principally governed by main effects and low order interactions. In such situations, we might hope to synthesize smoothness and sparsity-based approaches.

Recall that the smoothness of a function tends to be reflected in the rapid decay of its Fourier series, and vice versa. Smoothness can then be viewed as a structured sparsity constraint, with low-order Fourier basis functions being more likely to contribute to the best $s$-term approximation. We will demonstrate that such structured sparse expansions are imposed by weighted $\ell_p$ coefficient spaces in the range $0 < p \le 1$. Accordingly, we will use weighted $\ell_1$ minimization, a convex surrogate for weighted $\ell_p$ minimization with $p < 1$, as our reconstruction method of choice.

The contributions of this paper are as follows.

  1. Function approximation. We provide the first rigorous analysis for function interpolation using weighted $\ell_1$ minimization. We show that with the appropriate choice of weights, one obtains approximation rates for functions with coefficient sequences lying in weighted $\ell_p$ spaces with $p \le 1$. In the high-dimensional setting, our rates are better than those possible by classical linear interpolation methods, and require only mild smoothness assumptions. For suitable choices of the weights, the number of sampling points required by weighted $\ell_1$ minimization to achieve a desired rate grows only linearly or even just logarithmically with the ambient dimension, rather than exponentially. We illustrate the improvement of weighted $\ell_1$ minimization compared to unweighted $\ell_1$ minimization on several specific examples, including spherical harmonic interpolation and tensorized Chebyshev and Legendre polynomial interpolation. We expect that the results for polynomial interpolation should have applications in uncertainty quantification [30] and, in particular, in the computation of generalized polynomial chaos expansions [16, 17].

  2. Weighted sparsity. In order to derive stable and robust recovery guarantees for weighted $\ell_1$ minimization, we generalize the notion of restricted isometry property in compressive sensing to the weighted restricted isometry property, and also develop notions of weighted sparsity which take into account prior information on the likelihood that any particular index contributes to the sparse expansion. These developments should be of independent interest as tools that can be used more generally for the analysis of structured sparsity models and continuous-time sparse approximation problems.

  3. Structured random matrices for compressive sensing. It is by now well established [9, 40, 36] that an $s$-sparse trigonometric polynomial of maximal degree $N$ can be recovered efficiently from its values at a number of sampling points scaling linearly in $s$ up to logarithmic factors in $N$, drawn independently from the uniform measure on their domain. More generally, such near-optimal sparse recovery results hold whenever the underlying sparsity basis is uniformly bounded like the trigonometric system, so as to be incoherent with point samples [36]. However, these results become weak if the $L_\infty$ norms of the functions in the orthonormal system grow with the maximal degree $N$. If this growth is not too sharp, one may apply a preconditioning technique proposed in [38] to transform the system into a uniformly bounded one. However, often these preconditioning techniques are not enough to make the basis functions uniformly bounded. Here, we show that as long as lower-degree polynomials are preferred in the sparse expansions under consideration, one may still derive sparse recovery guarantees using weighted $\ell_1$ minimization with weights which grow at least as quickly as the $L_\infty$ norms of the functions they correspond to.

We will assume throughout that the sampling points for interpolation are drawn from a suitable probability distribution, and we focus only on the setting where interpolation points are chosen in advance, independent of the target function.

1.1 Organization

The remainder of the paper is organized as follows. In Sections 1.2 and 1.3, we introduce weighted $\ell_p$ spaces and discuss how they promote smoothness and sparsity. In Section 1.4 we state two of the main interpolation results, and in Section 1.5 we introduce the concept of a weighted restricted isometry property for a linear map. Section 1.6 discusses previous work on weighted $\ell_1$ minimization, and in Section 1.7 we compare our main results to those possible with linear reconstruction methods. In Section 2 we discuss the implications of our main results for spherical harmonic and tensorized polynomial bases, and provide a numerical illustration. We further analyze concepts pertaining to weighted sparsity in Section 3, and in Section 4 we elaborate on the weighted restricted isometry property and weighted null space property. In Section 5 we show that matrices arising from orthonormal systems have the weighted restricted isometry property as long as the weights are matched to the $L_\infty$ norms of the function system, and we finish in Section 6 by presenting our main results on interpolation via weighted $\ell_1$ minimization.

1.2 Weighted sparsity

We will work with coefficient sequences $x = (x_j)_{j \in \Lambda}$ indexed by a set $\Lambda$ which may be finite or countably infinite. We will associate to a vector of weights $\omega = (\omega_j)_{j \in \Lambda}$, with $\omega_j \ge 1$, the weighted $\ell_p$ spaces

$$\ell_{\omega,p} := \Big\{ x = (x_j)_{j \in \Lambda} : \|x\|_{\omega,p} := \Big( \sum_{j \in \Lambda} |x_j|^p\, \omega_j^{2-p} \Big)^{1/p} < \infty \Big\}, \qquad 0 < p \le 2.$$

Also central to our analysis will be the weighted $\ell_0$-“norm”,

$$\|x\|_{\omega,0} := \sum_{j : x_j \neq 0} \omega_j^2,$$
which counts the squared weights of the non-zero entries of $x$. We also define the weighted cardinality of a set $S \subset \Lambda$ to be $\omega(S) := \sum_{j \in S} \omega_j^2$. Since $\omega_j \ge 1$ by assumption, we always have $\omega(S) \ge |S|$, the cardinality of $S$. When $\omega \equiv 1$, these weighted norms reproduce the standard $\ell_p$ norms, in which case we use the standard notation $\|x\|_p$. The exponent $2-p$ in the definition of the $\ell_{\omega,p}$ spaces is somewhat uncommon but turns out to be the most convenient definition for our purposes. For instance, the exponent must scale this way in order to apply a Cauchy–Schwarz inequality to weighted $s$-sparse vectors:

$$\|x\|_{\omega,1} \le \sqrt{s}\, \|x\|_2 \qquad \text{whenever } \|x\|_{\omega,0} \le s.$$

Consequently, the error bounds we present will be in terms of norms scaled this way; for general weights we obtain weighted error bounds, while for $\omega \equiv 1$ we recover unweighted error bounds.
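For concreteness, the weighted norms just defined are straightforward to compute; the following is a minimal sketch (numpy assumed, using the $2-p$ exponent convention above; all function names are illustrative, not from the paper):

```python
import numpy as np

def weighted_lp_norm(x, w, p):
    """Weighted l_{w,p} norm with the 2-p exponent: (sum_j w_j^(2-p) |x_j|^p)^(1/p)."""
    return float(np.sum(w ** (2 - p) * np.abs(x) ** p) ** (1.0 / p))

def weighted_l0(x, w, tol=0.0):
    """Weighted l0 "norm": total squared weight of the nonzero entries of x."""
    return float(np.sum(w[np.abs(x) > tol] ** 2))

def weighted_cardinality(w, S):
    """Weighted cardinality omega(S) = sum_{j in S} w_j^2."""
    return float(np.sum(w[list(S)] ** 2))
```

With $\omega \equiv 1$ these reduce to the standard $\ell_p$ norms and the support size, and the Cauchy–Schwarz relation $\|x\|_{\omega,1} \le \sqrt{s}\,\|x\|_2$ for $\|x\|_{\omega,0} \le s$ can be checked numerically.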

For $x \in \ell_{\omega,p}$ and for a subset $S$ of the index set $\Lambda$, we define $x_S$ as the restriction of $x$ to $S$, i.e., $(x_S)_j = x_j$ for $j \in S$ and $(x_S)_j = 0$ otherwise. For $s \ge 1$, the error of best weighted $s$-term approximation of the vector $x$ is defined as

$$\sigma_s(x)_{\omega,p} := \inf_{S \subset \Lambda :\, \omega(S) \le s} \|x - x_S\|_{\omega,p}.$$
Unlike unweighted best $s$-term approximations, weighted $s$-term approximations of vectors are not straightforward to compute in general. Nevertheless, we will show in Section 3 how to approximate $\sigma_s(x)_{\omega,p}$ using a quantity that can easily be computed from $x$ by sorting and thresholding.

1.3 Weighted $\ell_p$ spaces for smoothness and sparsity

The weighted coefficient spaces introduced in the previous section can be used to define weighted function spaces. For a bounded domain $\mathcal{D}$, let $\psi_j : \mathcal{D} \to \mathbb{C}$, $j \in \Lambda$, be a sequence of functions indexed by the set $\Lambda$ which are orthonormal with respect to a probability measure $\nu$ on $\mathcal{D}$, that is,

$$\int_{\mathcal{D}} \psi_j(t)\, \overline{\psi_k(t)}\, d\nu(t) = \delta_{j,k}, \qquad j, k \in \Lambda.$$

We will call $\nu$ the orthogonalization measure associated to the system $(\psi_j)_{j \in \Lambda}$. The function spaces we consider are the weighted quasi-normed spaces

$$\Big\{ f = \sum_{j \in \Lambda} x_j \psi_j : \|f\|_{\omega,p} := \|x\|_{\omega,p} < \infty \Big\},$$

again with weights $\omega_j \ge 1$ implicitly assumed. The best $s$-term approximation to $f$ is the function

$$f_S = \sum_{j \in S} x_j \psi_j,$$

where $S$ is the set realizing the weighted best $s$-term approximation of the coefficient vector $x$, and the best weighted $s$-term approximation error is

$$\sigma_s(f)_{\omega,p} := \|f - f_S\|_{\omega,p} = \sigma_s(x)_{\omega,p}.$$
The following Stechkin-type estimate, described in more detail in Section 3, can be used to bound the best $s$-term approximation error of a vector by an appropriate weighted vector norm: for $p < q \le 2$ and $s \ge 2\|\omega\|_\infty^2$,

$$\sigma_s(x)_{\omega,q} \le c_{p,q}\, s^{1/q - 1/p}\, \|x\|_{\omega,p}.$$

This estimate illustrates how a small $\ell_{\omega,p}$ norm for small $p$ supports small sparse approximation error. Conditions of the form $s \ge 2\|\omega\|_\infty^2$ are somewhat natural in the context of weighted sparse approximations, as those indices with weights $\omega_j^2 > s$ cannot possibly contribute to best weighted $s$-term approximations. This means that we can usually replace a countably-infinite set $\Lambda$ by the finite subset corresponding to indices with weights $\omega_j^2 \le s$ (or, for technical reasons, $\omega_j^2 \le s/2$), if such a finite set exists.

1.4 Interpolation via weighted $\ell_1$ minimization

In treating the interpolation problem, we first assume that the index set $\Lambda$ is finite with $|\Lambda| = N$. Given sampling points $t_1, \dots, t_m \in \mathcal{D}$ and associated function values $y_k = f(t_k)$, we can write the vector of sample values succinctly in matrix form as $y = Ax$, where $A \in \mathbb{C}^{m \times N}$ is the sampling matrix with entries

$$A_{k,j} = \psi_j(t_k), \qquad k = 1, \dots, m, \quad j \in \Lambda.$$
Better sets of interpolation points are usually associated with a better condition number for the sampling matrix $A$. In our theorems, the sampling points are drawn independently from the orthogonalization measure $\nu$ associated to the orthonormal system $(\psi_j)$; as a consequence, the random matrix $A$, properly normalized, is the identity matrix in expectation: $\mathbb{E}\big[\tfrac{1}{m} A^* A\big] = \mathrm{Id}$.
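This concentration of the normalized Gram matrix around the identity can be observed directly; the following Monte Carlo sketch uses the one-dimensional trigonometric system $\psi_j(t) = e^{2\pi i j t}$ on $[0,1)$ (the index range and sample size are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Trigonometric system psi_j(t) = exp(2*pi*i*j*t) on [0,1), orthonormal
# with respect to the uniform (orthogonalization) measure; indices -2..2.
indices = np.arange(-2, 3)
m = 200_000                      # number of i.i.d. sampling points
t = rng.random(m)                # t_k drawn uniformly from [0,1)

# Sampling matrix A_{k,j} = psi_j(t_k)
A = np.exp(2j * np.pi * t[:, None] * indices[None, :])

# (1/m) A^* A concentrates around the identity matrix
G = (A.conj().T @ A) / m
err = np.max(np.abs(G - np.eye(len(indices))))
```

The diagonal entries equal one exactly (each $|\psi_j| \equiv 1$), while the off-diagonal entries are Monte Carlo averages of mean-zero terms and shrink like $1/\sqrt{m}$.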

We will consider the setting where the number of samples $m$ is smaller than the ambient dimension $N$, in which case there are infinitely many functions which interpolate the given data. From within this infinite set, we would like to pick out the function of minimal quasi-norm $\|f\|_{\omega,p}$. However, this minimization problem only becomes tractable once $p = 1$, whence the quasi-norm becomes a norm. As a convex relaxation of the weighted $\ell_{\omega,p}$ quasi-norm for $p < 1$, we consider for interpolation the function $f^\# = \sum_{j \in \Lambda} x^\#_j \psi_j$ whose coefficient vector $x^\#$ is the solution of the weighted $\ell_1$ minimization program

$$x^\# = \operatorname*{arg\,min}_{z \in \mathbb{C}^N} \|z\|_{\omega,1} \quad \text{subject to } Az = y.$$
The equality constraint in the minimization ensures that the function $f^\#$ interpolates $f$ at the points $t_k$, that is, $f^\#(t_k) = f(t_k)$, $k = 1, \dots, m$. Let us give the following result on interpolation via weighted $\ell_1$ minimization with respect to $\omega$.

Theorem 1.1.

Suppose $(\psi_j)_{j \in \Lambda}$ is an orthonormal system of finite size $|\Lambda| = N$, and consider weights $\omega_j \ge \|\psi_j\|_\infty$. For $s \ge 2\|\omega\|_\infty^2$, fix a number of samples

$$m \ge c_0\, s \log^3(s) \log(N),$$

and suppose that $t_1, \dots, t_m$ are sampling points drawn i.i.d. from the orthogonalization measure associated to $(\psi_j)$. With probability exceeding $1 - N^{-\log^3(s)}$, the following holds for all functions $f = \sum_{j \in \Lambda} x_j \psi_j$: given samples $y_k = f(t_k)$, $k = 1, \dots, m$, let $x^\#$ be the solution of

$$\min_{z \in \mathbb{C}^N} \|z\|_{\omega,1} \quad \text{subject to } Az = y,$$

and set $f^\# = \sum_{j \in \Lambda} x^\#_j \psi_j$. Then the following error rates are satisfied:

$$\|f - f^\#\|_\infty \le \|x - x^\#\|_{\omega,1} \le c_1\, \sigma_s(x)_{\omega,1}, \qquad \|f - f^\#\|_{L_2} \le c_2\, \frac{\sigma_s(x)_{\omega,1}}{\sqrt{s}}.$$

Here $\sigma_s(x)_{\omega,1}$ is the best $s$-term approximation error of $x$ defined in (4). The constants $c_0$, $c_1$, and $c_2$ are universal, independent of everything else.

This interpolation theorem is nonstandard in two respects: the number of samples required to achieve a prescribed rate scales only logarithmically with the size $N$ of the system, and the error guarantees are given by best $s$-term approximations in weighted coefficient norms.

The constraint $\omega_j \ge \|\psi_j\|_\infty$ on the weights allows us to bound the $L_\infty$ norm by the weighted coefficient norm: for a function $f = \sum_{j \in \Lambda} x_j \psi_j$,

$$\|f\|_\infty \le \sum_{j \in \Lambda} |x_j|\, \|\psi_j\|_\infty \le \sum_{j \in \Lambda} |x_j|\, \omega_j = \|x\|_{\omega,1},$$

and so if $f_S = \sum_{j \in S} x_j \psi_j$ with $\omega(S) \le s$ is the best $s$-term approximation to $f$ in the $\ell_{\omega,1}$ norm, then by the Stechkin-type estimate (5) with $q = 1$ we have

$$\|f - f_S\|_\infty \le \sigma_s(x)_{\omega,1} \le c_{p,1}\, s^{1 - 1/p}\, \|x\|_{\omega,p}.$$
By choosing weights so that $\omega_j$ grows faster than $\|\psi_j\|_\infty$, one may also arrive at bounds in norms stronger than $L_\infty$, reflecting how steeper weights encourage more smoothness. We do not pursue such a direction in this paper, but this may be interesting for future research.
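As an illustration of solving the weighted $\ell_1$ program with equality constraints numerically, the following sketch approximates its solution by iteratively reweighted least squares. This is a generic stand-in for a dedicated $\ell_1$ solver, not the method used in the paper, and all parameter choices are hypothetical:

```python
import numpy as np

def weighted_l1_interpolate(A, y, w, iters=60, eps=1.0):
    """Approximate  argmin ||z||_{w,1}  s.t.  A z = y  by iteratively
    reweighted least squares (IRLS), a generic stand-in for an l1 solver.

    Each iterate z = D A^T (A D A^T)^{-1} y, with D = diag(sqrt(z^2+eps)/w),
    satisfies A z = y (up to the tiny regularization below) by construction.
    """
    m = A.shape[0]
    z = np.linalg.lstsq(A, y, rcond=None)[0]    # feasible least-norm start
    for _ in range(iters):
        d = np.sqrt(z ** 2 + eps) / w           # smoothed |z_j| / w_j
        Ad = A * d                              # A @ diag(d)
        z = d * (A.T @ np.linalg.solve(Ad @ A.T + 1e-14 * np.eye(m), y))
        eps = max(eps * 0.7, 1e-9)              # anneal the smoothing
    return z
```

On a small synthetic example with a Gaussian sampling matrix and a 3-sparse coefficient vector this typically recovers the vector; for the structured matrices of this paper one would use a bona fide convex solver instead.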

In Section 6 we will prove a more general version of Theorem 1.1 showing robustness of weighted $\ell_1$ minimization to noisy samples, $y = Ax + \xi$. Using this robustness to noise, we will be able to treat the case where the index set $\Lambda$ is countably infinite, by regarding the values $f(t_k)$ as noisy samples of a finite-dimensional approximation to $f$. For example, this will allow us to show the following result.

Theorem 1.2.

Suppose $(\psi_j)_{j \in \Lambda}$ is an orthonormal system, consider weights $\omega_j \ge \|\psi_j\|_\infty$, and for a parameter $s \ge 2\|\omega\|_\infty^2$, let $\Lambda_0 := \{ j \in \Lambda : \omega_j^2 \le s/2 \}$, where $N := |\Lambda_0| < \infty$. Consider a number of samples

$$m \ge c_0\, s \log^3(s) \log(N).$$

Consider a fixed function $f = \sum_{j \in \Lambda} x_j \psi_j$ with $\|x\|_{\omega,1} < \infty$. Draw sampling points $t_1, \dots, t_m$ independently from the orthogonalization measure associated to $(\psi_j)$. Let $A \in \mathbb{C}^{m \times N}$ be the sampling matrix with entries $A_{k,j} = \psi_j(t_k)$, $j \in \Lambda_0$. Let $\eta$ and $\theta \ge 1$ be such that $\|x_{\Lambda_0^c}\|_{\omega,1} \le \eta \le \theta\, \|x_{\Lambda_0^c}\|_{\omega,1}$. From samples $y_k = f(t_k)$, $k = 1, \dots, m$, let $x^\#$ be the solution of

$$\min_{z \in \mathbb{C}^N} \|z\|_{\omega,1} \quad \text{subject to } \|Az - y\|_2 \le \sqrt{m}\, \eta,$$

and set $f^\# = \sum_{j \in \Lambda_0} x^\#_j \psi_j$. Then with probability exceeding $1 - N^{-\log^3(s)}$,

$$\|f - f^\#\|_\infty \le \|x - x^\#\|_{\omega,1} \le C_1\, \sigma_s(x)_{\omega,1} + C_2\, \eta.$$

Above, $c_0$ is an absolute constant and $C_1, C_2$ are constants which depend only on the distortion $\theta$.

Several remarks should be made about Theorem 1.2.

  1. The minimization problem in Theorem 1.2 requires knowledge of, or at least an estimate of, the tail bound $\eta$. It is possible to avoid this using greedy or iterative methods; see [23] for one such method, a weighted version of the iterative hard thresholding algorithm [3]. In subsequent corollaries of this result, we will assume exact knowledge of the tail bound, for simplicity of presentation.

  2. If the size $N$ of the index set $\Lambda_0$ is polynomial in $s$, then the number of samples reduces to $m \gtrsim s \log^4(s)$ to achieve reconstruction with high probability.

1.5 Weighted restricted isometry property

One of the main tools we use in the proofs of Theorems 1.1 and 1.2 is the weighted restricted isometry property ($\omega$-RIP) for a linear map $A : \mathbb{C}^N \to \mathbb{C}^m$, which generalizes the concept of the restricted isometry property in compressive sensing.

Definition 1.3 ($\omega$-RIP constants).

For $A \in \mathbb{C}^{m \times N}$, $s \ge 1$, and weights $\omega$, the $\omega$-RIP constant $\delta_{\omega,s}$ associated to $A$ is the smallest number for which

$$(1 - \delta_{\omega,s})\, \|x\|_2^2 \le \|Ax\|_2^2 \le (1 + \delta_{\omega,s})\, \|x\|_2^2$$

holds for all $x \in \mathbb{C}^N$ with $\|x\|_{\omega,0} \le s$.

For weights $\omega \equiv 1$, the $\omega$-RIP reduces to the standard RIP, as introduced in [9, 8]. For general weights $\omega_j \ge 1$, the $\omega$-RIP is a weaker assumption on a matrix than the standard RIP, as it requires the map to act as a near-isometry on a smaller set.

Example 1.4.

Consider weights of the general form $\omega_j = j^{\gamma}$ for some $\gamma > 0$. We may then take $s$ large relative to the weights of interest, as even single indices $j$ have weighted cardinality $\omega(\{j\}) = j^{2\gamma}$, which exceeds one and grows with $j$. Observe that if $\|x\|_{\omega,0} \le s$ then $x$ is supported on an index set of (unweighted) cardinality at most $C_\gamma\, s^{1/(2\gamma+1)}$, since the weighted budget is exhausted most slowly by the smallest weights. Following the approach of [1], see also [2, 4], taking a union bound and applying covering arguments, one may argue that an i.i.d. subgaussian random matrix has the $\omega$-RIP with high probability once

$$m \gtrsim s^{1/(2\gamma+1)} \log(N).$$

This is a smaller number of measurements than the lower bound $m \gtrsim s \log(N/s)$ required for the unweighted RIP. This observation should be of independent interest, but we focus in this paper on random matrices formed by sampling orthonormal systems.

1.6 Related work on weighted minimization

Weighted $\ell_1$ minimization has been analyzed previously in the compressive sensing literature. Weighted $\ell_1$ minimization with binary weights, designed to exploit partial support information, was introduced independently in the papers [24, 45, 44] and extended further in [22]. The paper [21] seems to be the first to provide conditions under which weighted $\ell_1$ minimization with general weights is stable and robust, under weaker sufficient conditions than the analogous conditions for standard $\ell_1$ minimization. Improved sufficient conditions were recently provided for this setting in [47]. Recovery guarantees were also provided in [35], also with a focus on function interpolation with applications to polynomial chaos expansions. Still, the analysis in all of these works is based on the standard restricted isometry property, and this limits the extent to which their recovery guarantees improve on those for unweighted $\ell_1$ minimization.

Weighted $\ell_1$ minimization has also been considered under probabilistic models. In [46], the vector indices are partitioned into two sets, and indices in each set have different probabilities $p_1$, $p_2$ of being nonzero; the weights are constant on each of the two sets. The papers [24, 25] provide further analysis in this setting where the entries of the unknown vector fall into two or more sets, each with a different probability of being nonzero. Finally, the paper [33] considers a full Bayesian model, where probabilities are associated with each component of the signal in such a way that they vary in a “continuous” manner across the indices. All of these works take a Grassmann angle approach, and the analysis is thus restricted to the setting of Gaussian matrices and to noiseless measurements.

1.7 Comparison with classical interpolation results

Although weighted $\ell_1$ minimization was recently investigated empirically in [18, 37, 35] for multivariate polynomial interpolation in the context of polynomial chaos expansions, weighted $\ell_p$ spaces for $p \le 1$ are nonstandard in the interpolation literature. More standard are the weighted $\ell_2$ spaces (see e.g. [29, 34])

$$\mathcal{H}^w := \Big\{ f = \sum_{j \in \mathbb{Z}^d} x_j e_j : \sum_{j \in \mathbb{Z}^d} w_j^2\, |x_j|^2 < \infty \Big\},$$

where $(e_j)_{j \in \mathbb{Z}^d}$, $e_j(t) = e^{2\pi i \langle j, t \rangle}$, is the tensorized Fourier basis on the torus $\mathbb{T}^d$. Note that here we refer to weighted $\ell_2$ spaces, as opposed to the weighted norm $\|\cdot\|_{\omega,2}$ introduced in (1), which reduces to the unweighted $\ell_2$ norm when $p = 2$ because of the exponent $2 - p$.

For the choice of weights $w_j = (1 + \|j\|_2)^k$, $k \in \mathbb{N}$, these spaces coincide with the Sobolev spaces $H^k$ of functions with $k$ derivatives in $L_2$. Optimal interpolation rates for these Sobolev spaces are obtained using smooth and localized kernels (as opposed to polynomials). For example, from equispaced points on the $d$-dimensional torus with mesh size $h$, [34] derives error estimates of the form

$$\|f - I_h f\|_{L_2} \le C\, h^k\, \|f\|_{H^k},$$

where $I_h$ denotes the kernel interpolant. Writing out this error rate in terms of the number of samples $m = h^{-d}$, one obtains

$$\|f - I_h f\|_{L_2} \le C\, m^{-k/d}\, \|f\|_{H^k}.$$

The dependence of the exponent in $m^{-k/d}$ on the dimension $d$ means that we face the curse of dimension.

Such behavior in $d$ may be alleviated for linear interpolation by passing to Sobolev spaces of mixed smoothness $H^k_{\mathrm{mix}}$, which are endowed with the norm

$$\|f\|_{H^k_{\mathrm{mix}}}^2 := \sum_{j \in \mathbb{Z}^d} \prod_{l=1}^d (1 + |j_l|)^{2k}\, |x_j|^2.$$

Sampling on a sparse grid with $m$ sampling points and reconstructing via Smolyak's algorithm leads to the error bound

$$\|f - S_m f\|_{L_2} \le C\, m^{-k} (\log m)^{(d-1)(k+1/2)}\, \|f\|_{H^k_{\mathrm{mix}}},$$

see [43] and [7, Theorem 6.11]. A matching lower bound has recently been proved in [12]. This means that the dependence on $d$ in the exponent can be avoided in the main term $m^{-k}$. However, the additional logarithmic term still exhibits exponential scaling in $d$.

In contrast, Theorem 6.4 implies that weighted $\ell_1$ minimization gives the rate

$$\|f - f^\#\|_{L_2} \le C\, s^{1/2 - 1/p}\, \|f\|_{\omega,p}.$$
Here, the dependence on $d$ is hidden in $s$, which in turn depends on the weights $\omega$. We discuss two reasonable choices, which correspond to the ones leading to the Sobolev and mixed Sobolev spaces in the $\ell_2$-case described above. Let first

$$\omega_j = (1 + \|j\|_2)^k.$$

This means that the approximation error decays like a fixed power of the number of samples, with constants growing at most polynomially in $d$. Hence, the approximation error depends only polynomially on the dimension $d$, and we avoided the curse of dimension by working with the stronger weighted norm and using nonlinear reconstruction.

We can further reduce the dependence on $d$ by using the tensor-product weights

$$\omega_j = \prod_{l=1}^d (1 + |j_l|)^k.$$

In fact, the set $\{ j \in \mathbb{Z}^d : \omega_j^2 \le s/2 \}$ is a hyperbolic cross. Its size can be bounded as

$$\#\Big\{ j \in \mathbb{Z}^d : \prod_{l=1}^d (1 + |j_l|) \le t \Big\} \le C\, t \log^{d-1}(t),$$

see [28, Proof of Theorem 4.9]. Therefore, $N$ grows only mildly in $s$ and $d$, and the error bound resulting from Theorem 6.4 is of the same form as above. The dependence on the dimension $d$ is only logarithmic for this choice of weight function. Hence, passing from (mixed) Sobolev spaces to weighted $\ell_p$-spaces with $p \le 1$ on the Fourier coefficients, and from linear interpolation to nonlinear reconstruction, may lead to a significant improvement in the error rates and may avoid the curse of dimension.

2 Examples

In this section we consider several examples and demonstrate how Theorem 1.2 gives rise to various sampling theorems for polynomial and spherical harmonic interpolation. One could derive similar results in weighted $\ell_p$ spaces using Theorem 6.4.

2.1 Spherical harmonic interpolation

Figure 1: An illustration of i.i.d. samples from various spherical measures. The distribution (c) is the most incoherent with respect to the spherical harmonic basis.

The spherical harmonics form an orthonormal system for square-integrable functions on the sphere $S^2 \subset \mathbb{R}^3$, and serve as a higher-dimensional analog of the univariate trigonometric basis. They are orthogonal with respect to the uniform spherical measure. In spherical coordinates $(\phi, \theta) \in [0, \pi] \times [0, 2\pi)$, for degrees $\ell, \ell' \ge 0$ and orders $|k| \le \ell$, $|k'| \le \ell'$, the orthogonality reads

$$\frac{1}{4\pi} \int_0^{2\pi} \int_0^{\pi} Y_\ell^k(\phi, \theta)\, \overline{Y_{\ell'}^{k'}(\phi, \theta)}\, \sin(\phi)\, d\phi\, d\theta = \delta_{\ell,\ell'}\, \delta_{k,k'}.$$
The spherical harmonics are bounded according to $\|Y_\ell^k\|_\infty \lesssim \ell^{1/2}$, and this bound is realized at the poles of the sphere, $\phi \in \{0, \pi\}$. As shown in [39, 6], one can precondition the spherical harmonics to transform them into a system with smaller uniform bound, orthonormal with respect to a different measure. For example, the preconditioned function system

$$\psi_\ell^k(\phi, \theta) = (\sin \phi)^{1/2}\, Y_\ell^k(\phi, \theta),$$

normalized by the proper constant, is orthonormal on the sphere with respect to the measure $\tfrac{1}{2\pi^2}\, d\phi\, d\theta$ by virtue of (9). The $\psi_\ell^k$ are more uniformly bounded than the spherical harmonics $Y_\ell^k$; as noted in [27],

$$\|(\sin \phi)^{1/2}\, Y_\ell^k\|_\infty \le C\, \ell^{1/4}$$

for a universal constant $C$. A sharper preconditioning estimate was shown in [6] for the system


Normalized properly, this system is orthonormal on the sphere with respect to a measure which is nonstandard and illustrated in Figure 1. This system obeys a uniform bound of order $\ell^{1/6}$, with a universal constant.

We consider implications of Theorem 1.2 for interpolation with spherical harmonic expansions. We state a result in the setting where sampling points are drawn from the measure associated to the sharper preconditioning above, but similar results (albeit with steeper weights) can be obtained for sampling from the uniform spherical measure and from the measure associated to the system $(\sin \phi)^{1/2}\, Y_\ell^k$.

Corollary 2.1 (Interpolation with spherical harmonics).

Consider the preconditioned spherical harmonics and the associated orthogonalization measure. Fix weights $\omega_{\ell,k} \ge \|\psi_\ell^k\|_\infty$ and an index set $\Lambda$ of size $N$, and fix a number of samples

$$m \ge c_0\, s \log^3(s) \log(N).$$

Consider a fixed function $f = \sum_{\ell,k} x_{\ell,k}\, \psi_\ell^k$ with coefficient vector $x$. Draw $t_1, \dots, t_m$ i.i.d. from the orthogonalization measure, and consider sample values $y_k = f(t_k)$, $k = 1, \dots, m$. Then with probability exceeding $1 - N^{-\log^3(s)}$, the function $f^\#$ formed from the solution of the weighted $\ell_1$ minimization program

$$\min_{z} \|z\|_{\omega,1} \quad \text{subject to } Az = y$$

satisfies the error bounds

$$\|f - f^\#\|_\infty \le c_1\, \sigma_s(x)_{\omega,1}, \qquad \|f - f^\#\|_{L_2} \le c_2\, \frac{\sigma_s(x)_{\omega,1}}{\sqrt{s}}.$$
It is informative to compare these results with previously available bounds for unweighted $\ell_1$ minimization. Using the same sampling distribution and number of basis elements $N$, existing bounds for unweighted $\ell_1$ minimization (see [6]) require a number of samples growing polynomially in $N$, beyond the linear dependence on $s$, to achieve an error estimate of the form $\|f - f^\#\|_{L_2} \le c\, \sigma_s(x)_1 / \sqrt{s}$ (see [39] for more details). That is, significantly more measurements are required to achieve the same reconstruction rate in $L_2$. (A rate in $L_\infty$ is not available for unweighted $\ell_1$ minimization.) However, stronger assumptions on $f$ are required, in the sense that the result above requires the weighted best $s$-term approximation error to be small, while the bound from [39] works with the unweighted best $s$-term approximation error. Expressed differently, our result requires more smoothness, which is in line with the general philosophy of this paper.

2.2 Tensorized polynomial interpolation

The tensorized trigonometric polynomials on $[0,1]^d$ are given by

$$\psi_j(t) = e^{2\pi i \langle j, t \rangle} = \prod_{l=1}^d e^{2\pi i j_l t_l},$$

with $j \in \mathbb{Z}^d$. These functions are orthonormal with respect to the tensorized uniform measure. Because this system is uniformly bounded, with $\|\psi_j\|_\infty = 1$, Theorem 1.2 applies with constant weights $\omega \equiv 1$. Nevertheless, higher weights promote smoother reconstructions.

Other tensorized polynomial bases of interest are not uniformly bounded, but we can still obtain reconstruction guarantees by considering weighted $\ell_1$ minimization with properly chosen weights.

2.2.1 Chebyshev polynomials

Consider the tensorized Chebyshev polynomials on $[-1,1]^d$:

$$C_j(t) = \prod_{l=1}^d C_{j_l}(t_l), \qquad j \in \mathbb{N}_0^d,$$

where $C_0 \equiv 1$ and $C_{j_l}(t) = \sqrt{2} \cos(j_l \arccos t)$ for $j_l \ge 1$. The Chebyshev polynomials form a basis for the real algebraic polynomials on $[-1,1]^d$, and are orthonormal with respect to the tensorized Chebyshev measure

$$d\eta(t) = \prod_{l=1}^d \frac{dt_l}{\pi \sqrt{1 - t_l^2}}.$$

Since $\|C_{j_l}\|_\infty = \sqrt{2}$ for $j_l \ge 1$, we have $\|C_j\|_\infty = 2^{\|j\|_0/2}$, where $\|j\|_0$ denotes the number of nonzero entries of $j$. Although the tensorized Chebyshev polynomials are uniformly bounded for fixed $d$, the bound may therefore scale exponentially in $d$. This motivates us to apply Theorem 1.2 with weights

$$\omega_j = \prod_{l=1}^d (1 + j_l)^{1/2},$$

noting that $\omega_j \ge 2^{\|j\|_0/2} = \|C_j\|_\infty$. (More generally, one could also work with weights of the form $\omega_j = \prod_{l=1}^d v(j_l)$, where $v(j_l)$ tends to infinity as $j_l \to \infty$.) Such weights encourage both sparse and low-order tensor products of Chebyshev polynomials. The subset of indices

$$\Lambda_s := \Big\{ j \in \mathbb{N}_0^d : \omega_j^2 = \prod_{l=1}^d (1 + j_l) \le s/2 \Big\}$$

forms a hyperbolic cross. As argued in [28, Proof of Theorem 4.9], see also [15], the size of a hyperbolic cross of this form can be bounded according to

$$N = |\Lambda_s| \le C\, s \log^{d-1}(s).$$
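A hyperbolic cross index set of this kind can be enumerated by brute force in small dimensions; the sketch below assumes the budget rule $\prod_l (1 + j_l) \le s$ (the variant with $s/2$ is analogous):

```python
import itertools
import math

def hyperbolic_cross(d, s):
    """All j in N_0^d with prod_l (1 + j_l) <= s (hyperbolic cross budget)."""
    out = []
    # Each coordinate individually satisfies 1 + j_l <= s, bounding the search box.
    for j in itertools.product(range(int(s) + 1), repeat=d):
        if math.prod(1 + jl for jl in j) <= s:
            out.append(j)
    return out
```

The count grows like $s \log^{d-1}(s)$, far below the $s^d$ size of the full tensor grid of the same sidelength.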
Corollary 2.2.

Consider the tensorized Chebyshev polynomial basis for $[-1,1]^d$, and weights $\omega_j$ as in (14). Let $s \ge 2\|\omega\|_\infty^2$, and let $\Lambda_s = \{ j : \omega_j^2 \le s/2 \}$ with $N = |\Lambda_s|$. Fix a number of samples

$$m \ge c_0\, s \log^3(s) \log(N).$$

Consider a function $f = \sum_j x_j C_j$ and sampling points $t_1, \dots, t_m$ drawn i.i.d. from the tensorized Chebyshev measure on $[-1,1]^d$. Let $A$ be the sampling matrix with entries $A_{k,j} = C_j(t_k)$, $j \in \Lambda_s$. From samples $y_k = f(t_k)$, $k = 1, \dots, m$, let $x^\#$ be the solution of

$$\min_{z} \|z\|_{\omega,1} \quad \text{subject to } \|Az - y\|_2 \le \sqrt{m}\, \eta, \qquad \eta := \|x_{\Lambda_s^c}\|_{\omega,1},$$

and set $f^\# = \sum_{j \in \Lambda_s} x^\#_j C_j$. Then with probability exceeding $1 - N^{-\log^3(s)}$,

$$\|f - f^\#\|_\infty \le C_1\, \sigma_s(x)_{\omega,1} + C_2\, \eta.$$

Above, $C_1$ and $C_2$ are universal constants.

Note that with the stated estimate of $N$, $m$ satisfies (15) once $m \gtrsim s \log^3(s) \log(N)$, where the dimension $d$ enters only through $\log(N)$. This means that the required number of samples above grows only logarithmically with the ambient dimension $d$, as opposed to exponentially, as required for classical interpolation bounds using linear reconstruction methods.

2.2.2 Legendre polynomials

Consider now the tensorized Legendre polynomials on $[-1,1]^d$:

$$L_j(t) = \prod_{l=1}^d L_{j_l}(t_l), \qquad j \in \mathbb{N}_0^d,$$

where $L_{j_l}$ is the univariate orthonormal Legendre polynomial of degree $j_l$. The Legendre polynomials form a basis for the real algebraic polynomials on $[-1,1]^d$, and are orthonormal with respect to the tensorized uniform measure on $[-1,1]^d$. Since $\|L_{j_l}\|_\infty = \sqrt{2 j_l + 1} \le 1 + j_l$, we have

$$\|L_j\|_\infty = \prod_{l=1}^d (2 j_l + 1)^{1/2} \le \prod_{l=1}^d (1 + j_l),$$

and we may apply Theorem 1.2 with hyperbolic cross weights as in Corollary 2.2. In doing so, we arrive at the following result.

Corollary 2.3.

Consider the tensorized Legendre polynomial basis and hyperbolic cross weights as in (14), with $s$, $\Lambda_s$, and $m$ as in Corollary 2.2. Consider a function $f = \sum_j x_j L_j$ and suppose that $t_1, \dots, t_m$ are drawn i.i.d. from the tensorized uniform measure on $[-1,1]^d$. Let $A$ be the associated sampling matrix with entries $A_{k,j} = L_j(t_k)$. From samples $y_k = f(t_k)$, $k = 1, \dots, m$, let $x^\#$ be the solution of

$$\min_{z} \|z\|_{\omega,1} \quad \text{subject to } \|Az - y\|_2 \le \sqrt{m}\, \eta, \qquad \eta := \|x_{\Lambda_s^c}\|_{\omega,1},$$

and set $f^\# = \sum_{j \in \Lambda_s} x^\#_j L_j$. Then with probability exceeding $1 - N^{-\log^3(s)}$,

$$\|f - f^\#\|_\infty \le C_1\, \sigma_s(x)_{\omega,1} + C_2\, \eta.$$

Above, $C_1$ and $C_2$ are universal constants.

Although the univariate orthonormal Legendre polynomials are not uniformly bounded on $[-1,1]$, they can be transformed into a bounded orthonormal system by considering the weight function

$$v(t) = (\pi/2)^{1/2} (1 - t^2)^{1/4},$$

and recalling Theorem 7.3.3 from [42], which states that, for all $n \ge 1$ and all $t \in [-1, 1]$,

$$(1 - t^2)^{1/4}\, |P_n(t)| \le \sqrt{\frac{2}{\pi n}},$$

where $P_n$ denotes the classical Legendre polynomial with normalization $P_n(1) = 1$. Then the preconditioned system $(v L_n)_{n \ge 0}$ is orthonormal with respect to the Chebyshev measure, and is uniformly bounded on $[-1,1]$ with constant $\sqrt{3}$. A statement similar to Corollary 2.2 can thus also be applied to tensorized preconditioned Legendre polynomials, if the sampling points are chosen from the tensorized Chebyshev measure. For further details, we refer the reader to [38].

2.3 Numerical illustrations

Although the interplay between sparsity and smoothness in function approximation becomes more pronounced in high-dimensional problems, we still observe benefits of weighted $\ell_1$ minimization in univariate polynomial interpolation problems. In this section we provide an illustration of this effect.

Polynomial interpolation usually refers to fitting the unique trigonometric or algebraic polynomial with $m$ degrees of freedom through a given set of data of size $m$. When $m$ is large, this problem becomes ill-conditioned, as illustrated for example by Runge's phenomenon, the tendency of high-degree polynomial interpolants to oscillate at the edges of an interval (the analogous phenomenon for trigonometric polynomial interpolation is the Gibbs phenomenon [19]). One method for minimizing the effect of Runge's phenomenon is to carefully choose interpolation nodes – e.g., Chebyshev nodes for algebraic polynomial interpolation or equispaced nodes for trigonometric interpolation. Other methods known to reduce the effects of Runge's phenomenon are the method of least squares, where one foregoes exact interpolation for a least squares projection of the data onto a polynomial of lower degree, which has been shown to be stable with respect to random sampling [14], or weighted $\ell_2$ regularization [29], e.g., using for interpolation the function $f^\# = \sum_j x^\#_j \psi_j$, where the coefficient vector $x^\#$ solves the minimization problem

$$\min_{z} \sum_j w_j^2\, |z_j|^2 \quad \text{subject to } Az = y,$$

with $A$ being the sampling matrix as in Section 1.4.

In this section we provide evidence that weighted $\ell_1$ minimization can be significantly less sensitive to perturbations in the choice of the sampling points than these other methods, specifically unweighted $\ell_1$ minimization, least squares projections, weighted $\ell_2$ regularization, and exact polynomial interpolation.

For our numerical experiments, we follow the examples in [14] and consider on $[-1, 1]$ the smooth function

$$f(t) = \frac{1}{1 + 25 t^2},$$

which was originally considered by Runge [41] to illustrate the instability of polynomial interpolation at equispaced points. We repeat the following experiment 100 times: draw $m$ sampling points i.i.d. from a measure on $[-1,1]$ and compute the noise-free observations $y_k = f(t_k)$. We use the uniform measure for real trigonometric polynomial interpolation and the Chebyshev measure for Legendre polynomial interpolation. We then compare the least squares approximation, the unweighted $\ell_1$ approximation, the weighted $\ell_2$ approximation with weights as in (20), and weighted $\ell_1$ approximations with two choices of weights. We also compare to exact inversion, where we fit the sampling points with a polynomial having as many degrees of freedom as sampling points. In Figures 2-3 we display the interpolations resulting from all 100 experiments overlaid, so as to illustrate the variance of each interpolation method with respect to the choice of sampling points. In all experiments, we fix in the $\ell_1$ and weighted $\ell_1$ minimizations a maximal polynomial degree. For the least squares solution to be stable [14], we project onto the span of fewer basis elements than samples.

In this example, we observe that the weighted $\ell_1$ solutions are more consistently accurate than the other methods, including exact polynomial interpolation. There is mild sparsity present in this example; Runge's function is an even function, and so all odd coefficients in its Legendre / trigonometric expansion are zero. Indeed, the weighted $\ell_1$ solutions tend to pick up this sparsity and have zero odd coefficients, unlike the other interpolation methods.
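The even symmetry of Runge's function, and hence the vanishing of its odd-degree Legendre coefficients, can be verified with Gauss–Legendre quadrature (numpy assumed; the truncation degree and node count are illustrative):

```python
import numpy as np
from numpy.polynomial import legendre

f = lambda t: 1.0 / (1.0 + 25.0 * t ** 2)          # Runge's function

nodes, qw = legendre.leggauss(200)                 # Gauss-Legendre nodes/weights on [-1, 1]

coeffs = []
for n in range(12):
    c = np.zeros(n + 1)
    c[n] = 1.0
    Ln = np.sqrt(2 * n + 1) * legendre.legval(nodes, c)     # orthonormal w.r.t. dt/2
    coeffs.append(float(0.5 * np.sum(qw * f(nodes) * Ln)))  # <f, L_n> under dt/2
```

The odd-degree coefficients vanish to machine precision by symmetry, while the even-degree ones carry all of the expansion.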

We emphasize once more that this is just an illustration. For univariate polynomial interpolation in the setting where one can design carefully-chosen sampling points, e.g., Chebyshev nodes, weighted $\ell_1$ minimization should not be expected to outperform exact polynomial interpolation.

(a) Original function
(b) Least squares
(c) Weighted ,
(d) Exact inversion
(e) Unweighted
(f) Weighted ,
(g) Weighted ,
Figure 2: Overlaid interpolations of the function $f(t) = \frac{1}{1+25t^2}$, $t \in [-1,1]$, by real trigonometric polynomials using various reconstruction methods as described in Section 2.3. Different interpolations correspond to different random draws of sampling points from the uniform measure on $[-1,1]$.
(a) Least squares
(b) Weighted ,
(c) Exact inversion
(d) Unweighted
(e) Weighted ,
(f) Weighted ,
Figure 3: Overlaid interpolations of the function $f(t) = \frac{1}{1+25t^2}$ by Legendre polynomials using various reconstruction methods as described in Section 2.3. Different interpolations correspond to different random draws of sampling points according to the Chebyshev measure on $[-1,1]$.

3 Weighted sparsity and quasi-best $s$-term approximations

In this section we revisit some important technical results pertaining to weighted spaces that were touched upon in the introduction. First, unlike unweighted $s$-term approximations for finite vectors, weighted $s$-term approximations are not straightforward to compute in general. Nevertheless, we can approximate $\sigma_s(x)_{\omega,p}$ using a quantity that can easily be computed from $x$ by sorting and thresholding, which we will call the quasi-best $s$-term approximation.

Let $(x^*_j)$ denote the non-increasing rearrangement of the sequence $(|x_j| / \omega_j)_{j \in \Lambda}$, that is, $x^*_j = |x_{\pi(j)}| / \omega_{\pi(j)}$ for some permutation $\pi$ of $\Lambda$ such that $x^*_1 \ge x^*_2 \ge \cdots$. Let $k$ be the maximal number such that $\sum_{j \le k} \omega_{\pi(j)}^2 \le s$ and set $S = \{\pi(1), \dots, \pi(k)\}$, so that $\omega(S) \le s$. Then we call $x_S$ a weighted quasi-best $s$-term approximation to $x$ and define the corresponding error of weighted quasi-best $s$-term approximation as

$$\tilde{\sigma}_s(x)_{\omega,p} := \|x - x_S\|_{\omega,p}.$$

By definition, $\tilde{\sigma}_s(x)_{\omega,p} \ge \sigma_s(x)_{\omega,p}$. We also have a converse inequality relating the two $s$-term approximations in the case of bounded weights.
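The quasi-best approximation can be compared against the brute-force best weighted $s$-term approximation on tiny examples. In the sketch below, the greedy ordering (by $|x_j|/\omega_j$, stopping at the first index that would overflow the weighted budget) is one natural reading of the sort-and-threshold rule and is labeled here as an assumption:

```python
import itertools
import numpy as np

def quasi_best_support(x, w, s):
    """Greedy sort-and-threshold support: walk indices in order of
    decreasing |x_j|/w_j, stopping once the weighted budget s would
    be exceeded. (Assumed ordering rule, for illustration.)"""
    order = np.argsort(-np.abs(x) / w)
    S, budget = set(), 0.0
    for j in order:
        if budget + w[j] ** 2 > s:
            break
        S.add(int(j))
        budget += w[j] ** 2
    return S

def werr(x, w, S, p=1):
    """Weighted l_{w,p} error of keeping only the entries indexed by S."""
    rest = [j for j in range(len(x)) if j not in S]
    return float(np.sum(w[rest] ** (2 - p) * np.abs(x[rest]) ** p) ** (1.0 / p))

def best_support(x, w, s, p=1):
    """Brute-force best weighted s-term support (tiny problems only)."""
    best_err, best_S = np.inf, set()
    for r in range(len(x) + 1):
        for S in itertools.combinations(range(len(x)), r):
            if np.sum(w[list(S)] ** 2) <= s:
                e = werr(x, w, set(S), p)
                if e < best_err:
                    best_err, best_S = e, set(S)
    return best_err, best_S
```

By construction the greedy error always dominates the brute-force optimum, and for $\omega \equiv 1$ the two coincide.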

Lemma 3.1.

Suppose that the weights satisfy $\omega_j^2 \le s/2$ for all $j$. Then


Let $x_S$ be the weighted best $s$-term approximation to $x$, and let $x_{\tilde{S}}$ be the weighted quasi-best $s$-term approximation to $x$. Because the supports of $x - x_S$ and $x_S$, and also of $x - x_{\tilde{S}}$ and $x_{\tilde{S}}$, do not overlap, it suffices to show that

Assume without loss of generality that the terms in are ordered so that for all ; let be the largest integer such that , and . Because we know also that . Let be the largest integer less than or equal to , and let . Then implies that


Now, let be the vector

We constructed so that the first