Adaptive Approximation for Multivariate Linear Problems with Inputs Lying in a Cone
We study adaptive approximation algorithms for general multivariate linear problems where the sets of input functions are non-convex cones. While it is known that adaptive algorithms perform essentially no better than non-adaptive algorithms for convex input sets, the situation may be different for non-convex sets. A typical example considered here is function approximation based on series expansions. Given an error tolerance, we use series coefficients of the input to construct an approximate solution such that the error does not exceed this tolerance. We study the situation where we can bound the norm of the input based on a pilot sample, and the situation where we keep track of the decay rate of the series coefficients of the input. Moreover, we consider situations where it makes sense to infer coordinate and smoothness importance. Besides performing an error analysis, we also study the information cost of our algorithms and the computational complexity of our problems, and we identify conditions under which we can avoid a curse of dimensionality.
In many situations, adaptive algorithms can be rigorously shown to perform essentially no better than non-adaptive algorithms. Yet, in practice adaptive algorithms are appreciated because they relieve the user from stipulating the computational effort required to achieve the desired accuracy. The key to resolving this seeming contradiction is to construct a theory based on assumptions that favor adaptive algorithms. We do that here.
Adaptive algorithms infer the necessary computational effort based on the function data sampled. Adaptive algorithms may perform better than non-adaptive algorithms if the set of input functions is non-convex. We construct adaptive algorithms for general multivariate linear problems where the input functions lie in non-convex cones. Our algorithms use a finite number of series coefficients of the input function to construct an approximate solution that satisfies an absolute error tolerance. We show our algorithms to be essentially optimal. We derive conditions under which the problem is tractable, i.e., the information cost of constructing the approximate solution does not increase exponentially with the dimension of the input function domain. In the remainder of this section we define the problem and essential notation. But first, we present a helpful example.
1.1 An Illustrative Example
Consider the case of approximating functions defined over , using a Chebyshev polynomial basis. The input function is denoted , and the solution is . In this case,
Approximating well by a finite sum requires knowing which terms in the infinite series for are more important. Let denote a Hilbert space of input functions where the norm of is a -weighted norm of the series coefficients:
The are non-negative coordinate weights, which embody the assumption that may depend more strongly on coordinates with larger than those with smaller . The definition of the -norm implies that an input function must have series coefficients that decay quickly enough as the degree of the polynomial increases. Larger implies smoother input functions.
The ordering of the weights,
implies an ordering of the wavenumbers, . It is natural to approximate the solution using the first series coefficients as follows:
Here, we assume that it is possible to sample the series coefficients of the input function. This is a less restrictive assumption than being able to sample any linear functional, but it is more restrictive than only being able to sample function values. An important future problem is to extend the theory in this chapter to the case where the only function data available are function values.
The error of this approximation in terms of the norm on the output space, , can be expressed as
If one has a fixed data budget, , then is the best answer.
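The truncation just described — keep the first terms of the series and discard the rest — can be sketched concretely. The function e^x, the standard Chebyshev domain [-1, 1], the interpolation degree, and the truncation level n = 8 below are illustrative choices, not taken from the text:

```python
import numpy as np
from numpy.polynomial import chebyshev

# Hypothetical smooth input function on [-1, 1].
f = np.exp

# Chebyshev interpolation at a high degree recovers the leading series
# coefficients to near machine precision.
coeffs = chebyshev.chebinterpolate(f, deg=30)

# Truncate to the first n terms (the presumed most important wavenumbers).
n = 8
approx = chebyshev.Chebyshev(coeffs[:n])

# Measure the worst error of the truncated series on a fine grid.
x = np.linspace(-1, 1, 101)
max_err = np.max(np.abs(approx(x) - f(x)))
```

Because the Chebyshev coefficients of a smooth function decay rapidly, eight terms already give a worst-case error below 10^-5 in this sketch.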
However, our goal is an algorithm that satisfies the error criterion
where is the error tolerance, and is the set of input functions for which is successful. This algorithm contains a rule for choosing —depending on and —so that . The objectives of this chapter are to
construct such a rule,
choose a set of input functions for which the rule is valid,
characterize the information cost of ,
determine whether has optimal information cost, and
understand the dependence of this cost on the number of input variables, , as well as the error tolerance, .
1.2 General Linear Problem
Now, we define our problem more generally. A solution operator maps the input function to an output, . As in the illustrative example above, the Banach spaces of inputs and outputs are defined by series expansions:
Here, is a basis for the input Banach space , is a basis for the output Banach space , is a countable index set, and is the sequence of weights. These bases are defined to match the solution operator:
The represent the importance of the series coefficients of the input function. The larger is, the more important is.
Although this problem formulation is quite general in some aspects, condition (3) is somewhat restrictive. In principle, the choice of basis can be made via the singular value decomposition, but in practice, if the norms of and are specified without reference to their respective bases, it may be difficult to identify bases satisfying (3).
To facilitate our derivations below, we establish the following lemma via Hölder’s inequality:
Taking in the lemma above, the norm of the solution operator can be expressed in terms of the norm of as follows:
We assume throughout this chapter that the weights are chosen so that this norm is finite, namely,
1.3 An Approximation and an Algorithm
The optimal approximation based on series coefficients of the input function is defined in terms of the series coefficients of the input function corresponding to the largest as follows:
By the argument leading to (6) it follows that
An upper bound on the approximation error follows from Lemma 1:
This leads to the following theorem.
Let denote the ball of radius in the space of input functions. The error of the approximation defined in (9) is bounded tightly above as
Moreover, the worst case error over of , for any approximation based on series coefficients of the input function, can be no smaller.
Let depend on the series coefficients indexed by . Use Lemma 1 with to choose to mimic the zero function, have norm , and have as large a solution as possible, i.e.,
Then because mimics the zero function, and
The ordering of the implies that for arbitrary can be no smaller than the case . This completes the proof.
After defining the information cost of an algorithm and the problem complexity in the next subsection, we demonstrate that this non-adaptive algorithm is optimal when the set of inputs is chosen to be . However, typically one cannot bound the norm of the input function a priori, so Algorithm 1 is impractical.
The key difficulty is that error bound (12) depends on the norm of the input function. In contrast, we will construct error bounds for that only depend on function data. These will lead to adaptive algorithms satisfying error criterion (2). For such algorithms, the set of allowable input functions, , will be a cone, not a ball.
Note that algorithms satisfying error criterion (2) cannot exist for . Any algorithm must require a finite sample size, even if it is huge. Then, there must exist some that looks exactly like the zero function to the algorithm but for which is arbitrarily large. Thus, algorithms satisfying the error criterion exist only for some strict subset of . Choosing that subset well is both an art and a science.
1.4 Information Cost and Problem Complexity
The information cost of is denoted and defined as the number of function data—in our situation, series coefficients—required by . For adaptive algorithms this cost varies with the input function . We also define the information cost of the algorithm in general, recognizing that it will tend to depend on :
Note that while the cost depends on , has no knowledge of beyond the fact that it lies in . It is common for to be , or perhaps asymptotically .
Let denote the set of all possible algorithms that may be constructed using series coefficients and that satisfy error criterion (2). We define the computational complexity of a problem as the information cost of the best algorithm:
If the complexity of the problem is , the cost of an essentially optimal algorithm is also . If the complexity of the problem is asymptotically , then the cost of an essentially optimal algorithm is also asymptotically . We will show that our adaptive algorithms presented in Sections 2 and 3 are essentially optimal.
The non-adaptive Algorithm 1 has an information cost for the set of input functions that is given by
This algorithm is essentially optimal for the set of input functions , namely,
where and are arbitrary and fixed, and .
Fix positive , , , and as defined above. For and , the information cost of non-adaptive Algorithm 1 follows from its definition. Let
Construct an input function as in the proof of Theorem 1 with . By the argument in the proof of Theorem 1, any algorithm in that can approximate with an error no greater than must use at least series coefficients. Thus,
Thus, Algorithm 1 is essentially optimal.
For Algorithm 1, the information cost, , depends on the decay rate of the tail norm of the . This decay may be algebraic or exponential and also determines the problem complexity, , as a function of the error tolerance, .
This theorem illustrates how an essentially optimal algorithm for solving a problem for a ball of input functions, , can be non-adaptive. However, as alluded to above, we claim that it is impractical to know a priori which ball one's input function lies in. On the other hand, in the situations described below where is a cone, we will show that actually contains only adaptive algorithms via the lemma below. The proof of this lemma follows directly from the definition of non-adaptivity.
For a given set of input functions, , if contains any non-adaptive algorithms, then for every ,
1.5 Tractability
Besides understanding the dependence of on , we also want to understand how depends on the dimension of the domain of the input function. Suppose that , for some , and let denote the dependence of the input space on the dimension . The set of functions for which our algorithms succeed, , depends on the dimension, too. Also, , , , and depend implicitly on dimension, and this dependence is sometimes indicated explicitly by the subscript .
Different dependencies of on the dimension and the error tolerance are formalized as different notions of tractability. Since the complexity is defined in terms of the best available algorithm, tractability is a property that is inherent to the problem, not to a particular algorithm. We define the following notions of tractability (for further information on tractability we refer to the trilogy , , ). Note that in contrast to these references we explicitly include the dependence on in our definitions. This dependence is natural for cones and might be different if is not a cone.
We say that the adaptive approximation problem is strongly polynomially tractable if and only if there are non-negative , , , and such that
The infimum of satisfying the bound above is denoted by and is called the exponent of strong polynomial tractability.
We say that the problem is polynomially tractable if and only if there are non-negative , , , and such that
We say that the problem is weakly tractable if and only if
We remark that, for the sake of brevity, we focus here on tractability notions that are summarized as algebraic tractability in the recent literature (see, e.g., ). Theoretically, one could also study exponential tractability, where one would essentially replace by in the previous tractability notions. A more detailed study of tractability will be done in a future paper.
1.6 The Illustrative Example Revisited
Using the non-increasing ordering of the , we employ a standard technique for bounding the largest in terms of the sum of the power of all the . For ,
Hence, substituting the above upper bound on into the formula for the complexity of the problem, we obtain an upper bound on the complexity:
If is the infimum of the for which is finite, and is finite, then we obtain strong polynomial tractability and an exponent of strong tractability that is . On the other hand, if the coordinate weights are all unity, , then there are different with a value of , and so , and the problem is not tractable.
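The standard bounding technique invoked above — the n-th largest weight is at most the p-th root of the sum of all p-th powers divided by n — can be checked numerically. The algebraically decaying weight sequence and the exponent p below are illustrative assumptions; the text's concrete weights are not reproduced here:

```python
import numpy as np

# Hypothetical non-increasing weight sequence (algebraic decay) and exponent p.
lam = 1.0 / np.arange(1, 1001) ** 2
p = 0.75

# Since the weights are non-increasing, n * lam[n-1]**p is at most the sum of
# the first n p-th powers, which is at most the full sum.  Hence
#   lam[n-1] <= (sum_k lam[k]**p / n) ** (1/p).
total = np.sum(lam ** p)
n = np.arange(1, len(lam) + 1)
bound = (total / n) ** (1.0 / p)

holds = bool(np.all(lam <= bound))
```

The bound decays like n^(-1/p), which is what links the summability exponent of the weights to the complexity exponent in the discussion above.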
1.7 What Comes Next
In the following section we define a cone of input functions, , in (16) whose norms can be bounded above in terms of the series coefficients obtained from a pilot sample. Adaptive Algorithm 2 is shown to be optimal for this . We also identify necessary and sufficient conditions for tractability.
2 Bounding the Norm of the Input Function Based on a Pilot Sample
2.1 The Cone and the Optimal Algorithm
The premise of an adaptive algorithm is that the finite information we observe about the input function tells us something about what is not observed. Let denote the number of pilot observations, based on the set of wavenumbers
where the are defined by the ordering of the in (1). Let be some constant inflation factor greater than one. The cone of functions whose norm can be bounded well in terms of a pilot sample, , is given by
Referring to error bound (11), we see that the error of depends on the series coefficients not sampled. The definition of allows us to bound these as follows:
This inequality together with error bound (11) implies the data-based error bound
The upper bound on the computational cost of this algorithm is obtained by noting that
since for all , . Moreover, this inequality is tight for some , namely, those for which for . This completes the proof of (18).
To prove the lower complexity bound, choose and such that
The input function is defined via its series coefficients as in Lemma 1, having nonzero coefficients only for :
Suppose that samples the series coefficients for , and let denote the cardinality of .
Now, construct the input function , having zero coefficients for and also as in Lemma 1:
Let . By the definitions above, it follows that
Therefore, . Moreover, since the series coefficients for are the same for , it follows that . Thus, must be quite similar to .
Using an argument like that in the proof of Theorem 1, it follows that
by the ordering of the in (1). By the choice of and above, it follows that . This inequality then implies lower complexity bound (19). Because it follows from Lemma 2 that contains only adaptive algorithms.
The above derivation assumes that . If , then our cone consists of functions whose series coefficients vanish for wavenumbers outside . The exact solution can be constructed using only the pilot sample. Our algorithm is then non-adaptive, but succeeds for input functions in the cone , which is an unbounded set.
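The pilot-sample strategy of this section can be summarized as a stopping rule: bound the full norm of the input by an inflation factor times the pilot norm, bound the unobserved tail via that inequality, and grow the sample size until the data-driven error bound meets the tolerance. The sketch below is schematic only — the cone condition ||f|| <= A * (pilot norm) and the error bound lam[n] * (tail norm) are assumed stand-ins for the elided formulas, and the coefficient sequence, weights, inflation factor A, and tolerance are all hypothetical:

```python
import numpy as np

def adaptive_approx(coeffs, lam, n1=4, A=2.0, eps=1e-6):
    """Schematic adaptive rule: grow n until a data-driven bound meets eps.

    Assumed (hypothetical) forms, standing in for the elided formulas:
      * cone condition:  ||f|| <= A * (norm of the n1 pilot coefficients),
      * error bound:     lam[n] * (norm of the unobserved tail).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    pilot_norm = np.linalg.norm(coeffs[:n1])
    norm_bound = A * pilot_norm  # cone-based bound on the full norm
    n = n1
    while True:
        observed = np.linalg.norm(coeffs[:n])
        # The unobserved tail norm is bounded using the cone inequality.
        tail = np.sqrt(max(norm_bound ** 2 - observed ** 2, 0.0))
        if lam[n] * tail <= eps or n + 1 >= len(coeffs):
            return n
        n += 1

# Hypothetical input with geometrically decaying series coefficients.
coeffs = 0.5 ** np.arange(50)
lam = 1.0 / np.arange(1, 52) ** 2
n_used = adaptive_approx(coeffs, lam, eps=1e-3)
```

Note that the sample size returned depends on the observed data, which is exactly the adaptivity that Lemma 2 says any algorithm succeeding on the (unbounded) cone must exhibit.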
We may not be able to guarantee that a particular of interest lies in our cone, , but we may derive necessary conditions for to lie in . The following proposition follows from the definition of in (16) and the fact that the term on the left below underestimates .
If , then
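The proposition's logic — any partial coefficient norm underestimates the full norm, so it must itself satisfy the cone inequality — suggests a cheap diagnostic: a violation by observed data certifies that the input cannot lie in the cone. The unweighted norm, inflation factor, and coefficient sequences below are hypothetical simplifications of the elided definitions:

```python
import numpy as np

def violates_cone(coeffs, n1=4, A=2.0):
    """Return True if the observed data certify that f cannot lie in the cone.

    Schematic check: any partial sum of squared coefficients underestimates
    ||f||^2, so if some partial norm already exceeds A times the pilot norm,
    the (assumed) cone condition ||f|| <= A * (pilot norm) is impossible.
    """
    c = np.asarray(coeffs, dtype=float)
    pilot_norm = np.linalg.norm(c[:n1])
    partial_norms = np.sqrt(np.cumsum(c ** 2))
    return bool(np.any(partial_norms > A * pilot_norm))

# Steadily decaying coefficients are consistent with the cone ...
ok = violates_cone(0.5 ** np.arange(20))
# ... but a spike hidden after a tiny pilot sample is not.
bad = violates_cone([1e-6, 1e-6, 1e-6, 1e-6, 1.0])
```

As the proposition states, passing this check is only necessary, never sufficient: the unobserved coefficients could still violate the cone condition.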
In this section, we write instead of , to stress the dependence on , and for the same reason we write instead of . Recall that we assume that . Let
where the positive constants and depend on , but not on , , or . From the equation above, it is clear that tractability depends on the behavior of as and tend to infinity. We would like to study under which conditions we obtain the various tractability notions defined in Section 1.5.
To this end, we distinguish two cases, depending on whether is infinite or not. This distinction is useful because it allows us to relate the computational complexity of the algorithms considered in this chapter to the computational complexity of linear problems on certain function spaces considered in the classical literature on information-based complexity, as for example . The case corresponds to the worst-case setting, where one studies the worst performance of an algorithm over the unit ball of a space. The results in Theorem 4 below are indeed very similar to the results for the worst-case setting over balls of suitable function spaces. The case corresponds to the so-called average-case setting, where one considers the average performance over a function space equipped with a suitable measure. For both of these settings there exist tractability results that we will make use of here.
Case 1: :
If , we have, due to the monotonicity of the ,
We then have the following theorem.
Using the same notation as above, the following statements hold for the case .
We have strong polynomial tractability if and only if there exist and such that
Furthermore, the exponent of strong polynomial tractability is then equal to the infimum of those for which (22) holds.
We have polynomial tractability if and only if there exist and such that
We have weak tractability if and only if
Letting , we see that