Adaptive Near-Optimal Rank Tensor Approximation for High-Dimensional Operator Equations
We consider a framework for the construction of iterative schemes for operator equations that combine low-rank approximation in tensor formats and adaptive approximation in a basis. Under fairly general assumptions, we obtain a rigorous convergence analysis, where all parameters required for the execution of the methods depend only on the underlying infinite-dimensional problem, but not on a concrete discretization. Under certain assumptions on the rates for the involved low-rank approximations and basis expansions, we can also give bounds on the computational complexity of the iteration as a function of the prescribed target error. Our theoretical findings are illustrated and supported by computational experiments. These demonstrate that problems in very high dimensions can be treated with controlled solution accuracy.
Keywords: Low-rank tensor approximation, adaptive methods, high-dimensional operator equations, computational complexity
Mathematics Subject Classification (2000): 41A46, 41A63, 65D99, 65J10, 65N12, 65N15
Any attempt to recover or approximate a function of a large number of variables with the aid of classical low-dimensional techniques is inevitably impeded by the curse of dimensionality. This means that, when only assuming classical smoothness (e.g. in terms of Sobolev or Besov regularity) of order , the necessary computational work needed to realize a desired target accuracy in dimensions scales like , i.e., one faces an exponential increase in the spatial dimension . This can be ameliorated by dimension-dependent smoothness measures. In many high-dimensional problems of interest, the approximand has bounded high-order mixed derivatives, which under suitable assumptions can be used to construct sparse grid-type approximations where the computational work scales like . Under such regularity assumptions, one can thus obtain a convergence rate independent of . In general, however, the constant will still grow exponentially in . This has been shown to hold even under extremely restrictive smoothness assumptions in , and has been observed numerically in a relatively simple but realistic example in .
Hence, in contrast to the low-dimensional regime, regularity is no longer a sufficient structural property that ensures computational feasibility, and further low-dimensional structure of the sought high-dimensional object is required. Such a structure could be the dependence of the function on a much smaller (unknown) number of variables, see e.g. . It could also mean sparsity with respect to some (a priori) unknown dictionary. In particular, dictionaries comprized of rank-one tensors open very promising perspectives and have recently attracted substantial attention.
As a simple example consider on the unit cube , where the are sufficiently smooth. Employing for each factor a standard spline approximation of order with knots yields an -accuracy of order , which gives rise to an overall accuracy of the order of at the expense of degrees of freedom. Hence, assuming that does not depend on , an accuracy requires
degrees of freedom. In contrast, it would take the order of degrees of freedom to realize an accuracy of order when using a standard tensor product spline approximation, which means that in this case . Thus, while the first approximation – using a nonlinear parametrization of a reference basis – breaks the curse of dimensionality, the second one obviously does not.
Of course, being a simple tensor is in general an unrealistic assumption, but the curse of dimensionality can still be significantly mitigated when is well approximable by relatively short sums of rank-one tensors. By this we mean that for some norm we have
where the rank grows only moderately as decreases. In our initial example, in these terms we had for all . Assuming that all the factors in the above approximation are sufficiently smooth, the count (1) applied to each summand with target accuracy shows that now at most
degrees of freedom are required, which is still acceptable. This is clearly a very crude reasoning because it does not take a possible additional decay in the rank-one summands into account.
This argument, however, already indicates that good approximability in the sense of (2) is not governed by classical regularity assumptions. Instead, the key is to exploit an approximate global low-rank structure of . This leads to a highly nonlinear approximation problem, where one aims to identify suitable lower-dimensional tensor factors, which can be interpreted as a -dependent dictionary.
This discussion, although admittedly somewhat oversimplified, immediately raises several questions which we will briefly discuss as they guide subsequent developments.
Format of approximation: The hope that in (2) can be rather small is based on the fact that the rank-one tensors are allowed to “optimally adapt” to the approximand . The format of the approximation used in (2) is sometimes called canonical since it is a formal direct generalization of classical Hilbert Schmidt expansions for . However, a closer look reveals a number of well-known pitfalls. In fact, they are already encountered in the discrete case. The collection of sums of ranks one tensors of a given length is not closed, and the best approximation problem is not well-posed, see e.g. . There appears to be no reliable computational strategy that can be proven to yield near-minimal rank approximations for a given target accuracy in this format. In this work, we therefore employ different tensor formats that allow us to obtain provably near-minimal rank approximations, as explained later.
A two-layered problem: Given a suitable tensor format, even if a best tensor approximation is known in the infinite-dimensional setting of the continuous problem, the resulting lower-dimensional factors still need to be approximated. Since finding these factors is part of the solution process, the determination of efficient discretizations for these factors will need to be intertwined with the process of finding low-rank expansions. We have chosen here to organize this process through selecting low-dimensional orthonormal wavelet bases for the tensor factors. However, other types of basis expansions would be conceivable as well.
1.2 Conceptual Preview
The problem of finding a suitable format of tensor approximations has been extensively studied in the literature over that past years, however, mainly in the discrete or finite-dimensional setting, see e.g. [26, 23, 32, 18, 34]. Some further aspects in a function space setting have been addressed e.g. in [39, 15, 40]. For an overview and further references we also refer to  and the recent survey . A central question in these works is: given a tensor, how can one in a stable manner obtain low-rank approximations, and how accurate are they when compared with best tensor approximations in the respective format?
We shall heavily draw on these findings in the present paper, but under the following somewhat different perspectives. First of all, we are interested in the continuous infinite-dimensional setting, i.e., in sparse tensor approximations of a function which is a priori not given in any finite tensor format but which one may expect to be well approximable by simple tensors in a way to be made precise later. We shall not discuss here the question under which concrete conditions this is actually the case. Moreover, the objects to be recovered are not given explicitly but only implicitly as a solution to an operator equation
where is an isomorphism of some Hilbert space onto its dual . One may think of , in the simplest instance, as a high-dimensional space, or as a Sobolev space. More generally, as in the context of parametric diffusion problems, could be a tensor product of a Sobolev space and an space. Accordingly, we shall always assume that we have a Gelfand triplet
in the sense of dense continuous embeddings, where we assume that is a tensor product Hilbert space, that is,
with lower-dimensional Hilbert spaces . A typical example would be for a domain of small spatial dimension.
The main contribution of this work is to put forward a strategy that addresses the main obstacles identified above and results in an algorithm which, under mild assumptions, can be rigorously proven to provide for any target accuracy an approximate solution of near-minimal rank and representation complexity of the involved tensor factors. Specifically, (i) it is based on stable tensor formats relying on optimal subspaces; (ii) successive solution updates involve a combined refinement of ranks and factor discretizations; (iii) (near-)optimality is achieved, thanks to (i), through accompanying suitable subspace correction and coarsening schemes.
The following comments on the main ingredients are to provide some orientation. A first essential step is to choose a universal basis for functions of a single variable in . Here, we focus on wavelet bases, but other systems like the trigonometric system for periodic problems are conceivable as well. As soon as functions of a single variable, especially the factors in our rank-one tensors, are expanded in such a basis, the whole problem of approximating reduces to approximating its infinite coefficient tensor induced by the expansion
see below. The original operator equation (4) is then equivalent to an infinite system
For standard types of Sobolev spaces it is well understood how to rescale the tensor product basis in such a way that it becomes a Riesz basis for . This, in turn, together with the fact that is finite, allows one to show that is finite, see . Hence one can find a positive such that , i.e., the operator is a contraction so that the iteration
converges for any initial guess to the solution of (7).
Of course, (8) is only an idealization because the full coefficient sequences cannot be computed. Nevertheless, adaptive wavelet methods can be viewed as realizing (8) approximately, keeping possibly few wavelet coefficients “active” while still preserving enough accuracy to ensure convergence to (see e.g. [9, 10]).
In the present high-dimensional context this kind of adaptation is no longer feasible. Instead, we propose here a “much more nonlinear” adaptation concept. Being able to keep increasingly accurate approximations on a path towards near-minimal rank approximations with properly sparsified tensor factors relies crucially on suitable correction mechanisms. An important contribution of this work is to identify and analyze just such methods. Conceptually, they are embedded in a properly perturbed numerical realization of (8) of the form
where , are certain reduction operators and the , , are suitable tolerances which decrease for increasing .
More precisely, the purpose of is to “correct” the current tensor expansion and, in doing so, reduce the rank subject to an accuracy tolerance . We shall always refer to such a rank reduction operation as a recompression. For this operation to work as desired, it is essential that the employed tensor format is stable in the sense that the best approximation problem for any given ranks is well-posed. As explained above, this excludes the use of the canonical format. Instead we use the so-called hierarchical Tucker (HT) format, since on the one hand it inherits the stability of the Tucker format , as a classical best subspace method, while on the other hand it better ameliorates the curse of dimensionality that the Tucker format may still be prone to. In §2 we collect the relevant prerequisites. This draws to a large extent on known results for the finite-dimensional case, but requires proper formulation and extension of these notions and facts for the current sequence space setting. The second reduction operation , in turn, is a coarsening scheme that reduces the number of degrees of freedom used by the wavelet representations of the tensor factors, again subject to some accuracy constraint .
1.3 What is New?
The use of rank reduction techniques in iterative schemes is in principle not new, see e.g. [5, 6, 22, 25, 27, 29, 3] and the further references given in . To our knowledge, corresponding approaches can be subdivided roughly into two categories. In the first one, iterates are always truncated to a fixed tensor rank. This allows one to control the complexity of the approximation, but convergence of such iterations can be guaranteed only under very restrictive assumptions (e.g. concerning highly effective preconditioners). In the second category, schemes achieve a desired target accuracy by instead prescribing an error tolerance for the rank truncations, but the corresponding ranks arising during the iteration are not controlled. A common feature of both groups of results is that they operate on a fixed discretization of the underlying continuous problems.
In contrast, the principal novelty of the present approach can be sketched as follows. The first key element is to show that based on a known error bound for a given approximation to the unknown solution, a judiciously chosen recompression produces a near-minimal rank approximation to the solution of the continous problem for a slightly larger accuracy tolerance. Moreover, the underlying projections are stable with respect to certain sparsity measures. As pointed out before, this reduction needs to be intertwined with a sufficiently accurate but possibly coarse approximation of the tensor factors. A direct coarsening of the full wavelet coefficient tensor would face the curse of dimensionality, and thus would be practically infeasible. The second critical element is therefore to introduce certain lower-dimensional quantities, termed tensor contractions, from which the degrees of freedom to be discarded in the coarsening are identified. This notion of contractions also serves to define suitable sparsity classes with respect to wavelet coefficients, facilitating a computationally efficient, rigorously founded combination of tensor recompression and coefficient coarsening.
These concepts culminate in the main result of this paper, which can be summarized in an admittedly oversimplified way as follows.
Meta-Theorem: Whenever the solution to (7) has certain tensor-rank approximation rates and when the involved tensor factors have certain best -term approximation rates, then a judicious numerical realization of the iteration (9) realizes these rates. Moreover, up to logarithmic factors, the computational complexity is optimal. More specifically, for the smallest such that the approximate solution satisfies , has HT-ranks that can be bounded, up to multiplication by a uniform constant, by the smallest possible HT-ranks needed to realize accuracy .
In the theorem that we will eventually prove we admit classes of operators with unbounded ranks, in which case the rank bounds contain a factor of the form , where is a fixed exponent.
To our knowledge this is the first result of this type, where convergence to the solution of the infinite-dimensional problem is guaranteed under realistic assumptions, and all ranks arising during the process remain proportional to the respective smallest possible ones. A rigorous proof of rank near optimality, using an iteration of the above type, is to be contrasted to approaches based on greedy approximation as studied e.g. in , where approximations in the (unstable) canonical format are constructed through successive greedy updates. This does, in principle, not seem to offer much hope for finding minimal or near-minimal rank approximations, as the greedy search operates far from orthonormal bases, and errors committed early in the iteration cannot easily be corrected. Although variants of the related proper generalized decomposition, as studied in , can alleviate some of these difficulties, e.g. by employing different tensor formats, the basic issue of controlling ranks in a greedy procedure remains.
The remainder of the paper is devoted to the development of the ingredients and their complexity analysis needed to make the statements in the above Meta-Theorem precise. Trying to carry out this program raises some issues which we will briefly address now, as they guide the subsequent developments.
After collecting in §2 some preliminaries, §3 is devoted to a pivotal element of our approach, namely the development and analysis of suitable recompression and coarsening schemes that yield an approximation in the HT-format that is, for a given target accuracy, of near-minimal rank with possibly sparse tensor factors (in a sense to be made precise later).
Of course, one can hope that the solution of (4) is particularly tensor sparse in the sense that relatively low HT-ranks already provide high accuracy if the data are tensor sparse, and if the operator (resp. ) is tensor sparse in the sense that its application does not increase ranks too drastically. Suitable models of operator classes that allow us to properly weigh tensor sparsity and wavelet expansion sparsity are introduced and analyzed in §4. The approximate application of such operators with certified output accuracy builds on the findings in §3.
Finally, in §5 we formulate an adaptive iterative algorithm and analyze its complexity. Starting from the coarsest possible approximation , approximations in the tensor format are built successively, where the error tolerances in the iterative scheme are updated for each step in such a way that two goals are achieved. On the one hand, the tolerances are sufficiently stringent to guarantee the convergence of the iteration up to any desired target accuracy. On the other hand, we ensure that at each stage of the iteration, the approximations remain sufficiently coarse to realize the Meta-Theorem formulated above. Here we specify concrete tensor approximability assumptions on , and that allow us to make its statement precise.
In this section we set the notation and collect the relevant ingredients for stable tensor formats in the infinite-dimensional setting. In the remainder of this work, we shall use for simplicity the abbreviation , with the -space on the appropriate index set.
Our basic assumption is that we have a Riesz basis for , where is a countable index set. In other words, we require that the index set has Cartesian product structure. Therefore any can be identified with its basis coefficient sequence in the unique representation , with uniformly equivalent norms. Thus, will in general correspond to the spatial dimension of the domain of functions under consideration. In addition it can be important to reserve the option of grouping some of the variables in a possibly smaller number of portions of variables, i.e., and for .
A canonical point of departure for the construction of is a collection of Riesz bases for each component Hilbert space (see (6)), which we denote by . To fit in the above context, we may assume without loss of generality that all are identical, denoted by . The precise structure of is irrelevant at this point; however, in the case that the are wavelets, each encodes a dyadic level and a spatial index . This latter case is of particular interest, since for instance when is a Sobolev space, a simple rescaling of yields a Riesz basis for as well.
A simple scenario would be , which is the situation considered in our numerical illustration in §6. A second example are elliptic diffusion equations with stochastic coefficients. In this case, , and . Here a typical choice of bases for are tensor products of polynomials on , while one can take a wavelet basis for , obtained by rescaling a standard basis. A third representative scenario concerns diffusion equations on high-dimensional product domains . Here, for instance, and . We shall comment on some additional difficulties that arise in the application of operators in this case in Remark 18.
We now regard as a tensor of order on and look for representations or approximations of in terms of rank-one tensors
Rather than looking for approximations or representations in the canonical format
we will employ tensor representations of a format that is perhaps best motivated as follows. Consider for each (finitely or infinitely many) pairwise orthonormal sequences , , that is,
We stress that here and in the sequel is admitted. The matrices are often termed orthonormal mode frames. It will be convenient to use the notational convention , , , and so forth, for multiindices in , . Defining for
and noting that is a tensor product Hilbert space, the tensors
form an orthonormal basis for the subspace of , generated by the system . Hence, for any the orthogonal projection
is the best approximation to from the subspace spanned by . The uniquely defined order- tensor with entries , , is referred to as core tensor. Moreover, when the , are bases for all of , that is, , one has, of course, , while for any , componentwise, the “box-truncation”
is a simple mechanism of further reducing the ranks of an approximation from the subspace spanned by at the expense of a minimal loss of accuracy.
The existence of best approximations and their realizability through linear projections suggests approximating a given tensor in by expressions of the form
even without insisting on the th mode frame to have pairwise orthonormal column vectors , . However, these columns can always be orthonormalized, which results in a corresponding modification of the core tensor ; for fixed mode frames, the latter is uniquely determined. When writing sometimes for convenience , although the may be specified through (13) only for , it will always be understood to mean , for .
If the core tensor is represented directly by its entries, (13) corresponds to the so-called Tucker format [37, 38] or subspace representation. The hierarchical Tucker format , as well as the special case of the tensor train format , correspond to representations in the form (13) as well, but use a further structured tensor decomposition for the core tensor that can exploit a stronger type of information sparsity. For the singular value decomposition (SVD) or its infinite dimensional counterpart, the Hilbert-Schmidt decomposition, yield -dependent mode frames that even give a diagonal core tensor. Although this is no longer possible for , the SVD remains the main work horse behind Tucker as well as hierarchical Tucker formats. For the convenience of the reader, we summarize below the relevant facts for these tensor representations in a way tailored to the present needs.
2.1 Tucker format
It is instructive to consider first the simpler case of the Tucker format in more detail.
2.1.1 Some Prerequisites
As mentioned before, for a general , the sum in (13) may be infinite. For each we consider the mode- matricization of , that is, the infinite matrix with entries for , which defines a Hilbert-Schmidt operator
We define the rank vector by its entries
see . It is referred to as the multilinear rank of . We denote by
the set of admissible rank vectors in the Tucker format. For such rank vectors , we introduce the notation
Given any , we can then define the set
of those sequences whose multilinear rank is bounded componentwise by . It is easy to see that the elements of possess a representation of the form (13). Specifically, for any system of orthonormal mode frames with columns (where could be infinity), the -rigid Tucker class
is contained in .
The actual computational complexity of the elements of can be quantified by
It is not hard to see that these quantities are controlled by the “joint support” of the th mode frame, that is, . Note that if , one necessarily also has .
Let and , then there exists such that
The matricization of a given , defined in (14), allows one to invoke the SVD or Hilbert-Schmidt decomposition. By the spectral theorem, for each there exist a nonnegative real sequence , where are the eigenvalues of , as well as orthonormal bases for a subspace of and for (again tacitly assuming that for ), such that
The are referred to as mode- singular values.
To simplify notation in a summary of the properties of the particular orthonormal mode frames , , defined by (20), we define for any vector and for ,
to refer to the corresponding vector with entry deleted or entry replaced by , respectively. We shall also need the auxiliary quantities
derived from the core tensor, where and .
2.1.2 Higher-Order Singular Value Decomposition
The representation (20) is the main building block of the higher-order singular value decomposition (HOSVD) , for the Tucker tensor format (13). In the following theorem, we summarize its properties in the more general case of infinite-dimensional sequence spaces, where the singular value decomposition is replaced by the spectral theorem for compact operators. These facts could also be extracted from the treatment in [21, Section 8.3].
For any the orthonormal mode frames , , with , defined by (20), and the corresponding core tensor with entries , have the following properties:
For all we have , and for all , where are the mode- singular values in (20).
For all and all , we have where the are defined by (22).
For each , we have
If in addition for finite , then and we have with satisfying for .
The representation (20) converges in the Hilbert-Schmidt norm and, as a consequence, we have
with convergence in . Furthermore, with is an orthonormal system in (spanning a strict subspace of when ). For we have thus shown and . The further properties of the expansion can now be obtained along the lines of , see also [21, 2]. ∎
In what follows we shall denote by
the particular system of orthonormal mode frames generated for a given by HOSVD. It will occasionally be important to identify the specific tensor format to which a given system of mode frames refers, for which we use a corresponding superscript, such as in for the Tucker format.
Property (iii) in Theorem 2 leads to a simple procedure for truncation to lower multilinear ranks with an explicit error estimate in terms of the mode- singular values. In this manner, one does not necessarily obtain the best approximation for prescribed rank, but the approximation is quasi-optimal in the sense that the error is at most by a factor larger than the error of best approximation with the same multilinear rank.
We now introduce the notation
This quantity plays the role of a computable error estimate, as made explicit in the following direct consequence of Theorem 2.
While projections to subspaces spanned by the , , do in general not realize the best approximation from (only from ), exact best approximations are still orthogonal projections based on suitable mode frames.
For and with , , there exists an orthonormal mode frame system such that
with given by (11).
By Theorem 1, a best approximation of ranks for ,
exists. Defining as the orthonormal mode frame system for , given by the HOSVD, we obtain the assertion. ∎
Suppose that for a finitely supported vector on , we have a possibly redundant representation
where the vectors , may be linearly dependent. Then by standard linear algebra procedures, we can obtain a HOSVD of with a number of arithmetic operations that can be estimated by
where is an absolute constant (see, e.g., ).
2.2 The Hierarchical Tucker Format
The Tucker format as it stands, in general, still gives rise to an increase of degrees of freedom that is exponential in . One way to mitigate the curse of dimensionality is to further decompose the core tensor in (13). We now briefly formulate the relevant notions concerning the hierarchical Tucker format in the present sequence space context, following essentially the developments in [23, 18], see also .
2.2.1 Dimension Trees
Let , . A set is called a (binary) dimension tree if the following hold:
and for each , we have .
Each is either a singleton or there exist unique disjoint , called children of , such that .
Singletons are referred to as leaves,
as root, and elements of as interior nodes. The set of leaves is denoted by , where we additionally set . When an enumeration of is required, we shall always assume the ascending order with respect to the indices, i.e., in the form .
It will be convenient to introduce the two functions
producing the “left” and “right” children of a non-leaf node which, in view of Definition 1, are well-defined up to their order, which we fix by the condition .
Note that for a binary dimension tree as defined above, and .
The restriction to binary trees in Definition 1 is not necessary, but leads to the most favorable complexity estimates for algorithms operating on the resulting tensor format. With this restriction dropped, the Tucker format (13) can be treated in the same framework, with the -ary dimension tree consisting only of root and leaves, i.e., . In principle, all subsequent results carry over to more general dimension trees (see [16, Section 5.2]).
We shall refer to a family
with for each , as hierarchical mode frames. In addition, these are called orthonormal if for all , we have for , and nested if
As for the Tucker format, we set , and for we retain the notation
Again to express that is associated with the hierarchical format we sometimes write . Of course, depends on the dimension tree , which will be kept fixed in what follows.
To define hierarchical tensor classes and to construct specific -dependent hierarchical mode frames one can proceed as for the Tucker format. Let be a dimension tree, let be an interior node, and . For , we define the Hilbert-Schmidt operator
To be consistent with our previous notation for leaf nodes , we use the abbreviation . Again, can be infinite. The root element of the dimension tree, , is a special case. Here we define
and correspondingly set
if , and otherwise . To be consistent with the Tucker format we denote by
the hierarchical rank vector associated with . Since in what follows the dimension tree will be kept fixed we suppress the corresponding subscript in the rank vector.
This allows us to define for a given , in analogy to (17), the class
For to be non-empty the rank vectors must satisfy certain compatibility conditions, see Proposition 1 below. As detailed later, the elements of can be represented in terms of hierarchical mode frames in the so called hierarchical format with ranks .
Now, for a given , let , be the left singular vectors and be the singular values of . In analogy to the Tucker format we denote by
the system of orthonormal hierarchical mode frames with rank vectors .
The observation that the specific systems of hierarchical mode frames have the following nestedness property, including the root element, will be crucial. The following fact has been established in a more generally applicable framework of minimal subspaces in  (cf. Corollary 6.18 and Theorem 6.31 there).
For and , the mode frames given by the left singular vectors of the operators defined in (28) satisfy
i.e., the family of left singular vectors of the operators is comprized of orthonormal and nested mode frames for .
Nestedness entails compatibility conditions on the rank vectors . In fact, it readily follows from Proposition 1 that for one has . For necessary and sufficient conditions on a rank vector for existence of corresponding nested hierarchical mode frames, we refer to [21, Section 11.2.3]. In what follows we denote by
the set of all hierarchical rank vectors satisfying the compatibility conditions for nestedness.
Let , let be a dimension tree, and let with for , then there exists